Article
Online article
Medvik - BMC

An experimental comparison of feature selection methods on two-class biomedical datasets

P. Drotár, J. Gazda, Z. Smékal

Computers in biology and medicine. 2015 ; 66 (-) : 1-10. [pub] 20150824

Language English Country United States

Document type Comparative Study, Journal Article, Research Support, Non-U.S. Gov't

E-resources Online Full text

NLK ProQuest Central from 2003-01-01 to 2023-12-31
Medline Complete (EBSCOhost) from 2012-09-01 to 2015-07-31
Nursing & Allied Health Database (ProQuest) from 2003-01-01 to 2023-12-31
Health & Medicine (ProQuest) from 2003-01-01 to 2023-12-31

Feature selection is a significant part of many machine learning applications dealing with small-sample, high-dimensional data. Choosing the most important features is an essential step for knowledge discovery in many areas of biomedical informatics. The increasing popularity of feature selection methods and their frequent use raise challenging new questions about the interpretability and stability of feature selection techniques. In this study, we compared the behaviour of ten state-of-the-art filter methods for feature selection in terms of their stability, similarity, and influence on prediction performance. All experiments were conducted on eight two-class biomedical datasets. While entropy-based feature selection appears to be the most stable, the techniques yielding the highest prediction performance are the minimum redundancy maximum relevance (mRMR) method and feature selection based on the Bhattacharyya distance. In general, on high-dimensional datasets, univariate feature selection techniques perform similarly to, or even better than, more complex multivariate techniques. However, on more complex and smaller datasets, multivariate methods slightly outperform univariate techniques.
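The abstract names two ideas that a small example can make concrete: ranking features with a univariate filter based on the Bhattacharyya distance, and judging a filter by how stable its selected subset is. The sketch below is illustrative only and is not the authors' experimental protocol. It assumes that each feature's class-conditional distribution is modelled as a univariate Gaussian, that class labels are 0 and 1, and that stability is measured as the mean pairwise Jaccard similarity of top-k subsets selected on stratified bootstrap resamples. All function names are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def bhattacharyya_score(x_pos, x_neg, eps=1e-12):
    """Bhattacharyya distance between two univariate Gaussians fitted to the class samples.

    Assumption: each class-conditional feature distribution is Gaussian.
    eps keeps the logarithm finite when a class has zero variance.
    """
    m1, m2 = mean(x_pos), mean(x_neg)
    v1, v2 = pvariance(x_pos) + eps, pvariance(x_neg) + eps
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2)))
    return mean_term + var_term


def rank_features(X, y, k):
    """Score every feature independently (univariate filter); return the top-k indices as a set."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((bhattacharyya_score(pos, neg), j))
    scores.sort(reverse=True)  # largest distance first
    return {j for _, j in scores[:k]}


def jaccard(a, b):
    """Jaccard similarity of two feature subsets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(X, y, k, n_runs=10, seed=0):
    """Mean pairwise Jaccard similarity of top-k subsets over stratified bootstrap resamples."""
    if n_runs < 2:
        raise ValueError("need at least two runs to compare subsets")
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    subsets = []
    for _ in range(n_runs):
        # Resample within each class so that both classes are always present.
        idx = [rng.choice(members) for members in by_class.values() for _ in members]
        subsets.append(rank_features([X[i] for i in idx], [y[i] for i in idx], k))
    pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Because each feature is scored in isolation, this is a univariate filter in the sense used by the abstract. A multivariate method such as mRMR would also penalise redundancy between already selected features, which this sketch deliberately omits.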


000    00000naa a2200000 a 4500
001    bmc16028264
003    CZ-PrNML
005    20161021111315.0
007    ta
008    161005s2015 xxu f 000 0|eng||
009    AR
024 7_ $a 10.1016/j.compbiomed.2015.08.010 $2 doi
035 __ $a (PubMed)26327447
040 __ $a ABA008 $b cze $d ABA008 $e AACR2
041 0_ $a eng
044 __ $a xxu
100 1_ $a Drotár, P $u Department of Telecommunications, Brno University of Technology, Technická 12, 61200 Brno, Czech Republic. Electronic address: peter.drotar84@gmail.com.
245 13 $a An experimental comparison of feature selection methods on two-class biomedical datasets / $c P. Drotár, J. Gazda, Z. Smékal,
520 9_ $a Feature selection is a significant part of many machine learning applications dealing with small-sample and high-dimensional data. Choosing the most important features is an essential step for knowledge discovery in many areas of biomedical informatics. The increased popularity of feature selection methods and their frequent utilisation raise challenging new questions about the interpretability and stability of feature selection techniques. In this study, we compared the behaviour of ten state-of-the-art filter methods for feature selection in terms of their stability, similarity, and influence on prediction performance. All of the experiments were conducted on eight two-class datasets from biomedical areas. While entropy-based feature selection appears to be the most stable, the feature selection techniques yielding the highest prediction performance are minimum redundance maximum relevance method and feature selection based on Bhattacharyya distance. In general, univariate feature selection techniques perform similarly to or even better than more complex multivariate feature selection techniques with high-dimensional datasets. However, with more complex and smaller datasets multivariate methods slightly outperform univariate techniques.
650 _2 $a algorithms $7 D000465
650 _2 $a computational biology $x methods $7 D019295
650 _2 $a databases, factual $7 D016208
650 _2 $a humans $7 D006801
650 _2 $a models, statistical $7 D015233
650 _2 $a multivariate analysis $7 D015999
650 _2 $a oligonucleotide array sequence analysis $x methods $7 D020411
650 _2 $a Parkinson disease $x diagnosis $7 D010300
650 _2 $a software $7 D012984
655 _2 $a comparative study $7 D003160
655 _2 $a journal article $7 D016428
655 _2 $a research support, non-U.S. gov't $7 D013485
700 1_ $a Gazda, J $u Department of Computers and Informatics, Technical University of Kosice, Letna 9, 0401 Kosice, Slovakia.
700 1_ $a Smékal, Z $u Department of Telecommunications, Brno University of Technology, Technická 12, 61200 Brno, Czech Republic.
773 0_ $w MED00001218 $t Computers in biology and medicine $x 1879-0534 $g Vol. 66, no. - (2015), pp. 1-10
856 41 $u https://pubmed.ncbi.nlm.nih.gov/26327447 $y Pubmed
910 __ $a ABA008 $b sig $c sign $y a $z 0
990 __ $a 20161005 $b ABA008
991 __ $a 20161021111724 $b ABA008
999 __ $a ok $b bmc $g 1166578 $s 952894
BAS __ $a 3
BAS __ $a PreBMC
BMC __ $a 2015 $b 66 $c - $d 1-10 $e 20150824 $i 1879-0534 $m Computers in biology and medicine $n Comput Biol Med $x MED00001218
LZP __ $a Pubmed-20161005
