An experimental comparison of feature selection methods on two-class biomedical datasets
P. Drotár, J. Gazda, Z. Smékal
Language English Country United States
Document type Comparative Study, Journal Article, Research Support, Non-U.S. Gov't
NLK
ProQuest Central
from 2003-01-01 to 2023-12-31
Medline Complete (EBSCOhost)
from 2012-09-01 to 2015-07-31
Nursing & Allied Health Database (ProQuest)
from 2003-01-01 to 2023-12-31
Health & Medicine (ProQuest)
from 2003-01-01 to 2023-12-31
- MeSH
- Algorithms MeSH
- Databases, Factual MeSH
- Humans MeSH
- Multivariate Analysis MeSH
- Parkinson Disease diagnosis MeSH
- Oligonucleotide Array Sequence Analysis methods MeSH
- Software MeSH
- Models, Statistical MeSH
- Computational Biology methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
Feature selection is a significant part of many machine learning applications dealing with small-sample and high-dimensional data. Choosing the most important features is an essential step for knowledge discovery in many areas of biomedical informatics. The increased popularity of feature selection methods and their frequent utilisation raise challenging new questions about the interpretability and stability of feature selection techniques. In this study, we compared the behaviour of ten state-of-the-art filter methods for feature selection in terms of their stability, similarity, and influence on prediction performance. All of the experiments were conducted on eight two-class datasets from biomedical areas. While entropy-based feature selection appears to be the most stable, the feature selection techniques yielding the highest prediction performance are the minimum redundancy maximum relevance method and feature selection based on the Bhattacharyya distance. In general, univariate feature selection techniques perform similarly to or even better than more complex multivariate feature selection techniques on high-dimensional datasets. However, on more complex and smaller datasets, multivariate methods slightly outperform univariate techniques.
- 000
- 00000naa a2200000 a 4500
- 001
- bmc16028264
- 003
- CZ-PrNML
- 005
- 20161021111315.0
- 007
- ta
- 008
- 161005s2015 xxu f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1016/j.compbiomed.2015.08.010 $2 doi
- 035 __
- $a (PubMed)26327447
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a xxu
- 100 1_
- $a Drotár, P $u Department of Telecommunications, Brno University of Technology, Technická 12, 61200 Brno, Czech Republic. Electronic address: peter.drotar84@gmail.com.
- 245 13
- $a An experimental comparison of feature selection methods on two-class biomedical datasets / $c P. Drotár, J. Gazda, Z. Smékal
- 650 _2
- $a Algorithms $7 D000465
- 650 _2
- $a Computational Biology $x methods $7 D019295
- 650 _2
- $a Databases, Factual $7 D016208
- 650 _2
- $a Humans $7 D006801
- 650 _2
- $a Models, Statistical $7 D015233
- 650 _2
- $a Multivariate Analysis $7 D015999
- 650 _2
- $a Oligonucleotide Array Sequence Analysis $x methods $7 D020411
- 650 _2
- $a Parkinson Disease $x diagnosis $7 D010300
- 650 _2
- $a Software $7 D012984
- 655 _2
- $a Comparative Study $7 D003160
- 655 _2
- $a Journal Article $7 D016428
- 655 _2
- $a Research Support, Non-U.S. Gov't $7 D013485
- 700 1_
- $a Gazda, J $u Department of Computers and Informatics, Technical University of Kosice, Letna 9, 0401 Kosice, Slovakia.
- 700 1_
- $a Smékal, Z $u Department of Telecommunications, Brno University of Technology, Technická 12, 61200 Brno, Czech Republic.
- 773 0_
- $w MED00001218 $t Computers in biology and medicine $x 1879-0534 $g Vol. 66, No. - (2015), pp. 1-10
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/26327447 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y a $z 0
- 990 __
- $a 20161005 $b ABA008
- 991 __
- $a 20161021111724 $b ABA008
- 999 __
- $a ok $b bmc $g 1166578 $s 952894
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2015 $b 66 $c - $d 1-10 $e 20150824 $i 1879-0534 $m Computers in biology and medicine $n Comput Biol Med $x MED00001218
- LZP __
- $a Pubmed-20161005