Wrapper feature selection for small sample size data driven by complete error estimates
M. Macaš, L. Lhotská, E. Bakstein, D. Novák, J. Wild, T. Sieger, P. Vostatek, R. Jech
Language: English; Country: Ireland
Document type: Journal Article; Research Support, Non-U.S. Gov't; Validation Study
- MeSH
- Models, Theoretical MeSH
- Sample Size * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Validation Study MeSH
This paper focuses on wrapper-based feature selection for a 1-nearest neighbor classifier. We consider in particular the case of a small sample size with a few hundred instances, which is common in biomedical applications. We propose a technique for calculating the complete bootstrap for a 1-nearest-neighbor classifier (i.e., averaging over all desired test/train partitions of the data). The complete bootstrap and the complete cross-validation error estimate with lower variance are applied as novel selection criteria and are compared with the standard bootstrap and cross-validation in combination with three optimization techniques - sequential forward selection (SFS), binary particle swarm optimization (BPSO) and simplified social impact theory based optimization (SSITO). The experimental comparison based on ten datasets draws the following conclusions: for all three search methods examined here, the complete criteria are a significantly better choice than standard 2-fold cross-validation, 10-fold cross-validation and bootstrap with 50 trials irrespective of the selected output number of iterations. All the complete criterion-based 1NN wrappers with SFS search performed better than the widely-used FILTER and SIMBA methods. We also demonstrate the benefits and properties of our approaches on an important and novel real-world application of automatic detection of the subthalamic nucleus.
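The wrapper idea in the abstract can be illustrated with a minimal sketch: sequential forward selection (SFS) driven by the error of a 1-nearest-neighbor classifier. The record does not give the formulas for the complete bootstrap or complete cross-validation, so this sketch substitutes plain leave-one-out error as the selection criterion. The function names `loo_1nn_error` and `sfs` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def loo_1nn_error(X: Sequence[Sequence[float]], y: Sequence[int],
                  features: Sequence[int]) -> float:
    """Leave-one-out error of a 1NN classifier restricted to `features`.

    Assumption: this stands in for the paper's complete criteria,
    which are not specified in the record.
    """
    n = len(X)
    errors = 0
    for i in range(n):
        best_dist, best_label = float("inf"), None
        for j in range(n):
            if j == i:
                continue  # hold out instance i
            # Squared Euclidean distance over the selected features only.
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_dist:
                best_dist, best_label = d, y[j]
        errors += best_label != y[i]
    return errors / n


def sfs(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Greedy SFS: repeatedly add the feature that gives the lowest criterion value."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = min(remaining, key=lambda f: loo_1nn_error(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
chosen = sfs(X, y, k=1)
```

In this toy case the search picks feature 0, because the criterion on feature 1 alone misclassifies every held-out instance. Swapping in a different error estimate only requires replacing `loo_1nn_error`, which is where a wrapper method plugs in its selection criterion.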
- 000
- 00000naa a2200000 a 4500
- 001
- bmc13012722
- 003
- CZ-PrNML
- 005
- 20130410102036.0
- 007
- ta
- 008
- 130404s2012 ie f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1016/j.cmpb.2012.02.006 $2 doi
- 035 __
- $a (PubMed)22472029
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a ie
- 100 1_
- $a Macaš, Martin $u Czech Technical University, Faculty of Electrical Engineering, Department of Cybernetics, Karlovo Namesti 13, 12135 Prague, Czech Republic.
- 245 10
- $a Wrapper feature selection for small sample size data driven by complete error estimates / $c M. Macaš, L. Lhotská, E. Bakstein, D. Novák, J. Wild, T. Sieger, P. Vostatek, R. Jech,
- 520 9_
- $a This paper focuses on wrapper-based feature selection for a 1-nearest neighbor classifier. We consider in particular the case of a small sample size with a few hundred instances, which is common in biomedical applications. We propose a technique for calculating the complete bootstrap for a 1-nearest-neighbor classifier (i.e., averaging over all desired test/train partitions of the data). The complete bootstrap and the complete cross-validation error estimate with lower variance are applied as novel selection criteria and are compared with the standard bootstrap and cross-validation in combination with three optimization techniques - sequential forward selection (SFS), binary particle swarm optimization (BPSO) and simplified social impact theory based optimization (SSITO). The experimental comparison based on ten datasets draws the following conclusions: for all three search methods examined here, the complete criteria are a significantly better choice than standard 2-fold cross-validation, 10-fold cross-validation and bootstrap with 50 trials irrespective of the selected output number of iterations. All the complete criterion-based 1NN wrappers with SFS search performed better than the widely-used FILTER and SIMBA methods. We also demonstrate the benefits and properties of our approaches on an important and novel real-world application of automatic detection of the subthalamic nucleus.
- 650 _2
- $a teoretické modely $7 D008962
- 650 12
- $a velikost vzorku $7 D018401
- 655 _2
- $a časopisecké články $7 D016428
- 655 _2
- $a práce podpořená grantem $7 D013485
- 655 _2
- $a validační studie $7 D023361
- 700 1_
- $a Lhotská, Lenka $u -
- 700 1_
- $a Bakstein, Eduard $u -
- 700 1_
- $a Novák, Daniel $u -
- 700 1_
- $a Wild, Jiří $u -
- 700 1_
- $a Sieger, Tomáš $u -
- 700 1_
- $a Vostatek, Pavel $u -
- 700 1_
- $a Jech, Robert $u -
- 773 0_
- $w MED00001214 $t Computer methods and programs in biomedicine $x 1872-7565 $g Roč. 108, č. 1 (2012), s. 138-50
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/22472029 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y a $z 0
- 990 __
- $a 20130404 $b ABA008
- 991 __
- $a 20130410102305 $b ABA008
- 999 __
- $a ok $b bmc $g 975920 $s 811003
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2012 $b 108 $c 1 $d 138-50 $i 1872-7565 $m Computer methods and programs in biomedicine $n Comput Methods Programs Biomed $x MED00001214
- LZP __
- $a Pubmed-20130404