Wrapper feature selection for small sample size data driven by complete error estimates
Jazyk angličtina Země Irsko Médium print-electronic
Typ dokumentu časopisecké články, práce podpořená grantem, validační studie
PubMed
22472029
DOI
10.1016/j.cmpb.2012.02.006
PII: S0169-2607(12)00058-2
Knihovny.cz E-zdroje
- MeSH
- teoretické modely MeSH
- velikost vzorku * MeSH
- Publikační typ
- časopisecké články MeSH
- práce podpořená grantem MeSH
- validační studie MeSH
This paper focuses on wrapper-based feature selection for a 1-nearest neighbor classifier. We consider in particular the case of a small sample size with a few hundred instances, which is common in biomedical applications. We propose a technique for calculating the complete bootstrap for a 1-nearest-neighbor classifier (i.e., averaging over all desired test/train partitions of the data). The complete bootstrap and the complete cross-validation error estimate with lower variance are applied as novel selection criteria and are compared with the standard bootstrap and cross-validation in combination with three optimization techniques - sequential forward selection (SFS), binary particle swarm optimization (BPSO) and simplified social impact theory based optimization (SSITO). The experimental comparison based on ten datasets draws the following conclusions: for all three search methods examined here, the complete criteria are a significantly better choice than standard 2-fold cross-validation, 10-fold cross-validation and bootstrap with 50 trials irrespective of the selected output number of iterations. All the complete criterion-based 1NN wrappers with SFS search performed better than the widely-used FILTER and SIMBA methods. We also demonstrate the benefits and properties of our approaches on an important and novel real-world application of automatic detection of the subthalamic nucleus.
Citace poskytuje Crossref.org