Wrapper feature selection for small sample size data driven by complete error estimates

2012 Oct; 108 (1): 138-50. [epub] 20120401

Language: English   Country: Ireland   Medium: print-electronic

Document type: journal articles, grant-supported work, validation studies

Persistent link: https://www.medvik.cz/link/pmid22472029
Links

PubMed 22472029
DOI 10.1016/j.cmpb.2012.02.006
PII: S0169-2607(12)00058-2

This paper focuses on wrapper-based feature selection for a 1-nearest-neighbor (1NN) classifier. We consider in particular the small-sample-size case of a few hundred instances, which is common in biomedical applications. We propose a technique for calculating the complete bootstrap for a 1NN classifier (i.e., averaging over all desired test/train partitions of the data). The complete bootstrap and complete cross-validation error estimates, which have lower variance, are applied as novel selection criteria and are compared with the standard bootstrap and cross-validation in combination with three optimization techniques: sequential forward selection (SFS), binary particle swarm optimization (BPSO) and simplified social impact theory based optimization (SSITO). An experimental comparison on ten datasets leads to the following conclusions. For all three search methods examined here, the complete criteria are a significantly better choice than standard 2-fold cross-validation, 10-fold cross-validation and bootstrap with 50 trials, irrespective of the number of iterations chosen. All the complete-criterion-based 1NN wrappers with SFS search performed better than the widely used FILTER and SIMBA methods. We also demonstrate the benefits and properties of our approaches on an important and novel real-world application: automatic detection of the subthalamic nucleus.
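To make the idea of a "complete" criterion concrete, the following is a minimal sketch, not the authors' implementation. It covers only complete cross-validation for 1NN (not the complete bootstrap proposed in the paper) and plugs it into a plain SFS loop. It relies on a combinatorial identity: if a test point's other n-1 points are ranked by distance, the point of rank r is the nearest training neighbor in exactly C(n-1-r, m-1) of the C(n-1, m) possible training sets of size m. The function names, the stop-on-no-improvement rule, tie-breaking by index and the toy data are all assumptions made for illustration.

```python
from math import comb


def sq_dist(a, b, features):
    """Squared Euclidean distance restricted to the given feature indices."""
    return sum((a[f] - b[f]) ** 2 for f in features)


def complete_cv_error_1nn(X, y, features, n_train):
    """1NN error averaged over ALL splits with n_train training points.

    Assumes distinct distances; ties are broken by index order (assumption).
    """
    n = len(X)
    n_sets = comb(n - 1, n_train)  # training sets available to one test point
    total = 0.0
    for i in range(n):
        # Rank every other point by its distance to the test point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sq_dist(X[i], X[j], features))
        for rank, j in enumerate(others, start=1):
            if rank > n - n_train:
                break  # point cannot be the nearest training neighbor
            if y[j] != y[i]:
                # Fraction of training sets in which j is i's nearest neighbor.
                total += comb(n - 1 - rank, n_train - 1) / n_sets
    return total / n


def sequential_forward_selection(X, y, n_train, max_features):
    """Greedy SFS: add the feature that most lowers the complete CV error."""
    selected, remaining = [], list(range(len(X[0])))
    best_err = float("inf")
    while remaining and len(selected) < max_features:
        err, f = min((complete_cv_error_1nn(X, y, selected + [f], n_train), f)
                     for f in remaining)
        if err >= best_err:
            break  # no candidate improves the criterion (assumed stop rule)
        selected.append(f)
        remaining.remove(f)
        best_err = err
    return selected, best_err


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.9], [1.1, 1.2]]
    y = [0, 0, 1, 1]
    print(sequential_forward_selection(X, y, n_train=2, max_features=2))
```

Because the criterion is an exact average over all partitions, it involves no random resampling, so two candidate feature subsets are always compared on the same footing. That is the property that makes a lower-variance estimate attractive inside a wrapper search.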
