Feature selection
Feature selection is a significant part of many machine learning applications dealing with small-sample and high-dimensional data. Choosing the most important features is an essential step for knowledge discovery in many areas of biomedical informatics. The increased popularity of feature selection methods and their frequent utilisation raise challenging new questions about the interpretability and stability of feature selection techniques. In this study, we compared the behaviour of ten state-of-the-art filter methods for feature selection in terms of their stability, similarity, and influence on prediction performance. All of the experiments were conducted on eight two-class datasets from biomedical areas. While entropy-based feature selection appears to be the most stable, the feature selection techniques yielding the highest prediction performance are the minimum redundancy maximum relevance (mRMR) method and feature selection based on the Bhattacharyya distance. In general, univariate feature selection techniques perform similarly to or even better than more complex multivariate feature selection techniques on high-dimensional datasets. However, on more complex and smaller datasets, multivariate methods slightly outperform univariate techniques.
- MeSH
- algorithms MeSH
- databases, factual MeSH
- humans MeSH
- multivariate analysis MeSH
- Parkinson disease / diagnosis MeSH
- oligonucleotide array sequence analysis / methods MeSH
- software MeSH
- models, statistical MeSH
- computational biology / methods MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- comparative study MeSH
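The stability comparison described in this record's abstract can be illustrated with a small sketch. It is not the study's protocol: the scoring function (`mean_gap_score`), the bootstrap resampling scheme, and the use of average pairwise Jaccard similarity as the stability measure are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def mean_gap_score(values, labels):
    """Illustrative univariate filter: class-mean gap scaled by class spread."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features as a set."""
    n_features = len(X[0])
    scores = [mean_gap_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(s, t):
    """Jaccard similarity of two non-empty feature subsets."""
    return len(s & t) / len(s | t)


def selection_stability(X, y, k, n_resamples=20, seed=0):
    """Average pairwise Jaccard similarity of subsets chosen on bootstrap resamples.

    Assumes binary labels 0/1 with several samples in each class.
    """
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    while len(subsets) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if ys.count(0) < 2 or ys.count(1) < 2:
            continue  # skip resamples that nearly lose one class
        subsets.append(select_top_k([X[i] for i in idx], ys, k))
    return mean(jaccard(s, t) for s, t in combinations(subsets, 2))
```

A value of 1.0 means that every resample selected the same subset; values near 0 indicate that the selected features change substantially when the training data are perturbed.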
The classification of bioimages plays an important role in several biological studies, such as subcellular localisation, phenotype identification and other types of histopathological examinations. The objective of the present study was to develop a computer-aided bioimage classification method for the classification of bioimages across nine diverse benchmark datasets. A novel algorithm was developed, which systematically fused the features extracted from nine different convolutional neural network architectures. A systematic fusion of features boosts the performance of a classifier but at the cost of the high dimensionality of the fused feature set. Therefore, non-discriminatory and redundant features need to be removed from a high-dimensional fused feature set to improve the classification performance and reduce the time complexity. To achieve this aim, a method based on analysis of variance and evolutionary feature selection was developed to select an optimal set of discriminatory features from the fused feature set. The proposed method was evaluated on nine different benchmark datasets. The experimental results showed that the proposed method achieved superior performance, with a significant reduction in the dimensionality of the fused feature set for most bioimage datasets. The performance of the proposed feature selection method was better than that of some of the most recent and classical methods used for feature selection. Thus, the proposed method is attractive because of its superior performance and high compression ratio, which significantly reduces the computational complexity.
- MeSH
- algorithms * MeSH
- neural networks (computer) * MeSH
- Publication type
- journal article MeSH
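The analysis-of-variance step mentioned in this record's abstract can be sketched as a per-feature one-way ANOVA F-score used as a pre-filter. This is only an illustration under assumptions: the abstract does not say how the ANOVA scores are thresholded, the evolutionary search that follows is not shown, and `anova_f` and `anova_prefilter` are hypothetical names.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def anova_prefilter(X, y, keep):
    """Return the sorted indices of the `keep` features with the largest F-score."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])
```

In a pipeline of the kind the abstract describes, the reduced index list would be passed to a second-stage search over feature subsets, so the cheap univariate filter shrinks the space the more expensive search has to explore.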
In gait stability analysis, patients with gait dysfunction are affected by shifts in their dynamic balance. Monitoring patients' progress is important because it allows physicians and patients to follow the rehabilitation process accurately. In this study, we designed a new methodology for classifying gait disorders to quantify patients' progress. The dataset includes 84 measurements of 37 patients based on a physician's opinion. The system, which is designed to evaluate the classification of gait dysfunction, comprises a Kinect camera that observes and stores frames of patients walking down a hallway, a key-point detector that detects the skeletal key points, and an encoder transformer classifier network integrated with generator-discriminator networks (ET-GD). The detector extracts the skeletal key points of patients. After feature engineering, the selected high-level features are fed into the proposed neural network to analyse patient movement and perform the final evaluation of gait dysfunction. The proposed network is inspired by the 1D encoder transformer, which is integrated with two main networks: a network for classification and a network that generates fake output data similar to the input data. Furthermore, we used a discriminator structure to distinguish between the actual data (input) and fake data (generated data). Because the proposed method combines several networks, multiple loss functions need to be optimised; this increases the accuracy of the encoder transformer classifier.
- MeSH
- gait analysis MeSH
- gait * MeSH
- walking MeSH
- humans MeSH
- neural networks (computer) MeSH
- movement disorders * MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
This paper focuses on wrapper-based feature selection for a 1-nearest-neighbor (1NN) classifier. We consider in particular the case of a small sample size with a few hundred instances, which is common in biomedical applications. We propose a technique for calculating the complete bootstrap for a 1-nearest-neighbor classifier (i.e., averaging over all desired test/train partitions of the data). The complete bootstrap and the complete cross-validation error estimate with lower variance are applied as novel selection criteria and are compared with the standard bootstrap and cross-validation in combination with three optimization techniques: sequential forward selection (SFS), binary particle swarm optimization (BPSO) and simplified social impact theory based optimization (SSITO). The experimental comparison based on ten datasets draws the following conclusions: for all three search methods examined here, the complete criteria are a significantly better choice than standard 2-fold cross-validation, 10-fold cross-validation and bootstrap with 50 trials, irrespective of the selected output number of iterations. All the complete criterion-based 1NN wrappers with SFS search performed better than the widely used FILTER and SIMBA methods. We also demonstrate the benefits and properties of our approaches on an important and novel real-world application of automatic detection of the subthalamic nucleus.
- MeSH
- models, theoretical MeSH
- sample size * MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- validation study MeSH
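The wrapper idea in this record's abstract, a search procedure that repeatedly asks a 1NN error estimate to judge candidate feature subsets, can be sketched as follows. The sketch uses sequential forward selection with a plain leave-one-out error as the selection criterion. That criterion is a simple deterministic stand-in chosen for illustration; it is not the complete bootstrap or complete cross-validation estimate proposed in the paper, and both function names are hypothetical.

```python
def loo_error_1nn(X, y, features):
    """Leave-one-out error of a 1NN classifier restricted to the given features."""
    errors = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue  # the held-out sample may not be its own neighbour
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        errors += y[best_j] != y[i]
    return errors / len(X)


def sequential_forward_selection(X, y, max_features):
    """Greedily add the feature that most reduces the criterion; stop when none helps."""
    selected, best_err = [], 1.0  # an empty feature set is treated as worst case
    remaining = set(range(len(X[0])))
    while remaining and len(selected) < max_features:
        err, f = min((loo_error_1nn(X, y, selected + [f]), f) for f in sorted(remaining))
        if err >= best_err:
            break  # no candidate improves the criterion
        selected.append(f)
        remaining.remove(f)
        best_err = err
    return selected, best_err
```

Swapping `loo_error_1nn` for another error estimate changes the selection criterion without touching the search, which is the separation between criterion and optimization technique that the abstract's comparison relies on.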
The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host-phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. 
The manual identification of data sources has been complemented with: (1) an automated publication search through the digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers relying on a learning-to-rank approach.
- Publication type
- journal article MeSH
- review MeSH
The extracellular subunit of the major histocompatibility complex MHCIIβ plays an important role in the recognition of pathogens and the initiation of the adaptive immune response of vertebrates. It is widely accepted that pathogen-mediated selection, in combination with neutral micro-evolutionary forces (e.g. genetic drift), shapes the diversity of MHCIIβ, but it has proved difficult to determine the relative effects of these forces. We evaluated the effect of genetic drift and balancing selection on MHCIIβ diversity in 12 small populations of Galápagos mockingbirds belonging to four different species, and one larger population of the Northern mockingbird from the continental USA. After genotyping MHCIIβ loci by high-throughput sequencing, we applied a correlational approach to explore the relationships between MHCIIβ diversity and population size by proxy of island size. As expected when drift predominates, we found a positive effect of population size on the number of MHCIIβ alleles present in a population. However, the number of MHCIIβ alleles per individual and the number of supertypes were not correlated with population size. This discrepancy points to an interesting feature of MHCIIβ diversity dynamics: some levels of diversity might be shaped by genetic drift while others are independent and possibly maintained by balancing selection.
- MeSH
- genetic variation MeSH
- genetic drift * MeSH
- genotype MeSH
- genes, MHC class II * MeSH
- population density MeSH
- islands MeSH
- Passeriformes / genetics MeSH
- genetics, population MeSH
- selection, genetic * MeSH
- animals MeSH
- Check Tag
- animals MeSH
- Publication type
- journal article MeSH
- Geographic names
- Ecuador MeSH
- islands MeSH
... The new procedures for updating and disseminating the Model List 2 -- 3.1 Background 2 -- 3.2 Key features ... ... procedures 4 -- 3.3.1 Applications for additions 4 -- 3.3.2 Applications for deletions -- 3.3.3 Selection ...
WHO technical report series, ISSN 0512-3054 914
vi, 126 p. : tables ; 24 cm
- MeSH
- drugs, essential / standards MeSH
- drug information services MeSH
- guidelines as topic MeSH
- drug utilization MeSH
- Publication type
- guideline MeSH
- Subject category (Konspekt)
- Medical sciences. Medicine
- NLK subject fields
- public health
- pharmacy and pharmacology
- NLK publication type
- WHO publications
BACKGROUND: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible. RESULTS: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets.
- MeSH
- algorithms MeSH
- databases, factual MeSH
- humans MeSH
- pancreatic neoplasms / diagnosis, genetics MeSH
- computer simulation MeSH
- proteomics / methods MeSH
- reproducibility of results MeSH
- spectrometry, mass, matrix-assisted laser desorption-ionization / methods MeSH
- machine learning MeSH
- case-control studies MeSH
- models, theoretical MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH