Identification of active electrodes that record task-relevant neurophysiological activity is needed for clinical and industrial applications as well as for investigating brain functions. We developed an unsupervised, fully automated approach to classify active electrodes showing event-related intracranial EEG (iEEG) responses from 115 patients performing a free recall verbal memory task. Our approach employed new interpretable metrics that quantify spectral characteristics of the normalized iEEG signal based on power-in-band and synchrony measures. Unsupervised clustering of the metrics identified distinct sets of active electrodes across different subjects. In the total population of 11,869 electrodes, our method achieved 97% sensitivity and 92.9% specificity with the most efficient metric. We validated our results with anatomical localization revealing significantly greater distribution of active electrodes in brain regions that support verbal memory processing. We propose our machine-learning framework for objective and efficient classification and interpretation of electrophysiological signals of brain activities supporting memory and cognition.
- MeSH
- Algorithms MeSH
- Biomedical Engineering methods trends MeSH
- Datasets as Topic MeSH
- Electroencephalography methods MeSH
- Electrophysiological Phenomena MeSH
- Electrocorticography * methods MeSH
- Epilepsy diagnosis physiopathology psychology MeSH
- Evoked Potentials physiology MeSH
- Electrodes, Implanted * MeSH
- Cognition physiology MeSH
- Memory, Short-Term physiology MeSH
- Humans MeSH
- Brain Mapping methods MeSH
- Brain diagnostic imaging physiology MeSH
- Task Performance and Analysis * MeSH
- Retrospective Studies MeSH
- Sensitivity and Specificity MeSH
- Unsupervised Machine Learning * MeSH
- Verbal Behavior physiology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
- Validation Study MeSH
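The iEEG abstract above reports its result as sensitivity and specificity. As a minimal sketch of how those two figures are derived from per-electrode labels, the example below uses made-up toy labels (not the study's data); the function name `sensitivity_specificity` is hypothetical and is not taken from the paper.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for two equal-length lists of booleans.

    truth[i] is True when electrode i is really active; predicted[i] is the
    classifier's decision for the same electrode.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)          # active, detected
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)      # active, missed
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)  # inactive, rejected
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)      # inactive, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one active and one inactive electrode")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for eight electrodes (illustrative only).
truth = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, False, False, True, False]
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Sensitivity is the fraction of truly active electrodes that were detected, and specificity is the fraction of inactive electrodes that were correctly rejected.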
Research in Artificial Intelligence (AI) has focused mostly on two extremes: either on small improvements in narrow AI domains, or on universal theoretical frameworks that are often uncomputable or lack practical implementations. In this paper we attempt to follow a big-picture view while also providing a particular theory and its implementation to present a novel, purposely simple, and interpretable hierarchical architecture. This architecture incorporates the unsupervised learning of a model of the environment, learning the influence of one's own actions, model-based reinforcement learning, hierarchical planning, and symbolic/sub-symbolic integration in general. The learned model is stored in the form of hierarchical representations that are increasingly abstract but can retain details when needed. We demonstrate the universality of the architecture by testing it on a series of diverse environments ranging from audio/visual compression to discrete and continuous action spaces, to learning disentangled representations.
The academic curriculum has been shown to promote sedentary behavior in college students. This study aimed to profile the physical fitness of physical education majors using unsupervised machine learning and to identify the differences between sexes, academic years, socioeconomic strata, and the generated profiles. A total of 542 healthy and physically active students (445 males, 97 females; 19.8 [2.2] years; 66.0 [10.3] kg; 169.5 [7.8] cm) participated in this cross-sectional study. Their indirect VO2max (Cooper and Shuttle-Run 20 m tests), lower-limb power (horizontal jump), sprint (30 m), agility (shuttle run), and flexibility (sit-and-reach) were assessed. The participants were profiled using clustering algorithms after setting the optimal number of clusters through an internal validation using R packages. Non-parametric tests were used to identify the differences (p < 0.05). Most participants were freshmen (51.4%) and middle-income (64.0%) students. Seniors and juniors showed better physical fitness than first-year students. No significant differences were found between socioeconomic strata (p > 0.05). Two profiles were identified using hierarchical clustering (Cluster 1 = 318 vs. Cluster 2 = 224). The matching analysis revealed that physical fitness explained the variation in the data, with Cluster 2 as a sex-independent and more physically fit group. All variables differed significantly between the sexes (except the body mass index [p = 0.218]) and the generated profiles (except stature [p = 0.559] and flexibility [p = 0.115]). A multidimensional analysis showed that body mass, cardiorespiratory fitness, and agility contributed the most to the data variation and can therefore be used as profiling variables. This profiling method accurately identified the relevant variables to reinforce exercise recommendations in majors with low physical performance and overweight.
Current studies of gene × air pollution interaction typically seek to identify unknown heritability of common complex illnesses arising from variability in the host's susceptibility to environmental pollutants of interest. Accordingly, single-component generalized linear models are often used to model the risk posed by an environmental exposure variable of interest in relation to a priori determined DNA variants. However, reducing the phenotypic heterogeneity may further optimize such an approach, primarily represented by the modeled DNA variants. Here, we reduce phenotypic heterogeneity of asthma severity and also identify single nucleotide polymorphisms (SNPs) associated with phenotype subgroups. Specifically, we first apply an unsupervised learning algorithm and a non-parametric regression to find a biclustering structure of children according to their allergy and asthma severity. We then identify a set of SNPs most closely correlated with each subgroup. We subsequently fit a logistic regression model for each group against the healthy controls using benzo[a]pyrene (B[a]P) as a representative airborne carcinogen. Application of such an approach in a case-control data set shows that SNP clustering may help to partly explain heterogeneity in children's asthma susceptibility in relation to ambient B[a]P concentration with greater efficiency.
- MeSH
- Algorithms MeSH
- Benzo(a)pyrene toxicity MeSH
- Asthma chemically induced genetics MeSH
- Child MeSH
- Genetic Predisposition to Disease * MeSH
- Gene-Environment Interaction MeSH
- Polymorphism, Single Nucleotide MeSH
- Air Pollutants toxicity MeSH
- Humans MeSH
- Multifactorial Inheritance * MeSH
- Statistics as Topic MeSH
- Unsupervised Machine Learning MeSH
- Case-Control Studies MeSH
- Environmental Exposure adverse effects MeSH
- Air Pollution adverse effects MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
A major challenge in cancer treatment is predicting the clinical response to anti-cancer drugs on a personalized basis. The success of such a task largely depends on the ability to develop computational resources that integrate big "omic" data into effective drug-response models. Machine learning is both an expanding and an evolving computational field that holds promise to cover such needs. Here we provide a focused overview of: 1) the various supervised and unsupervised algorithms used specifically in drug response prediction applications, 2) the strategies employed to develop these algorithms into applicable models, 3) data resources that are fed into these frameworks, and 4) pitfalls and challenges to maximize model performance. In this context we also describe a novel in silico screening process, based on Association Rule Mining, for identifying genes as candidate drivers of drug response and compare it with relevant data mining frameworks, for which we generated a web application freely available at: https://compbio.nyumc.org/drugs/. This pipeline explores large sample spaces with high efficiency, while detecting low-frequency events and evaluating statistical significance even in multidimensional space, and presents the results in the form of easily interpretable rules. We conclude with future prospects and challenges of applying machine-learning-based drug response prediction in precision medicine.
- MeSH
- Data Mining * MeSH
- Humans MeSH
- Neoplasms drug therapy MeSH
- Computer Simulation MeSH
- Machine Learning * MeSH
- Treatment Outcome MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Review MeSH
- Research Support, N.I.H., Extramural MeSH
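The drug-response review above describes a screening process based on Association Rule Mining. The sketch below only illustrates the three standard quantities used to score a rule (support, confidence, lift); it is not the authors' pipeline. The feature names (`GENE_A_mut`, `sensitive`, and so on) and the function `rule_metrics` are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Score the rule antecedent -> consequent over a list of item sets.

    Returns (support, confidence, lift) using the standard definitions.
    """
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)                  # contain antecedent
    n_c = sum(1 for t in transactions if consequent <= t)                  # contain consequent
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)  # contain both
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_ac / n               # how often the whole rule occurs
    confidence = n_ac / n_a          # P(consequent | antecedent)
    lift = confidence / (n_c / n)    # > 1 means a positive association
    return support, confidence, lift


# Each set is one hypothetical cell line: its mutated genes plus its drug response.
transactions = [
    {"GENE_A_mut", "sensitive"},
    {"GENE_A_mut", "sensitive"},
    {"GENE_A_mut", "resistant"},
    {"GENE_B_mut", "resistant"},
    {"GENE_B_mut", "sensitive"},
    {"resistant"},
]
support, confidence, lift = rule_metrics(transactions, {"GENE_A_mut"}, {"sensitive"})
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A rule with low support but high lift corresponds to the kind of low-frequency event the abstract mentions; assessing its statistical significance would need an additional test that this sketch omits.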
Acute heart failure (AHF) is a life-threatening, heterogeneous disease requiring urgent diagnosis and treatment. The clinical severity and medical procedures differ according to a complex interplay between the cause of deterioration, underlying cardiac substrate, and comorbidities. This study aimed to analyze the natural phenotypic heterogeneity of the AHF population and evaluate the possibilities offered by clustering (an unsupervised machine-learning technique) in medical data assessment. We evaluated data from 381 AHF patients. Sixty-three clinical and biochemical features were assessed at patient admission and were included in the analysis after preprocessing. The K-medoids algorithm was implemented to create the clusters, with optimization based on the Davies-Bouldin index. The clustering was performed while blinded to the outcome. The outcome associations were evaluated using Kaplan-Meier curves and Cox proportional-hazards regressions. The algorithm distinguished six clusters that differed significantly in 58 variables concerning etiology, clinical status, comorbidities, laboratory parameters, and lifestyle factors. The clusters differed in terms of one-year mortality (p = 0.002). Using the clustering techniques, we extracted six phenotypes from AHF patients with distinct clinical characteristics and outcomes. Our results can be valuable for future trial design and customized treatment.
- Publication type
- Journal Article MeSH
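The AHF abstract above names the Davies-Bouldin index as its optimization criterion. The following is a minimal sketch of the standard, centroid-based form of that index on made-up 2-D points; whether the study computed it around centroids or medoids is not stated, so the centroid choice here is an assumption. The helper names are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    Lower values indicate compact, well-separated clusters.
    """
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each cluster's points to its own centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Same four points, grouped two different ways (toy data).
good = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
bad = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
db_good = davies_bouldin(good)
db_bad = davies_bouldin(bad)
print(db_good, db_bad)
```

When the index is used for model selection, a clustering is typically run for several candidate settings and the one with the lowest index is kept.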
BACKGROUND: Identification of coordinately regulated genes according to the level of their expression during the time course of a process allows for discovering functional relationships among genes involved in the process. RESULTS: We present a single-class classification method for the identification of genes of similar function from a gene expression time series. It is based on a parallel genetic algorithm, which is a supervised computer learning method exploiting prior knowledge of gene function to identify unknown genes of similar function from expression data. The algorithm was tested with a set of randomly generated patterns; the results were compared with seven other classification algorithms including support vector machines. The algorithm avoids several problems associated with unsupervised clustering methods, and it shows better performance than the other algorithms. The algorithm was applied to the identification of secondary metabolite gene clusters of the antibiotic-producing eubacterium Streptomyces coelicolor. The algorithm also identified pathways associated with transport of the secondary metabolites out of the cell. We used the method for the prediction of the functional role of particular ORFs based on the expression data. CONCLUSION: Through analysis of a time series of gene expression, the algorithm identifies pathways which are directly or indirectly associated with genes of interest, and which are active during the time course of the experiment.
BACKGROUND AND OBJECTIVES: Recent studies fueled doubts as to whether all currently defined central disorders of hypersomnolence are stable entities, especially narcolepsy type 2 and idiopathic hypersomnia. New reliable biomarkers are needed, and the question arises of whether current diagnostic criteria of hypersomnolence disorders should be reassessed. The main aim of this data-driven observational study was to see whether data-driven algorithms would segregate narcolepsy type 1 and identify more reliable subgrouping of individuals without cataplexy with new clinical biomarkers. METHODS: We used agglomerative hierarchical clustering, an unsupervised machine learning algorithm, to identify distinct hypersomnolence clusters in the large-scale European Narcolepsy Network database. We included 97 variables, covering all aspects of central hypersomnolence disorders such as symptoms, demographics, objective and subjective sleep measures, and laboratory biomarkers. We specifically focused on subgrouping of patients without cataplexy. The number of clusters was chosen to be the minimal number for which patients without cataplexy were put in distinct groups. RESULTS: We included 1,078 unmedicated adolescents and adults. Seven clusters were identified, of which 4 clusters included predominantly individuals with cataplexy. The 2 most distinct clusters consisted of 158 and 157 patients, were dominated by those without cataplexy, and among other variables, significantly differed in presence of sleep drunkenness, subjective difficulty awakening, and weekend-week sleep length difference. Patients formally diagnosed as having narcolepsy type 2 and idiopathic hypersomnia were evenly mixed in these 2 clusters. DISCUSSION: Using a data-driven approach in the largest study on central disorders of hypersomnolence to date, our study identified distinct patient subgroups within the central disorders of hypersomnolence population. 
Our results contest inclusion of sleep-onset REM periods in diagnostic criteria for people without cataplexy and provide promising new variables for reliable diagnostic categories that better resemble different patient phenotypes. Cluster-guided classification will result in a more solid hypersomnolence classification system that is less vulnerable to instability of single features.
- MeSH
- Idiopathic Hypersomnia * diagnosis MeSH
- Cataplexy * diagnosis MeSH
- Humans MeSH
- Adolescent MeSH
- Narcolepsy * diagnosis drug therapy MeSH
- Disorders of Excessive Somnolence * diagnosis epidemiology MeSH
- Cluster Analysis MeSH
- Check Tag
- Humans MeSH
- Adolescent MeSH
- Publication type
- Journal Article MeSH
- Observational Study MeSH
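The hypersomnolence abstract above relies on agglomerative hierarchical clustering. As a rough illustration of the bottom-up merging idea, the sketch below uses average linkage with Euclidean distance on toy 2-D points and stops at a requested number of clusters. The linkage, distance, and stopping rule are assumptions made for the example; the study's own settings, including its rule for choosing the number of clusters, are not reproduced here.

```python
import math


def average_linkage(a, b, points):
    """Mean pairwise distance between two clusters given as lists of point indices."""
    return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns a list of clusters, each a sorted list of indices into points.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge y into x
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy data: a tight group of three, a tight pair, and one outlier.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
result = agglomerative(points, 3)
print(result)
```

This naive version recomputes all pairwise linkages on every merge, so it is only suitable for small examples; library implementations cache distances.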
Introduction. The development of inertial sensors in motion capture systems enables precise measurement of motor symptoms in Parkinson's disease (PD). The type of physical activities performed by the PD participants is an important factor in computing objective scores for specific motor symptoms of the disease. The goal of this study is to propose an approach to automatically detect the physical activities over a period of time and segment the time stamps for the detected activities. Methods. A wearable motion capture sensor system using inertial measurement units (IMUs) was used for data collection. Data from the sensors attached to the shoulders, elbows, and wrists were utilized for detecting and segmenting the activities. An unsupervised machine learning algorithm was employed to extract suitable features from the appropriate sensors and classify the data points into the corresponding activity group. Results. The performance of the proposed technique was evaluated with respect to the manually labeled and segmented activities. The experimental results reveal that the proposed auto detection technique, with high average scores of accuracy (96%), precision (96%), and recall (98%), is able to effectively detect the activities during the sitting task and segment them to the proper time stamps.
- MeSH
- Algorithms MeSH
- Equipment Design MeSH
- Diagnosis, Computer-Assisted * instrumentation MeSH
- Electrical Equipment and Supplies MeSH
- Humans MeSH
- Monitoring, Physiologic MeSH
- Parkinson Disease * diagnosis physiopathology MeSH
- Task Performance and Analysis MeSH
- Signal Processing, Computer-Assisted MeSH
- Motor Activity * MeSH
- Posture MeSH
- Reproducibility of Results MeSH
- Pattern Recognition, Automated MeSH
- Data Accuracy MeSH
- Machine Learning MeSH
- Severity of Illness Index MeSH
- Check Tag
- Humans MeSH
- Publication type
- Research Support, Non-U.S. Gov't MeSH
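The Parkinson's disease abstract above describes segmenting detected activities into time stamps. One simple way to picture that last step, assuming each sensor sample has already been assigned an activity label, is to collapse runs of identical labels into (activity, start, end) segments. The labels, the sampling rate, and the function name `segment_activities` below are hypothetical and do not come from the study.

```python
def segment_activities(labels, sample_rate_hz):
    """Collapse per-sample activity labels into (label, start_s, end_s) segments.

    start_s is inclusive and end_s is exclusive, both in seconds from the
    first sample.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the data or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start / sample_rate_hz, i / sample_rate_hz))
            start = i
    return segments


# Twelve samples at an assumed 2 Hz: rest, then an arm raise, then rest again.
labels = ["rest"] * 4 + ["arm_raise"] * 6 + ["rest"] * 2
segments = segment_activities(labels, sample_rate_hz=2)
for name, t0, t1 in segments:
    print(f"{name}: {t0:.1f}s to {t1:.1f}s")
```

Segments produced this way can be compared against manually labeled intervals to compute the accuracy, precision, and recall figures the abstract reports.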