Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment
Status: PubMed-not-MEDLINE; Language: English; Country: Switzerland; Medium: electronic-ecollection
Document type: journal articles, reviews
PubMed: 33737920
PubMed Central: PMC7962872
DOI: 10.3389/fmicb.2021.634511
- Keywords
- biomarker identification, disease prediction, feature selection, machine learning, microbiome
- Publication type
- journal articles (MeSH)
- reviews (MeSH)
The growing number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide essential material for exploring host-microbiome associations in depth and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all the information in these biological datasets while taking into account the peculiarities of microbiome data, i.e., their compositional, heterogeneous, and sparse nature. Being able to predict host phenotypes from taxonomy-informed feature selection, both to establish associations between the microbiome and disease and to predict disease states, is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models for classification and prediction in microbiology, for inferring host phenotypes to predict diseases, and for using microbial communities to stratify patients according to their state-specific microbial signatures. Here, as part of the COST Action ML4Microbiome activities, we review the state-of-the-art ML methods and the respective software applied in human microbiome studies. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here relate mostly to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology.
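The abstract names two properties of microbiome tables, compositionality and sparsity, together with taxonomy-informed features. As a minimal illustration only (not the reviewed authors' pipeline), the Python sketch below builds taxonomy-informed features by summing counts at one taxonomic rank, then applies a centered log-ratio (CLR) transform, a standard compositional-data transform, after adding a pseudocount so that zero counts can be logged. The semicolon-delimited lineage format, the helper names `aggregate_to_rank` and `clr`, the pseudocount of 0.5, and the toy counts are all hypothetical choices for this example.

```python
import math
from collections import defaultdict


def aggregate_to_rank(counts, lineages, rank_index):
    """Sum one sample's feature counts by taxon at a chosen rank.

    counts: dict mapping feature ID to read count
    lineages: dict mapping feature ID to a semicolon-delimited lineage
              string (assumed format, e.g. "k__Bacteria;p__...;g__...")
    rank_index: 0-based position of the rank within the lineage string
    Features whose lineage is shorter than rank_index are pooled as
    "unclassified".
    """
    totals = defaultdict(int)
    for feature, count in counts.items():
        ranks = lineages[feature].split(";")
        taxon = ranks[rank_index] if rank_index < len(ranks) else "unclassified"
        totals[taxon] += count
    return dict(totals)


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one composition.

    Each value becomes log(x_i + pseudocount) minus the mean of those logs,
    so the transformed values sum to zero. The pseudocount (an arbitrary
    illustrative value here) lets zero counts, common in sparse tables,
    be logged.
    """
    logs = {k: math.log(v + pseudocount) for k, v in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {k: v - mean_log for k, v in logs.items()}


# Toy sample with four features (OTUs/ASVs); the counts are made up.
sample = {"otu1": 10, "otu2": 5, "otu3": 0, "otu4": 20}
lineages = {
    "otu1": "k__Bacteria;p__Firmicutes;g__Blautia",
    "otu2": "k__Bacteria;p__Firmicutes;g__Blautia",
    "otu3": "k__Bacteria;p__Bacteroidetes;g__Bacteroides",
    "otu4": "k__Bacteria;p__Bacteroidetes;g__Bacteroides",
}
genus_counts = aggregate_to_rank(sample, lineages, rank_index=2)
genus_clr = clr(genus_counts)
print(genus_counts)
print(genus_clr)
```

Because CLR values depend only on ratios between taxa, scaling every count in a sample by the same constant (for example, a different sequencing depth) leaves them unchanged when no pseudocount is added; a pseudocount slightly breaks this invariance, so its value should be chosen with care.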
The manual identification of data sources has been complemented with: (1) an automated publication search through the digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) automated identification of relevant software repositories on GitHub, with the related research papers ranked using a learning-to-rank approach.
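The abstract does not specify which learning-to-rank algorithm or which paper features were used. The sketch below shows one common family, pairwise learning to rank, in which a logistic model is trained on feature differences between a more relevant and a less relevant item and then used to score new items. The two features (query keyword hits and repository stars in hundreds), the relevance grades, the training values, and all function names are hypothetical.

```python
import math

# Hypothetical per-paper features: [query keyword hits, repository stars / 100].
# Relevance grades (2 = highly relevant, 0 = not relevant) are illustrative.
TRAIN = [
    ([5, 3.0], 2),
    ([3, 1.0], 1),
    ([0, 0.5], 0),
    ([1, 0.0], 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_examples(items):
    """Build (feature difference, label) pairs for every pair of items with
    different relevance; each preference is added in both orientations."""
    pairs = []
    for xa, ra in items:
        for xb, rb in items:
            if ra > rb:
                diff = [a - b for a, b in zip(xa, xb)]
                pairs.append((diff, 1))                # a should rank above b
                pairs.append(([-d for d in diff], 0))  # mirrored: b not above a
    return pairs


def train_pairwise(items, epochs=200, lr=0.1):
    """Fit linear weights w by stochastic gradient descent on the logistic
    loss over pairwise differences (symmetric pairs make a bias unnecessary)."""
    pairs = pairwise_examples(items)
    w = [0.0] * len(items[0][0])
    for _ in range(epochs):
        for diff, label in pairs:
            p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            w = [wi - lr * (p - label) * di for wi, di in zip(w, diff)]
    return w


def rank(candidates, w):
    """Return candidate indices ordered from highest to lowest score w . x."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


weights = train_pairwise(TRAIN)
new_papers = [[0, 0.0], [6, 2.0], [2, 1.0]]
print("weights:", weights)
print("ranking:", rank(new_papers, weights))
```

Training on differences rather than on absolute relevance means the model only has to learn which of two papers should come first, which suits a ranking task where graded labels are coarse.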
Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia
Centro de Matemática e Aplicações, FCT UNL, Caparica, Portugal
Colorectal Cancer Group, Institut de Recerca Biomedica de Bellvitge, Barcelona, Spain
Consortium for Biomedical Research in Epidemiology and Public Health, Barcelona, Spain
Department of Clinical Science, University of Bergen, Bergen, Norway
Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
Department of Computer Science, University of Crete, Heraklion, Greece
Department of Computing, University of Turku, Turku, Finland
Department of Information Systems, Zefat Academic College, Zefat, Israel
Department of Microbiology, University of Innsbruck, Innsbruck, Austria
EPIUnit, Instituto de Saúde Pública da Universidade do Porto, Porto, Portugal
Faculty of Computer Science and Engineering, Ss Cyril and Methodius University, Skopje, North Macedonia
Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel
Institute of Genomics, Estonian Genome Centre, University of Tartu, Tartu, Estonia
Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
NOVA Laboratory for Computer Science and Informatics, FCT UNL, Caparica, Portugal
Oncology Data Analytics Program, Catalan Institute of Oncology, Barcelona, Spain
School of Microbiology and APC Microbiome Ireland, University College Cork, Cork, Ireland
South West University Neofit Rilski, Blagoevgrad, Bulgaria
Université Paris-Saclay, INRAE, MGP, Jouy-en-Josas, France
University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
Ai D., Pan H., Han R., Li X., Liu G., Xia L. C. (2019). Using decision tree aggregation with random forest model to identify gut microbes associated with colorectal cancer. PubMed DOI PMC
Aitchison J. (1986).
Almeida A., Nayfach S., Boland M., Strozzi F. (2021). A unified catalog of 204,938 reference genomes from the human gut microbiome. PubMed DOI PMC
Arango-Argoty G., Garner E., Pruden A., Heath L. S., Vikesland P., Zhang L. (2018). DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. PubMed PMC
Arksey H., O’Malley L. (2005). Scoping studies: towards a methodological framework. DOI
Asgari E., Garakani K., McHardy A. C., Mofrad M. R. K. (2019). MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples. PubMed DOI PMC
Ashburner M., Ball C. A., Blake J. A., Botstein D., Butler H., Cherry J. M., et al. (2000). Gene ontology: tool for the unification of biology. The gene ontology consortium. PubMed DOI PMC
Bai J., Hu Y., Bruner D. W. (2019). Composition of gut microbiota and its association with body mass index and lifestyle factors in a cohort of 7-18 years old children from the American Gut Project. PubMed DOI
Baldini F., Heinken A., Heirendt L., Magnusdottir S., Fleming R. M. T., Thiele I. (2019). The Microbiome Modeling Toolbox: from microbial interactions to personalized microbial communities. PubMed DOI PMC
Baxter N. T., Ruffin M. T., Rogers M. A., Schloss P. D. (2016). Microbiota-based model improves the sensitivity of fecal immunochemical test for detecting colonic lesions. PubMed DOI PMC
Beck D., Foster J. A. (2014). Machine learning techniques accurately classify microbial communities by bacterial vaginosis characteristics. PubMed DOI PMC
Beck D., Foster J. A. (2015). Machine learning classifiers provide insight into the relationship between microbial communities and bacterial vaginosis. PubMed DOI PMC
Berglund F., Marathe N. P., Österlund T., Bengtsson-Palme J., Kotsakis S., Flach C.-F., et al. (2017). Identification of 76 novel B1 metallo-β-lactamases through large-scale screening of genomic and metagenomic data. PubMed DOI PMC
Blaxter M., Mann J., Chapman T., Thomas F., Whitton C., Floyd R., et al. (2005). Defining operational taxonomic units using DNA barcode data. PubMed DOI PMC
Bonder M. J., Abeln S., Zaura E., Brandt B. W. (2012). Comparing clustering and pre-processing in taxonomy analysis. PubMed DOI
Borboudakis G., Tsamardinos I. (2019). Forward-backward selection with early dropping.
Borodulin K., Tolonen H., Jousilahti P., Jula A., Juolevi A., Koskinen S., et al. (2018). Cohort profile: the national FINRISK STUDY. PubMed DOI
Braun T., Di Segni A., BenShoshan M., Neuman S., Levhar N., Bubis M., et al. (2019). Individualized dynamics in the gut microbiota precede Crohn’s disease flares. PubMed DOI
Breiman L. (2001). Random forests. DOI
Cai Y., Gu H., Kenney T. (2017). Learning microbial community structures with supervised and unsupervised non-negative matrix factorization. PubMed DOI PMC
Cai Y., Sun Y. (2011). ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time. PubMed DOI PMC
Callahan B. J., McMurdie P. J., Holmes S. P. (2017). Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. PubMed DOI PMC
Caporaso J. G., Lauber C. L., Costello E. K., Berg-Lyons D., Gonzalez A., Stombaugh J., et al. (2011). Moving pictures of the human microbiome. PubMed DOI PMC
Chassagnon G., Vakalopolou M., Paragios N., Revel M.-P. (2020). Deep learning: definition and perspectives for thoracic imaging. PubMed DOI
Chen L., Zhang Y. H., Huang T., Cai Y. D. (2016). Gene expression profiling gut microbiota in different races of humans. PubMed DOI PMC
Chicco D., Jurman G. (2020). The advantages of the matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. PubMed DOI PMC
Chong J., Liu P., Zhou G., Xia J. (2020). Using microbiomeanalyst for comprehensive statistical, functional, and meta-analysis of microbiome data. PubMed DOI
Cortes C., Vapnik V. (1995). Support-vector networks. DOI
Costello E. K., Lauber C. L., Hamady M., Fierer N., Gordon J. I., Knight R. (2009). Bacterial community variation in human body habitats across space and time. PubMed DOI PMC
Cui H., Zhang X. (2013). Alignment-free supervised classification of metagenomes by recursive SVM. PubMed DOI PMC
David L. A., Materna A. C., Friedman J., Campos-Baptista M. I., Blackburn M. C., Perrotta A., et al. (2014). Host lifestyle affects human microbiota on daily timescales. PubMed DOI PMC
Díez López C., Vidaki A., Ralf A., Montiel González D., Radjabzadeh D., Kraaij R., et al. (2019). Novel taxonomy-independent deep learning microbiome approach allows for accurate classification of different forensically relevant human epithelial materials. PubMed DOI
DiGiulio D. B., Callahan B. J., McMurdie P. J., Costello E. K., Lyell D. J., Robaczewska A., et al. (2015). Temporal and spatial variation of the human microbiota during pregnancy. PubMed DOI PMC
Ditzler G., Morrison J. C., Lan Y., Rosen G. L. (2015). Fizzy: feature subset selection for metagenomics. PubMed DOI PMC
Douglas G. M., Hansen R., Jones C. M. A., Dunn K. A., Comeau A. M., Bielawski J. P., et al. (2018). Multi-omics differentially classify disease state and treatment outcome in pediatric Crohn’s disease. PubMed DOI PMC
Duvallet C., Gibbons S. M., Gurry T., Irizarry R. A., Alm E. J. (2017). Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. PubMed DOI PMC
Eck A., Zintgraf L. M., de Groot E. F. J., de Meij T. G. J., Cohen T. S., Savelkoul P. H. M., et al. (2017). Interpretation of microbiota-based diagnostics by explaining individual classifier decisions. PubMed DOI PMC
Edgar R. C. (2010). Search and clustering orders of magnitude faster than BLAST. PubMed DOI
Elekwachi C. O., Wang Z., Wu X., Rabee A., Forster R. J. (2017). Total rRNA-Seq analysis gives insight into bacterial, fungal, protozoal and archaeal communities in the rumen using an optimized rna isolation method. PubMed DOI PMC
Escobar J. S., Klotz B., Valdes B. E., Agudelo G. M. (2014). The gut microbiota of colombians differs from that of Americans, Europeans and Asians. PubMed DOI PMC
Fabijanić M., Vlahoviček K. (2016). Big data, evolution, and metagenomes: predicting disease from gut microbiota codon usage profiles. PubMed DOI
Falony G., Joossens M., Vieira-Silva S., Wang J., Darzi Y., Faust K., et al. (2016). Population-level analysis of gut microbiome variation. PubMed DOI
Faust K., Lahti L., Gonze D., de Vos W. M., Raes J. (2015). Metagenomics meets time series analysis: unraveling microbial community dynamics. PubMed DOI
Faust K., Sathirapongsasuti J. F., Izard J., Segata N., Gevers D., Raes J., et al. (2012). Microbial co-occurrence relationships in the human microbiome. PubMed DOI PMC
Feng Q., Liang S., Jia H., Stadlmayr A., Tang L., Lan Z., et al. (2015). Gut microbiome development along the colorectal adenoma-carcinoma sequence. PubMed DOI
Filzmoser P., Hron K., Templ M. (2018).
Fioravanti D., Giarratano Y., Maggio V., Agostinelli C., Chierici M., Jurman G., et al. (2018). Phylogenetic convolutional neural networks in metagenomics. PubMed DOI PMC
Flemer B., Warren R. D., Barrett M. P., Cisek K., Das A., Jeffery I. B., et al. (2017). The oral microbiota in colorectal cancer is distinctive and predictive. PubMed DOI PMC
Franzosa E. A., McIver L. J., Rahnavard G., Thompson L. R., Schirmer M., Weingart G., et al. (2018). Species-level functional profiling of metagenomes and metatranscriptomes. PubMed DOI PMC
Friedman J. H. (2001). Greedy function approximation: a gradient boosting machine. DOI
Fukui H., Nishida A., Matsuda S., Kira F., Watanabe S., Kuriyama M., et al. (2020). Usefulness of machine learning-based gut microbiome analysis for identifying patients with irritable bowels syndrome. PubMed DOI PMC
Gajer P., Brotman R. M., Bai G., Sakamoto J., Schütte U. M. E., Zhong X., et al. (2012). Temporal dynamics of the human vaginal microbiota. PubMed DOI PMC
Gentleman R. C., Carey V. J., Bates D. M., Bolstad B., Dettling M., Dudoit S., et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. PubMed DOI PMC
Gevers D., Kugathasan S., Denson L. A., Vázquez-Baeza Y., Van Treuren W., Ren B., et al. (2014). The treatment-naive microbiome in new-onset Crohn’s disease. PubMed DOI PMC
Gilbert J. A., Blaser M. J., Caporaso J. G., Jansson J. K., Lynch S. V., Knight R. (2018). Current understanding of the human microbiome. PubMed DOI PMC
Gloor G. B., Macklaim J. M., Pawlowsky-Glahn V., Egozcue J. J. (2017). Microbiome datasets are compositional: and this is not optional. PubMed DOI PMC
Gonzalez A., Navas-Molina J. A., Kosciolek T., McDonald D., Vázquez-Baeza Y., Ackermann G., et al. (2018). Qiita: rapid, web-enabled microbiome meta-analysis. PubMed DOI PMC
Goodrich J. K., Waters J. L., Poole A. C., Sutter J. L., Koren O., Blekhman R., et al. (2014). Human genetics shape the gut microbiome. PubMed DOI PMC
Gupta A., Dhakan D. B., Maji A., Saxena R., Vishnu Prasoodanan P. K., Mahajan S., et al. (2019). Association of Flavonifractor plautii, a flavonoid-degrading bacterium, with the gut microbiome of colorectal cancer patients in India. PubMed DOI PMC
Hacılar H., Nalbantoğlu O. U., Bakir-Güngör B. (2018). “Machine learning analysis of inflammatory bowel disease-associated metagenomics dataset,” in
Hagopian W. A., Erlich H., Lernmark A., Rewers M., Ziegler A. G., Simell O., et al. (2011). The environmental determinants of diabetes in the young (TEDDY): genetic criteria and international diabetes risk screening of 421 000 infants. PubMed DOI PMC
Halfvarson J., Brislawn C. J., Lamendella R., Vázquez-Baeza Y., Walters W. A., Bramer L. M., et al. (2017). Dynamics of the human gut microbiome in inflammatory bowel disease. PubMed DOI PMC
Hansen R., Russell R. K., Reiff C., Louis P., McIntosh F., Berry S. H., et al. (2012). Microbiota of de-novo pediatric IBD: increased PubMed DOI
Hanski I., von Hertzen L., Fyhrquist N., Koskinen K., Torppa K., Laatikainen T., et al. (2012). Environmental biodiversity, human microbiota, and allergy are interrelated. PubMed DOI PMC
Hastie T., Tibshirani R., Friedman J. (2009). DOI
Heirendt L., Arreckx S., Pfau T., Mendoza S. N., Richelle A., Heinken A., et al. (2019). Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. PubMed DOI PMC
Hoffman J. I. E. (2019). “Logistic regression,” in DOI
Hollister E. B., Oezguen N., Chumpitazi B. P., Luna R. A., Weidler E. M., Rubio-Gonzales M., et al. (2019). Leveraging human microbiome features to diagnose and stratify children with irritable bowel syndrome. PubMed DOI PMC
Holmes I., Harris K., Quince C. (2012). Dirichlet multinomial mixtures: generative models for microbial metagenomics. PubMed DOI PMC
Hughes D. A., Bacigalupe R., Wang J., Rühlemann M. C., Tito R. Y., Falony G., et al. (2020). Genome-wide associations of human gut microbiome variation and implications for causal inference analyses. PubMed DOI PMC
Human Microbiome Project Consortium (2012). Structure, function and diversity of the healthy human microbiome. PubMed DOI PMC
Ioannidis J. P. A. (2008). Why most discovered true associations are inflated. PubMed DOI
Jang B.-S., Chang J. H., Chie E. K., Kim K., Park J. W., Kim M. J., et al. (2020). Gut microbiome composition is associated with a pathologic response after preoperative chemoradiation in patients with rectal cancer. PubMed DOI
Jensen L. J., Julien P., Kuhn M., von Mering C., Muller J., Doerks T., et al. (2007). eggNOG: automated construction and annotation of orthologous groups of genes. PubMed DOI PMC
Jiang P., Green S. J., Chlipala G. E., Turek F. W., Vitaterna M. H. (2019). Reproducible changes in the gut microbiome suggest a shift in microbial and host metabolism during spaceflight. PubMed DOI PMC
Johnson H. R., Trinidad D. D., Guzman S., Khan Z., Parziale J. V., DeBruyn J. M., et al. (2016). A machine learning approach for using the postmortem skin microbiome to estimate the postmortem interval. PubMed DOI PMC
Kanehisa M., Goto S. (2000). KEGG: kyoto encyclopedia of genes and genomes. PubMed DOI PMC
Kanehisa M., Goto S., Kawashima S., Okuno Y., Hattori M. (2004). The KEGG resource for deciphering the genome. PubMed DOI PMC
Kashyap P. C., Chia N., Nelson H., Segal E., Elinav E. (2017). Microbiome at the frontier of personalized medicine. PubMed DOI PMC
Kharrat N., Assidi M., Abu-Elmagd M., Pushparaj P. N., Alkhaldy A., Arfaoui L., et al. (2019). Data mining analysis of human gut microbiota links PubMed DOI PMC
Knights D., Costello E. K., Knight R. (2011). Supervised classification of human microbiota. PubMed DOI
Koohi-Moghadam M., Borad M. J., Tran N. L., Swanson K. R., Boardman L. A., Sun H., et al. (2019). MetaMarker: a pipeline for de novo discovery of novel metagenomic biomarkers. PubMed DOI PMC
Koren O., Knights D., Gonzalez A., Waldron L., Segata N., Knight R., et al. (2013). A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets. PubMed DOI PMC
Kuczynski J., Stombaugh J., Walters W. A., González A., Gregory Caporaso J., Knight R. (2012). Using QIIME to Analyze 16S rRNA gene sequences from microbial communities. PubMed DOI PMC
La Rosa P. S., Warner B. B., Zhou Y., Weinstock G. M., Sodergren E., Hall-Moore C. M., et al. (2014). Patterned progression of bacterial populations in the premature infant gut. PubMed DOI PMC
Lagani V., Athineou G., Farcomeni A., Tsagris M., Tsamardinos I. (2017). Feature selection with the R Package MXM: discovering statistically equivalent feature subsets. DOI
Lahti L., Salonen A., Kekkonen R. A., Salojarvi J., Jalanka-Tuovinen J., Palva A., et al. (2013). Associations between the human intestinal microbiota, PubMed DOI PMC
Lakin S. M., Dean C., Noyes N. R., Dettenwanger A., Ross A. S., Doster E., et al. (2017). MEGARes: an antimicrobial resistance database for high throughput sequencing. PubMed DOI PMC
Langille M. G. I., Zaneveld J., Gregory Caporaso J., McDonald D., Knights D., Reyes J. A., et al. (2013). Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. PubMed DOI PMC
LaPierre N., Ju C. J.-T., Zhou G., Wang W. (2019). MetaPheno: a critical evaluation of deep learning and machine learning in metagenome-based disease prediction. PubMed DOI PMC
Larsen P. E., Dai Y. (2015). Metabolome of human gut microbiome is predictive of host dysbiosis. PubMed DOI PMC
Le V., Quinn T. P., Tran T., Venkatesh S. (2020). Deep in the bowel: highly interpretable neural encoder-decoder networks predict gut metabolites from gut microbiome. PubMed DOI PMC
Le Goallec A., Tierney B. T., Luber J. M., Cofer E. M., Kostic A. D., Patel C. J. (2020). A systematic machine learning and data type comparison yields metagenomic predictors of infant age, sex, breastfeeding, antibiotic usage, country of origin, and delivery type. PubMed DOI PMC
Li J., Jia H., Cai X., Zhong H., Feng Q., Sunagawa S., et al. (2014). An integrated catalog of reference genes in the human gut microbiome. PubMed DOI
Li R., Zhu H., Ruan J., Qian W., Fang X., Shi Z., et al. (2010). De novo assembly of human genomes with massively parallel short read sequencing. PubMed DOI PMC
Liu Y., Guo J., Zhu H. (2011). “Gene prediction in metagenomic fragments based on the SVM algorithm,” in DOI
Liu Z., Hsiao W., Cantarel B. L., Drábek E. F., Fraser-Liggett C. (2011). Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data. PubMed DOI PMC
Liu Y., Meric G., Havulinna A. S., Teo S. M., Ruuskanen M., Sanders J., et al. (2020). Early prediction of liver disease using conventional risk factors and gut microbiome-augmented gradient boosting. PubMed DOI PMC
Lo C., Marculescu R. (2019). MetaNN: accurate classification of host phenotypes from metagenomic data using neural networks. PubMed DOI PMC
Lopez Pinaya W. H., Vieira S., Garcia-Dias R., Mechelli A. (2020). “Convolutional neural networks,” in DOI
Lozupone C., Lladser M. E., Knights D., Stombaugh J., Knight R. (2011). UniFrac: an effective distance metric for microbial community comparison. PubMed DOI PMC
Lugo-Martinez J., Ruiz-Perez D., Narasimhan G., Bar-Joseph Z. (2019). Dynamic interaction network inference from longitudinal microbiome data. PubMed DOI PMC
Madeira S. C., Oliveira A. L. (2004). Biclustering algorithms for biological data analysis: a survey. PubMed DOI
McDonald D., Hyde E., Debelius J. W., Morton J. T., Gonzalez A., Ackermann G., et al. (2018). American gut: an open platform for citizen science microbiome research. PubMed DOI PMC
Mitchell A. L., Almeida A., Beracochea M., Boland M., Burgin J., Cochrane G., et al. (2020). MGnify: the microbiome analysis resource in 2020. PubMed DOI PMC
Mitchell A. L., Scheremetjew M., Denise H., Potter S., Tarkowska A., Qureshi M., et al. (2018). EBI Metagenomics in 2017: enriching the analysis of microbial communities, from sequence reads to assemblies. PubMed DOI PMC
Mohammed A., Guda C. (2015). Application of a hierarchical enzyme classification method reveals the role of gut microbiome in human metabolism. PubMed DOI PMC
Moher D., Liberati A., Tetzlaff J., Altman D. G., and PRISMA Group (2010). Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PubMed DOI
Moher D., Stewart L., Shekelle P. (2015). All in the family: systematic reviews, rapid reviews, scoping reviews, realist reviews, and more. PubMed DOI PMC
Moreno-Indias I., Lahti L., Nedyalkova M., Elbere I., Roshchupkin G., Adilovic M., et al. (2021). Statistical and machine learning techniques in human microbiome studies: contemporary challenges and solutions. PubMed DOI PMC
Nielsen H. B., Almeida M., Juncker A. S., Rasmussen S., Li J., Sunagawa S., et al. (2014). Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. PubMed DOI
Ning J., Beiko R. G. (2015). Phylogenetic approaches to microbial community classification. PubMed DOI PMC
Noguchi H., Park J., Takagi T. (2006). MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. PubMed DOI PMC
Oh M., Zhang L. (2020). DeepMicro: deep representation learning for disease prediction based on microbiome data. PubMed DOI PMC
Oudah M., Henschel A. (2018). Taxonomy-aware feature engineering for microbiome classification. PubMed DOI PMC
Papoutsoglou G., Athineou G., Lagani V., Xanthopoulos I., Schmidt A., Éliás S., et al. (2017). SCENERY: a web application for (causal) network reconstruction from cytometry data. PubMed DOI PMC
Pascal V., Pozuelo M., Borruel N., Casellas F., Campos D., Santiago A., et al. (2017). A microbial signature for Crohn’s disease. PubMed DOI PMC
Pasolli E., Schiffer L., Manghi P., Renson A., Obenchain V., Truong D. T., et al. (2017). Accessible, curated metagenomic data through ExperimentHub. PubMed DOI PMC
Pasolli E., Truong D. T., Malik F., Waldron L., Segata N. (2016). Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PubMed DOI PMC
Pawlowsky-Glahn V., Egozcue J. J., Tolosana-Delgado R. (2015).
Pereira P., Aho V., Arola J., Boyd S., Jokelainen K., Paulin L., et al. (2017). Bile microbiota in primary sclerosing cholangitis: impact on disease progression and development of biliary dysplasia. PubMed DOI PMC
Petersen C., Round J. L. (2014). Defining dysbiosis and its influence on host immunity and disease. PubMed DOI PMC
Platt J. C. (1998). Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Technical Report MSR-TR-98-14, Microsoft Research.
Plaza Oñate F., Le Chatelier E., Almeida M., Cervino A. C. L., Gauthier F., Magoulès F., et al. (2019). MSPminer: abundance-based reconstitution of microbial pan-genomes from shotgun metagenomic data. PubMed DOI PMC
Purcell R. V., Visnovska M., Biggs P. J., Schmeier S., Frizelle F. A. (2017). Distinct gut microbiome patterns associate with consensus molecular subtypes of colorectal cancer. PubMed DOI PMC
Qin J., Li R., Raes J., Arumugam M., Burgdorf K. S., Manichanh C., et al. (2010). A human gut microbial gene catalogue established by metagenomic sequencing. PubMed DOI PMC
Quinn T. P., Erb I. (2020). Interpretable log contrasts for the classification of health biomarkers: a new approach to balance selection. PubMed DOI PMC
Quinn T. P., Erb I., Richardson M. F., Crowley T. M. (2018). Understanding sequencing data as compositions: an outlook and review. PubMed DOI PMC
Rahman S. F., Olm M. R., Morowitz M. J., Banfield J. F. (2017). Machine learning leveraging genomes from metagenomes identifies influential antibiotic resistance genes in the infant gut microbiome. PubMed DOI PMC
Randolph T. W., Zhao S., Copeland W., Hullar M., Shojaie A. (2018). Kernel-penalized regression for analysis of microbiome data. PubMed DOI PMC
Richards A. L., Muehlbauer A. L., Alazizi A., Burns M. B., Findley A., Messina F., et al. (2019). Gut microbiota has a widespread and modifiable effect on host gene regulation. PubMed DOI PMC
Riley P. (2019). Three pitfalls to avoid in machine learning. PubMed DOI
Rivera-Pinto J., Egozcue J. J., Pawlowsky-Glahn V., Paredes R., Noguera-Julian M., Calle M. L. (2018). Balances: a new perspective for microbiome analysis. PubMed DOI PMC
Roguet A., Eren A. M., Newton R. J., McLellan S. L. (2018). Fecal source identification using random forest. PubMed DOI PMC
Ross A. A., Doxey A. C., Neufeld J. D. (2017). The skin microbiome of cohabiting couples. PubMed DOI PMC
Ross M. C., Muzny D. M., McCormick J. B., Gibbs R. A., Fisher-Hoch S. P., Petrosino J. F. (2015). 16S gut community of the cameron county hispanic cohort. PubMed DOI PMC
Russell S. J., Norvig P. (2016).
Ruuskanen M. O., Åberg F., Männistö V., Havulinna A. S., Méric G., Liu Y., et al. (2020). Links between gut microbiome composition and fatty liver disease in a large population sample. PubMed DOI PMC
Ryan F. J., Ahern A. M., Fitzgerald R. S., Laserna-Mendieta E. J., Power E. M., Clooney A. G., et al. (2020). Colonic microbiota is associated with inflammation and host epigenomic alterations in inflammatory bowel disease. PubMed DOI PMC
Sanna S., van Zuydam N. R., Mahajan A., Kurilshikov A., Vich Vila A., Võsa U., et al. (2019). Causal relationships among the gut microbiome, short-chain fatty acids and metabolic diseases. PubMed DOI PMC
Saulnier D. M., Riehle K., Mistretta T.-A., Diaz M.-A., Mandal D., Raza S., et al. (2011). Gastrointestinal microbiome signatures of pediatric patients with irritable bowel syndrome. PubMed DOI PMC
Scholz M., Ward D. V., Pasolli E., Tolio T., Zolfo M., Asnicar F., et al. (2016). Strain-level microbial epidemiology and population genomics from shotgun metagenomics. PubMed DOI
Schubert A. M., Rogers M. A. M., Ring C., Mogle J., Petrosino J. P., Young V. B., et al. (2014). Microbiome data distinguish patients with PubMed DOI PMC
Segata N., Izard J., Waldron L., Gevers D., Miropolsky L., Garrett W. S., et al. (2011). Metagenomic biomarker discovery and explanation. PubMed DOI PMC
Segata N., Waldron L., Ballarini A., Narasimhan V., Jousson O., Huttenhower C. (2012). Metagenomic microbial community profiling using unique clade-specific marker genes. PubMed DOI PMC
Seo M., Heo J., Yoon J., Kim S.-Y., Kang Y.-M., Yu J., et al. (2017). PubMed DOI PMC
Silverman J. D., Washburne A. D., Mukherjee S., David L. A. (2017). A phylogenetic transform enhances analysis of compositional microbiota data. PubMed DOI PMC
Sokol H., Leducq V., Aschard H., Pham H.-P., Jegou S., Landman C., et al. (2017). Fungal microbiota dysbiosis in IBD. PubMed DOI PMC
Stamatakis A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. PubMed DOI PMC
Statnikov A., Henaff M., Narendra V., Konganti K., Li Z., Yang L., et al. (2013). A comprehensive evaluation of multicategory classification methods for microbiomic data. PubMed DOI PMC
Sze M. A., Schloss P. D. (2016). Looking for a signal in the noise: revisiting obesity and the microbiome. PubMed DOI PMC
Tap J., Derrien M., Tornblom H., Brazeilles R., Cools-Portier S., Dore J., et al. (2017). Identification of an intestinal microbiota signature associated with severity of irritable bowel syndrome. PubMed DOI
Telalovic H. J., Azra M. (2020). Using data science for medical decision making case: role of gut microbiome in multiple sclerosis. PubMed DOI PMC
Thomas A. M., Manghi P., Asnicar F., Pasolli E., Armanini F., Zolfo M., et al. (2019). Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. PubMed DOI PMC
Travisany D., Galarce D., Maass A., Assar R. (2015). “predicting the metagenomics content with multiple CART trees,” in DOI
Truong D. T., Franzosa E. A., Tickle T. L., Scholz M., Weingart G., Pasolli E., et al. (2015). MetaPhlAn2 for enhanced metagenomic taxonomic profiling. PubMed DOI
Tsamardinos I., Charonyktakis P., Lakiotaki K., Borboudakis G., Zenklusen J. C., Juhl H., et al. (2020). Just add data: automated predictive modeling and biosignature discovery. PubMed DOI PMC
Tsamardinos I., Greasidou E., Borboudakis G. (2018). Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation. PubMed DOI PMC
Tsamardinos I., Rakhshani A., Lagani V. (2015). Performance-estimation properties of cross-validationbased protocols with simultaneous hyper-parameter optimization. DOI
Turnbaugh P. J., Ley R. E., Hamady M., Fraser-Liggett C. M., Knight R., Gordon J. I. (2007). The human microbiome project.
Turnbaugh P. J., Ley R. E., Mahowald M. A., Magrini V., Mardis E. R., Gordon J. I. (2006). An obesity-associated gut microbiome with increased capacity for energy harvest.
Turnbaugh P. J., Ridaura V. K., Faith J. J., Rey F. E., Knight R., Gordon J. I. (2009). The effect of diet on the human gut microbiome: a metagenomic analysis in humanized gnotobiotic mice.
Vangay P., Hillmann B. M., Knights D. (2019). Microbiome Learning Repo (ML Repo): a public repository of microbiome regression and classification tasks.
Vatanen T., Franzosa E. A., Schwager R., Tripathi S., Arthur T. D., Vehik K., et al. (2018). The human gut microbiome in early-onset type 1 diabetes from the TEDDY study.
Vervier K., Mahé P., Tournoud M., Veyrieras J.-B., Vert J.-P. (2016). Large-scale machine learning for metagenomics sequence classification.
Wang H., Marcišauskas S., Sánchez B. J., Domenzain I., Hermansson D., Agren R., et al. (2018). RAVEN 2.0: a versatile toolbox for metabolic network reconstruction and a case study on
Wassan J. T., Wang H., Browne F., Zheng H. (2018a). A comprehensive study on predicting functional role of metagenomes using machine learning methods.
Wassan J. T., Wang H., Browne F., Zheng H. (2018b). “PAAM-ML: a novel phylogeny and abundance aware machine learning modelling approach for microbiome classification,” in
Wassan J. T., Wang H., Browne F., Zheng H. (2019). Phy-PMRFI: phylogeny-aware prediction of metagenomic functions using random forest feature importance.
Weiss S., Xu Z. Z., Peddada S., Amir A., Bittinger K., Gonzalez A., et al. (2017). Normalization and microbial differential abundance strategies depend upon data characteristics.
Wen C., Zheng Z., Shao T., Liu L., Xie Z., Le Chatelier E., et al. (2017). Quantitative metagenomics reveals unique gut microbiome biomarkers in ankylosing spondylitis.
Werner J. J., Koren O., Hugenholtz P., DeSantis T. Z., Walters W. A., Caporaso J. G., et al. (2012). Impact of training sets on classification of high-throughput bacterial 16s rRNA gene surveys.
Winand R., Bogaerts B., Hoffman S., Lefevre L., Delvoye M., Van Braekel J., et al. (2020). Targeting the 16S rRNA gene for bacterial identification in complex mixed samples: comparative evaluation of second (Illumina) and third (Oxford Nanopore Technologies) generation sequencing technologies.
Wingfield B., Coleman S., McGinnity T. M., Bjourson A. J. (2016). “A metagenomic hybrid classifier for paediatric inflammatory bowel disease,” in
Wirbel J., Pyl P. T., Kartal E., Zych K., Kashani A., Milanese A., et al. (2019). Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer.
Wu C., Chen J., Kim J., Pan W. (2016). An adaptive association test for microbiome data.
Wu G. D., Chen J., Hoffmann C., Bittinger K., Chen Y.-Y., Keilbaugh S. A., et al. (2011). Linking long-term dietary patterns with gut microbial enterotypes.
Wu H., Cai L., Li D., Wang X., Zhao S., Zou F., et al. (2018). Metagenomics biomarkers selected for prediction of three different diseases in Chinese population.
Xia L. C., Cram J. A., Chen T., Fuhrman J. A., Sun F. (2011). Accurate genome relative abundance estimation based on shotgun metagenomic reads.
Xie J., Ma A., Fennell A., Ma Q., Zhao J. (2019). It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data.
Yachida S., Mizutani S., Shiroma H., Shiba S., Nakajima T., Sakamoto T., et al. (2019). Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer.
Yang J., Tsukimi T., Yoshikawa M., Suzuki K., Takeda T., Tomita M., et al. (2019).
Yang L., Yachimski P. S., Brodie E., Nelson K. E., Pei Z. (2015). “Foregut microbiome, development of esophageal adenocarcinoma, project,” in
Yarza P., Yilmaz P., Panzer K., Glöckner F. O., Reich M. (2017). A phylogenetic framework for the kingdom Fungi based on 18S rRNA gene sequences.
Zdravevski E., Lameski P., Trajkovik V., Chorbev I., Goleva R., Pombo N., et al. (2019). “Automation in systematic, scoping and rapid reviews by an NLP toolkit: a case study in enhanced living environments,” in
Zeevi D., Korem T., Zmora N., Israeli D., Rothschild D., Weinberger A., et al. (2015). Personalized nutrition by prediction of glycemic responses.
Zeller G., Tap J., Voigt A. Y., Sunagawa S., Kultima J. R., Costea P. I., et al. (2014). Potential of fecal microbiota for early-stage detection of colorectal cancer.
Zhang Z.-Y. (2012). “Nonnegative matrix factorization: models, algorithms and applications,” in
Zhou F., He K., Li Q., Chapkin R. S., Ni Y. (2020). Bayesian biclustering for microbial metagenomic sequencing data via multinomial matrix factorization.
Zhou Y.-H., Gallins P. (2019). A review and tutorial of machine learning methods for microbiome host trait prediction.
Zhu Q., Li B., He T., Li G., Jiang X. (2020). Robust biomarker discovery for microbiome-wide association studies.
Zupancic M. L., Cantarel B. L., Liu Z., Drabek E. F., Ryan K. A., Cirimotich S., et al. (2012). Analysis of the gut microbiota in the old order Amish and its relation to the metabolic syndrome.