Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions
Status PubMed-not-MEDLINE Language English Country Switzerland Medium electronic-ecollection
Document type journal articles
PubMed: 33692771
PubMed Central: PMC7937616
DOI: 10.3389/fmicb.2021.635781
Knihovny.cz E-resources
- Keywords
- ML4Microbiome, biomarker identification, machine learning, microbiome, personalized medicine
- Publication type
- journal articles MeSH
The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. While many of the challenges in microbiome research are similar to those in other high-throughput studies, the quantitative analyses also need to address the heterogeneity of the data, its specific statistical properties, and the remarkable variation in microbiome composition across individuals and body sites. This has led to a broad spectrum of statistical and machine learning challenges that range from study design, data processing, and standardization to analysis, modeling, cross-study comparison, prediction, data science ecosystems, and reproducible reporting. Although many statistical and machine learning approaches and tools have been developed, new techniques are still needed to deal with emerging applications and the vast heterogeneity of microbiome data. We review and discuss emerging applications of statistical and machine learning techniques in human microbiome studies and introduce the COST Action CA18131 "ML4Microbiome", which brings together microbiome researchers and machine learning experts to address current challenges such as the standardization of analysis pipelines for reproducible data analysis results, benchmarking, and the improvement of existing tools and ontologies or the development of new ones.
Bioinformatics Research Unit Riga Stradins University Riga Latvia
Biotechnical Faculty University of Ljubljana Ljubljana Slovenia
Centro de Matemática e Aplicações FCT UNL Caparica Portugal
CINTESIS NOVA Medical School NMS Universidade Nova de Lisboa Lisbon Portugal
Computational Oncology Sage Bionetworks Seattle WA United States
Department of Biology University of Fribourg Fribourg Switzerland
Department of Clinical Science University of Bergen Bergen Norway
Department of Computer Engineering Abdullah Gul University Kayseri Turkey
Department of Computer Science University of Bari Aldo Moro Bari Italy
Department of Computer Technologies Karadeniz Technical University Trabzon Turkey
Department of Computing University of Turku Turku Finland
Department of Electrical and Electronics Engineering Karadeniz Technical University Trabzon Turkey
Department of Epidemiology Erasmus Medical Center Rotterdam Netherlands
Department of Infection and Immunity Luxembourg Institute of Health Esch sur Alzette Luxembourg
Department of Microbiology University of Innsbruck Innsbruck Austria
European Molecular Biology Laboratory Structural and Computational Biology Unit Heidelberg Germany
Faculty of Civil and Geodetic Engineering University of Ljubljana Ljubljana Slovenia
Faculty of Information Technology and Bionics Pázmány University Budapest Hungary
Faculty of Mathematics and Computer Science Nicolaus Copernicus University Toruń Poland
Human Genetics and Disease Mechanisms Latvian Biomedical Research and Study Centre Riga Latvia
Institute of Molecular and Cell Biology University of Tartu Tartu Estonia
Jozef Stefan Institute Ljubljana Slovenia
Latvian Biomedical Research and Study Centre Riga Latvia
Metagenomics Laboratory Genome and Stem Cell Center Erciyes University Kayseri Turkey
Navarrabiomed Complejo Hospitalario de Navarra Pamplona Spain
NOVA Laboratory for Computer Science and Informatics FCT UNL Caparica Portugal
School of Microbiology and APC Microbiome Ireland University College Cork Cork Ireland