Overview of data preprocessing for machine learning applications in human microbiome research
Status: PubMed-not-MEDLINE; Language: English; Country: Switzerland; Medium: electronic-ecollection
Document type: journal articles, reviews
PubMed: 37869650
PubMed Central: PMC10588656
DOI: 10.3389/fmicb.2023.1250909
- Keywords
- compositionality, data preprocessing, human microbiome, machine learning, metagenomics data, normalization
- Publication type
- journal articles (MeSH)
- reviews (MeSH)
Although metagenomic sequencing is now the preferred technique for studying microbiome-host interactions, analyzing and interpreting microbiome sequencing data remains challenging, primarily because of the statistical characteristics of the data (e.g., sparsity, over-dispersion, compositionality, and dependency among variables). This mini review explores the preprocessing and transformation methods applied in recent human microbiome studies to address these challenges. Our results indicate limited adoption of transformation methods that target the statistical characteristics of microbiome sequencing data. Instead, relative abundance and normalization-based transformations, which do not account for the specific attributes of microbiome data, are prevalent. Information on the preprocessing and transformations applied to the data before analysis was incomplete or missing in many publications, leading to reproducibility concerns, comparability issues, and questionable results. We hope this mini review will give researchers and newcomers to human microbiome research an up-to-date point of reference for data transformation tools and help them choose the most suitable transformation method for their research questions, objectives, and data characteristics.
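The contrast drawn in the abstract between relative-abundance scaling and transformations that target compositionality can be illustrated with a minimal sketch. It is not taken from the review, and the abstract does not name a specific method; the centered log-ratio (CLR) transform is used here only as one widely known compositional transformation. The function names, the example counts, and the pseudocount of 0.5 are all assumptions made for illustration.

```python
import math


def relative_abundance(counts):
    """Total-sum scaling: divide each count by the sample total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each part minus the mean log of all parts.

    The pseudocount (an assumed value, not a recommendation) is added to
    every count so that zeros in sparse data do not produce log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical read counts for four taxa in one sample (one taxon unobserved).
sample = [120, 30, 0, 850]

print(relative_abundance(sample))  # proportions constrained to sum to 1
print(clr(sample))                 # log-ratios centered so they sum to 0
```

Relative abundances remain constrained to sum to one, whereas CLR values are log-ratios against the geometric mean of the sample. How zeros are handled (here, a fixed pseudocount) is itself a preprocessing choice that should be reported alongside the transformation.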
BioSense Institute, University of Novi Sad, Novi Sad, Serbia
Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
Department of Applied Mathematics, Faculty of Natural Sciences, University of Tirana, Tirana, Albania
Department of Automation, Biocybernetics and Robotics, Jožef Stefan Institute, Ljubljana, Slovenia
Department of Biology, Faculty of Natural Sciences, University of Tirana, Tirana, Albania
Department of Clinical Science, University of Bergen, Bergen, Norway
Faculty of Civil and Geodetic Engineering, Institute of Sanitary Engineering, Ljubljana, Slovenia
INRAE, MetaGenoPolis, Université Paris-Saclay, Jouy-en-Josas, France