Overview of data preprocessing for machine learning applications in human microbiome research

2023; 14: 1250909. [epub] 20231005

Status: PubMed-not-MEDLINE. Language: English. Country: Switzerland. Medium: electronic-ecollection

Document type: journal articles, reviews

Persistent link: https://www.medvik.cz/link/pmid37869650

Although metagenomic sequencing is now the preferred technique for studying microbiome-host interactions, analyzing and interpreting microbiome sequencing data remains challenging, primarily because of the statistical properties of the data (e.g., sparsity, over-dispersion, compositionality, and inter-variable dependency). This mini review explores the preprocessing and transformation methods applied in recent human microbiome studies to address these challenges. Our results indicate limited adoption of transformation methods that target the statistical characteristics of microbiome sequencing data. Instead, relative and normalization-based transformations, which do not account for the specific attributes of microbiome data, are prevalent. Information on the preprocessing and transformations applied before analysis was incomplete or missing in many publications, raising concerns about reproducibility, comparability, and the validity of results. We hope this mini review will provide researchers and newcomers to the field of human microbiome research with an up-to-date point of reference for data transformation tools and help them choose the most suitable transformation method for their research questions, objectives, and data characteristics.
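To make the contrast between a relative transformation and a transformation aimed at compositionality concrete, the following minimal sketch compares total-sum scaling (relative abundance) with the centered log-ratio (CLR). The abstract does not name CLR; it is used here only as one commonly cited compositional transformation. The function names, the read counts, and the pseudocount of 0.5 used to handle zeros are illustrative assumptions, not values taken from the review.

```python
import math


def relative_abundance(counts):
    """Total-sum scaling: divide each count by the sample total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each part minus the mean log of all parts.

    The pseudocount is an assumed, illustrative choice that replaces zeros
    so the logarithm is defined; other zero-handling strategies exist.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical read counts for four taxa in one sparse sample.
sample = [120, 0, 30, 850]

rel = relative_abundance(sample)   # proportions that sum to 1
clr_values = clr(sample)           # log-ratios that sum to 0

print([round(x, 3) for x in rel])
print([round(x, 3) for x in clr_values])
```

Relative abundances remain confined to the simplex (they sum to one), whereas CLR values are centered at zero and unbounded, which is one reason log-ratio approaches are often suggested before applying methods that assume unconstrained real-valued features.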


