Improved recovery and annotation of genes in metagenomes through the prediction of fungal introns
Status
Publisher
Language
English
Country
United Kingdom, England
Medium
print-electronic
Document type
Journal Article
Grant support
21-17749S
Grantová agentura České republiky (Czech Science Foundation)
e-INFRA CZ LM2018140
Ministry of Education, Youth and Sports of the Czech Republic
PubMed
37561110
DOI
10.1111/1755-0998.13852
- Keywords
- artificial intelligence, eukaryote, fungi, gene prediction, intron, metagenomics
- Publication type
- Journal Article (MeSH)
Metagenomics provides a means to assess the functional potential of environmental and host-associated microbiomes through the analysis of environmental DNA: assembly, gene prediction and annotation. Gene prediction is straightforward for most bacterial and archaeal taxa. It performs poorly for most eukaryotic organisms, however, including fungi, whose protein-coding sequences contain introns. As a consequence, eukaryotic genes are underrepresented in metagenomic datasets, and our understanding of how fungi and other eukaryotes contribute to microbiome functioning is limited. Here, we developed a machine-learning-based algorithm that predicts fungal introns in environmental DNA with reasonable precision, and we used it to improve the annotation of environmental metagenomes. Intron removal increased the number of predicted genes by up to 9.1% and improved the annotation of several other genes. The proportion of newly predicted genes increased with the share of eukaryotic genes in the metagenome and, within fungal taxa, with the number of introns per gene. Our approach provides a tool, named SVMmycointron, for improved metagenome annotation, especially of microbiomes with a high proportion of eukaryotes. The scripts described in the paper are publicly available and can be readily used by microbiome researchers analysing metagenomic data.
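The general idea of the abstract can be illustrated with a small sketch: predict introns in a contig, splice them out, and hand the result to an ordinary gene predictor. The code below is not SVMmycointron and does not reproduce its features or model. It is a stdlib-only illustration built on several assumptions:

- Candidate introns are taken to start with the canonical GT donor and end with the AG acceptor.
- They are assumed to fall within hypothetical length bounds (`MIN_INTRON`, `MAX_INTRON`).
- Each candidate is scored with the decision function of a linear classifier, w·x + b, over k-mer counts (a spectrum-style representation).
- The weights and bias are placeholders. In practice they would come from a model trained on annotated fungal genomes.

All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical intron length bounds (assumption, not taken from the paper).
MIN_INTRON = 40
MAX_INTRON = 300


def candidate_introns(seq: str, min_len: int = MIN_INTRON,
                      max_len: int = MAX_INTRON) -> List[Tuple[int, int]]:
    """Return (start, end) spans (end exclusive) that begin with GT and end with AG."""
    spans = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] != "GT":
            continue
        # Span length j - i is constrained to [min_len, max_len].
        for j in range(i + min_len, min(i + max_len, len(seq)) + 1):
            if seq[j - 2:j] == "AG":
                spans.append((i, j))
    return spans


def kmer_spectrum(seq: str, k: int = 3) -> Dict[str, int]:
    """Count every k-mer over the ACGT alphabet; k-mers containing other symbols are ignored."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {"".join(p): counts.get("".join(p), 0) for p in product("ACGT", repeat=k)}


def linear_score(features: Dict[str, int], weights: Dict[str, float], bias: float) -> float:
    """Linear decision function w.x + b (the form used by a linear SVM)."""
    return sum(weights.get(key, 0.0) * value for key, value in features.items()) + bias


def remove_introns(seq: str, scorer: Callable[[str], float],
                   threshold: float = 0.0) -> Tuple[str, List[Tuple[int, int]]]:
    """Splice out the highest-scoring non-overlapping candidates scoring above threshold."""
    seq = seq.upper()
    scored = [(scorer(seq[s:e]), s, e) for s, e in candidate_introns(seq)]
    accepted: List[Tuple[int, int]] = []
    for score, s, e in sorted(scored, reverse=True):
        if score <= threshold:
            break  # remaining candidates score even lower
        if all(e <= a or s >= b for a, b in accepted):
            accepted.append((s, e))
    accepted.sort()
    pieces, prev = [], 0
    for s, e in accepted:
        pieces.append(seq[prev:s])  # keep the exonic segment before this intron
        prev = e
    pieces.append(seq[prev:])
    return "".join(pieces), accepted


# Toy usage: the weights below are made up purely for illustration.
toy_weights = {"AAA": 1.0}
toy_scorer = lambda s: linear_score(kmer_spectrum(s), toy_weights, bias=-10.0)
contig = "C" * 10 + "GT" + "A" * 46 + "AG" + "C" * 10
spliced, introns = remove_introns(contig, toy_scorer)
print(spliced, introns)
```

Greedy selection by score keeps the accepted introns non-overlapping. A real tool would also need to handle reading frame, reverse-strand genes and non-canonical splice sites, which this sketch ignores.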
Department of Computer Science Czech Technical University Prague Praha Czech Republic
Department of Genetics and Microbiology Charles University Praha Czech Republic
RefSeq
PRJNA603240, SRX099567, SRX1686623, SRX1990991, SRX2488989, SRX2575203, SRX2720157, SRX3197864, SRX691280, SRX1557139, SRX1944669, SRX2316877, SRX2538108, SRX2648762, SRX2939063, SRX665338, SRX732059