Improved recovery and annotation of genes in metagenomes through the prediction of fungal introns

2023 Nov; 23(8): 1800-1811. [Epub 2023-08-10]

Status: Publisher. Language: English. Country: Great Britain, England. Medium: print-electronic

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid37561110

Grant support
21-17749S Grantová Agentura České Republiky
e-INFRA CZ LM2018140 Ministry of Education, Youth and Sports of the Czech Republic

Metagenomics provides a tool to assess the functional potential of environmental and host-associated microbiomes through the analysis of environmental DNA: assembly, gene prediction and annotation. While gene prediction is straightforward for most bacterial and archaeal taxa, it has limited applicability to the majority of eukaryotic organisms, including fungi, whose gene coding sequences contain introns. As a consequence, eukaryotic genes are underrepresented in metagenomic datasets, and our understanding of the contribution of fungi and other eukaryotes to microbiome functioning is limited. Here, we developed a machine intelligence-based algorithm that predicts fungal introns in environmental DNA with reasonable precision and used it to improve the annotation of environmental metagenomes. Intron removal increased the number of predicted genes by up to 9.1% and improved the annotation of several others. The proportion of newly predicted genes increased with the share of eukaryotic genes in the metagenome and, within fungal taxa, with the number of introns per gene. Our approach provides a tool, SVMmycointron, for improved metagenome annotation, especially of microbiomes with a high proportion of eukaryotes. The scripts described in the paper are publicly available and can be readily used by microbiome researchers analysing metagenomic data.
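
The intron-removal idea in the abstract can be illustrated with a minimal Python sketch. It is not the SVMmycointron implementation: the function names, the length bounds and the use of only the canonical GT...AG splice-site dinucleotides are assumptions made for illustration. In a real pipeline a trained classifier would decide which candidate spans to excise before gene prediction is rerun on the spliced sequence.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open interval [start, end) on the sequence


def candidate_introns(seq: str, min_len: int = 40, max_len: int = 200) -> List[Span]:
    """List spans that start with a GT donor and end with an AG acceptor.

    The length bounds are illustrative assumptions, not values from the paper.
    """
    seq = seq.upper()
    spans: List[Span] = []
    for start in range(len(seq) - 1):
        if seq[start:start + 2] != "GT":
            continue
        last_end = min(start + max_len, len(seq))
        for end in range(start + min_len, last_end + 1):
            # end is exclusive, so the acceptor occupies seq[end - 2:end]
            if seq[end - 2:end] == "AG":
                spans.append((start, end))
    return spans


def excise(seq: str, spans: List[Span]) -> str:
    """Remove the given non-overlapping spans and join the flanking sequence."""
    pieces = []
    pos = 0
    for start, end in sorted(spans):
        if start < pos:
            raise ValueError("spans must not overlap")
        pieces.append(seq[pos:start])
        pos = end
    pieces.append(seq[pos:])
    return "".join(pieces)
```

Calling `candidate_introns` on a contig yields every span that a downstream classifier (hypothetical here) could score; passing the accepted spans to `excise` gives the intron-free sequence on which a prokaryote-style gene predictor can then be run.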


RefSeq
PRJNA603240, SRX099567, SRX1686623, SRX1990991, SRX2488989, SRX2575203, SRX2720157, SRX3197864, SRX691280, SRX1557139, SRX1944669, SRX2316877, SRX2538108, SRX2648762, SRX2939063, SRX665338, SRX732059
