Bioinformatics strategies for taxonomy independent binning and visualization of sequences in shotgun metagenomics
Status PubMed-not-MEDLINE Language English Country Netherlands Medium electronic-ecollection
Document type reviews, journal articles
PubMed
27980708
PubMed Central
PMC5148923
DOI
10.1016/j.csbj.2016.11.005
PII: S2001-0370(16)30067-8
Knihovny.cz E-resources
- Keywords
- Abundance, Genomic signature, Metagenomics, Sequence binning, Taxonomy independent, Visualization
- Publication type
- journal articles MeSH
- reviews MeSH
One of the main steps in a study of microbial communities is resolving their composition, diversity and function. In the past, these issues were mostly addressed by amplicon sequencing of a target gene because of its reasonable price and the easier computational postprocessing of the resulting data. With the advancement of sequencing techniques, the main focus has shifted to whole metagenome shotgun sequencing, which allows much more detailed analysis of metagenomic data, including the reconstruction of novel microbial genomes, and yields knowledge about the genetic potential and metabolic capacities of whole environments. On the other hand, the output of whole metagenome shotgun sequencing is a mixture of short DNA fragments belonging to various genomes; this approach therefore requires more sophisticated computational algorithms for clustering related sequences, commonly referred to as sequence binning. There are currently two types of binning methods: taxonomy dependent and taxonomy independent. The first type classifies DNA fragments by performing a standard homology inference against a reference database, while the latter performs reference-free binning by applying clustering techniques to features extracted from the sequences. In this review, we describe the strategies within the second approach. Although these strategies do not require prior knowledge, they place higher demands on sequence length. Besides their basic principles, an overview of particular methods and tools is provided. Furthermore, the review covers the use of the methods in relation to sequence length and discusses the need for metagenomic data preprocessing in the form of initial assembly prior to binning.
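As a rough illustration of the taxonomy independent idea described in the abstract (features extracted from the sequences, then clustered without a reference database), the following minimal Python sketch computes a tetranucleotide frequency vector as a simple genomic signature and groups fragments with a plain k-means loop. It is not taken from the review or from any tool it covers: the function names `signature` and `kmeans_bins`, the synthetic AT-rich and GC-rich fragments, the fragment length and the choice of k-means are all assumptions made only for this example.

```python
from collections import Counter
from itertools import product
import math
import random


def signature(seq, k=4):
    """Normalized k-mer frequency vector (k=4: tetranucleotides)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, T are counted; others (e.g. with N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def distance(a, b):
    """Euclidean distance between two signature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_bins(vectors, n_bins, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [vectors[0]]
    while len(centers) < n_bins:
        centers.append(
            max(vectors, key=lambda v: min(distance(v, c) for c in centers))
        )
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assign every fragment to its nearest center.
        labels = [
            min(range(n_bins), key=lambda j: distance(v, centers[j]))
            for v in vectors
        ]
        # Move each center to the mean of its members.
        for j in range(n_bins):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical test data: fragments drawn from two synthetic "genomes"
# that differ only in base composition (AT-rich versus GC-rich).
rng = random.Random(0)
at_rich = ["".join(rng.choices("ACGT", weights=[35, 15, 15, 35], k=5000))
           for _ in range(5)]
gc_rich = ["".join(rng.choices("ACGT", weights=[15, 35, 35, 15], k=5000))
           for _ in range(5)]
fragments = at_rich + gc_rich

labels = kmeans_bins([signature(f) for f in fragments], n_bins=2)
print(labels)
```

With fragments this long and compositions this different, the two groups should fall into separate bins. Real metagenomic fragments are far less cleanly separated, and the number of bins is generally unknown in advance, which is one reason sequence length and prior assembly matter for this class of methods.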