Bioinformatics strategies for taxonomy independent binning and visualization of sequences in shotgun metagenomics

2017;15:48-55. Epub 2016-12-05

Status: PubMed-not-MEDLINE; Language: English; Country: Netherlands; Medium: electronic-ecollection

Document type: reviews, journal articles

Persistent link   https://www.medvik.cz/link/pmid27980708
Links

PubMed 27980708
PubMed Central PMC5148923
DOI 10.1016/j.csbj.2016.11.005
PII: S2001-0370(16)30067-8
Knihovny.cz E-resources

One of the main steps in a study of microbial communities is resolving their composition, diversity and function. In the past, these questions were mostly addressed by amplicon sequencing of a target gene, because of its reasonable price and the easier computational postprocessing of the resulting data. With the advancement of sequencing techniques, the main focus has shifted to whole metagenome shotgun sequencing, which allows a much more detailed analysis of metagenomic data, including the reconstruction of novel microbial genomes and insight into the genetic potential and metabolic capacities of whole environments. On the other hand, the output of whole metagenome shotgun sequencing is a mixture of short DNA fragments belonging to various genomes, so this approach requires more sophisticated computational algorithms for clustering related sequences, commonly referred to as sequence binning. There are currently two types of binning methods: taxonomy dependent and taxonomy independent. The first type classifies DNA fragments by standard homology inference against a reference database, while the second performs reference-free binning by applying clustering techniques to features extracted from the sequences. In this review, we describe the strategies within the second approach. Although these strategies do not require prior knowledge, they place higher demands on sequence length. Besides their basic principles, an overview of particular methods and tools is provided. Furthermore, the review covers the use of the methods in relation to sequence length and discusses the need for preprocessing metagenomic data by initial assembly prior to binning.
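As a rough illustration of the taxonomy independent idea described in the abstract (features extracted from the sequences, then clustered without a reference database), the sketch below computes tetranucleotide frequency profiles and groups them with a plain k-means. The feature choice, the clustering algorithm, the function names (`kmer_profile`, `kmeans`, `bin_sequences`) and the synthetic fragments are all assumptions made for this example; it is not the procedure of any specific tool covered by the review.

```python
from collections import Counter
from itertools import product
import random

# Assumption for this sketch: tetranucleotides (k = 4) as the sequence feature.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def kmer_profile(seq):
    """Return the normalized tetranucleotide frequency vector of one sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only k-mers made of A/C/G/T are counted; windows with N are ignored.
    total = sum(counts[m] for m in KMERS) or 1
    return [counts[m] / total for m in KMERS]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, n_bins, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [vectors[0]]
    while len(centers) < n_bins:
        centers.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        )
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assign every profile to its nearest center.
        labels = [
            min(range(n_bins), key=lambda j: sq_dist(v, centers[j]))
            for v in vectors
        ]
        # Move each center to the mean of its members.
        for j in range(n_bins):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bin_sequences(seqs, n_bins):
    """Reference-free binning: composition features followed by clustering."""
    return kmeans([kmer_profile(s) for s in seqs], n_bins)


def random_fragment(rng, gc, length=2000):
    """Hypothetical test data: a random fragment with a given GC content."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


rng = random.Random(0)
fragments = [random_fragment(rng, 0.30) for _ in range(5)] + [
    random_fragment(rng, 0.70) for _ in range(5)
]
labels = bin_sequences(fragments, n_bins=2)
print(labels)
```

The synthetic fragments are 2000 bp long on purpose: a 256-dimensional profile estimated from a much shorter sequence is dominated by sampling noise, which is one way to read the abstract's remark that these strategies place higher demands on sequence length. In this sketch the number of bins is supplied by hand.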
