gcType: a high-quality type strain genome database for microbial phylogenetic and functional research
Language: English
Country: England, Great Britain
Media: print
Document type: Journal Article, Research Support, Non-U.S. Gov't
PubMed: 33119759
PubMed Central: PMC7778895
DOI: 10.1093/nar/gkaa957
PII: 5943199
MeSH:
- Data Analysis
- Databases, Genetic *
- Phylogeny *
- Genome *
- Prokaryotic Cells / metabolism
- RNA, Ribosomal, 16S / genetics
- Base Sequence
- Research *
Publication type:
- Journal Article
- Research Support, Non-U.S. Gov't
Names of Substances:
- RNA, Ribosomal, 16S
Taxonomic and functional research on microorganisms increasingly relies on genome-based data and methods. As the depository of the Global Catalogue of Microorganisms (GCM) 10K prokaryotic type strain sequencing project, the Global Catalogue of Type Strain (gcType) has published 1049 type strain genomes sequenced by the GCM 10K project, from strains that are preserved in global culture collections and have a valid published status. Additionally, gcType provides >12 000 publicly available type strain genome sequences from GenBank, incorporated using quality control criteria and standard data annotation pipelines to form a high-quality reference database. This database integrates type strain sequences with their phenotypic information to facilitate phenotypic and genotypic analyses. Multiple formats of cross-genome searches and interactive interfaces allow extensive exploration of the database's resources. In this study, we describe web-based data analysis pipelines for genomic analyses and genome-based taxonomy, which could serve as a one-stop platform for the identification of prokaryotic species. The number of published type strain genomes will continue to increase as the GCM 10K project expands its collaboration with culture collections worldwide. Data from this project are shared with the International Nucleotide Sequence Database Collaboration. Access to gcType is free at http://gctype.wdcm.org/.
American Type Culture Collection 10801 University Boulevard Manassas VA 20110 USA
China Center of Industrial Culture Collection Beijing China
China Thailand Joint Laboratory on Microbial Biotechnology Beijing 100190 China
CIAD A C Collection of Aquatic Important Microorganisms AP 711 Mazatlán Sinaloa Mexico
Colección Española de Cultivos Tipo Spain
Computer Network Information Center Chinese Academy of Sciences Beijing 100190 China
Faculty of Pharmaceutical Sciences Chulalongkorn University Bangkok 10330 Thailand
Korean Collection for Type Cultures 181 Ipsin gil Jeongeup si Jeollabuk do 56212 Republic of Korea
Mycology and Bacteriology Systematics Manaaki Whenua Landcare Research Auckland New Zealand
National Collection of Type Cultures UK
National Institute of Genetics Yata Mishima 411 8540 Japan