This record comes from PubMed.

gcType: a high-quality type strain genome database for microbial phylogenetic and functional research

Nucleic Acids Res. 2021 Jan 08; 49(D1): D694–D705.

Language: English; Country: England, Great Britain; Media: print

Document type: Journal Article; Research Support, Non-U.S. Gov't

Taxonomic and functional research on microorganisms increasingly relies on genome-based data and methods. The Global Catalogue of Type Strain (gcType) is the repository of the Global Catalogue of Microorganisms (GCM) 10K prokaryotic type strain sequencing project. It has published 1049 type strain genomes sequenced by that project. These type strains are preserved in culture collections worldwide and have validly published names. gcType also incorporates more than 12 000 publicly available type strain genome sequences from GenBank. These sequences were screened with quality-control criteria and processed with standard annotation pipelines to form a high-quality reference database. The database links type strain sequences with their phenotypic information, which supports combined phenotypic and genotypic analyses. Users can explore its resources through multiple cross-genome search formats and interactive interfaces. In this study, we describe web-based pipelines for genomic analysis and genome-based taxonomy, which can serve as a one-stop platform for identifying prokaryotic species. The number of published type strain genomes will continue to grow as the GCM 10K project expands its collaboration with culture collections worldwide. Data from this project are shared with the International Nucleotide Sequence Database Collaboration. gcType is freely accessible at http://gctype.wdcm.org/.

All-Russian Collection of Microorganisms, G. K. Skryabin Institute of Biochemistry and Physiology of Microorganisms RAS, Pushchino, Moscow region 142290, Russia

American Type Culture Collection, 10801 University Boulevard, Manassas, VA 20110, USA

BCCM/LMG Bacteria Collection, Laboratory of Microbiology, Faculty of Sciences, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium

Biodiversity Research Centre, Thailand Institute of Scientific and Technological Research, 35 M.3 Technopolis, Khlong 5, Khlong Luang, Pathum Thani 12120, Thailand

China Center for Type Culture Collection, College of Life Sciences, Wuhan University, Wuhan 430072, China

China Center of Industrial Culture Collection, Beijing, China

China General Microbiological Culture Collection Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China

China-Thailand Joint Laboratory on Microbial Biotechnology, Beijing 100190, China

CIAD A.C. Collection of Aquatic Important Microorganisms, AP 711, Mazatlán, Sinaloa, Mexico

Colección Española de Cultivos Tipo, Spain

Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China

Czech Collection of Microorganisms, Masaryk University, Kamenice 5, Building A25, 625 00 Brno, Czech Republic

Faculty of Pharmaceutical Sciences, Chulalongkorn University, Bangkok 10330, Thailand

Japan Collection of Microorganisms, Microbe Division, RIKEN BioResource Center, Koyadai 3-1-1, Tsukuba, Ibaraki 305-0074, Japan

Korean Collection for Type Cultures, 181 Ipsin-gil, Jeongeup-si, Jeollabuk-do 56212, Republic of Korea

Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China

Mycology and Bacteriology Systematics, Manaaki Whenua Landcare Research, Auckland, New Zealand

National Collection of Agricultural and Industrial Microorganisms, Faculty of Food Science, Szent István University, Somlói út 14-16, H-1118 Budapest, Hungary

National Collection of Type Cultures, UK

National Institute of Genetics, Yata, Mishima 411-8540, Japan

NITE Biological Resource Center, National Institute of Technology and Evaluation, 2-5-8 Kazusakamatari, Kisarazu, Chiba 292-0818, Japan

State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China

Thailand Bioresource Research Center, Thailand

World Data Center for Microorganisms, Beijing 100101, China
