CATH: increased structural coverage of functional space

2021 Jan 08; 49(D1): D266-D273.

Language: English; Country: England, United Kingdom; Medium: print

Document type: journal article; grant-supported work

Persistent link: https://www.medvik.cz/link/pmid33237325

Grant support
Wellcome Trust - United Kingdom
203780/Z/16/A Wellcome Trust - United Kingdom
104960/Z/14/Z Wellcome Trust - United Kingdom

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies them into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, which adds derived data such as predicted sequence domains and functionally coherent sequence subsets (Functional Families, or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, adding 65,351 fully classified domain structures (+15%) for a total of 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links between CATH and other resources, including SCOP, InterPro, Aquaria and 2DProt.
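The growth figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that each percentage is measured relative to the previous release, which the abstract does not state explicitly, and all variable names are invented for this example.

```python
# Figures quoted in the abstract for CATH+ version 4.3.
total_domains = 500_238        # structural domains after the release
added_domains = 65_351         # newly classified domain structures
total_seq_domains_m = 151.0    # predicted sequence domains, in millions
seq_growth = 0.59              # quoted relative increase (+59%)

# Assumption: percentages are relative to the previous release.
# Implied size of the previous structural set and its growth rate.
previous_domains = total_domains - added_domains
domain_growth = added_domains / previous_domains

# Inverting the quoted +59% gives the implied earlier sequence-domain count.
previous_seq_domains_m = total_seq_domains_m / (1 + seq_growth)

print(f"previous structural domains: {previous_domains:,}")
print(f"structural growth: {domain_growth:.1%}")
print(f"implied previous sequence domains: ~{previous_seq_domains_m:.0f} million")
```

Under that assumption, the structural figures are mutually consistent: 65,351 added domains on a base of 434,887 is an increase of about 15%.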


