CATH: increased structural coverage of functional space
Language English Country England, United Kingdom Medium print
Document type journal article, grant-supported work
Grant support
Wellcome Trust - United Kingdom
203780/Z/16/A
Wellcome Trust - United Kingdom
104960/Z/14/Z
Wellcome Trust - United Kingdom
PubMed
33237325
PubMed Central
PMC7778904
DOI
10.1093/nar/gkaa1079
PII: 6006195
Knihovny.cz E-resources
- MeSH
- molecular sequence annotation MeSH
- COVID-19 epidemiology prevention and control virology MeSH
- protein databases statistics and numerical data MeSH
- epidemics MeSH
- internet MeSH
- humans MeSH
- protein domains * MeSH
- proteins chemistry genetics metabolism MeSH
- SARS-CoV-2 genetics metabolism physiology MeSH
- amino acid sequence MeSH
- protein sequence analysis methods MeSH
- amino acid sequence homology MeSH
- viral proteins chemistry genetics metabolism MeSH
- computational biology methods statistics and numerical data MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- grant-supported work MeSH
- Substance names
- proteins MeSH
- viral proteins MeSH
CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies them into evolutionary superfamilies, thereby providing structural and functional annotations. The resource is released at two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, which adds derived data such as predicted sequence domains and functionally coherent sequence subsets (Functional Families, or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data: it adds 65 351 fully classified domain structures (+15%), giving 500 238 structural domains, and provides 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages, which display variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.
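The growth figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes each percentage is measured against the count before the release, which the abstract does not state explicitly.

```python
# Hypothetical helpers for checking the release statistics quoted above.
# Assumption: each "+N%" is relative to the count in the previous release.

def percent_increase(current: int, added: int) -> float:
    """Percentage growth when `added` items bring the total to `current`."""
    baseline = current - added  # count before the additions
    return 100.0 * added / baseline


def implied_baseline(current: float, percent: float) -> float:
    """Previous count implied by a current total and its percentage growth."""
    return current / (1.0 + percent / 100.0)


# Figures taken from the abstract.
total_structural_domains = 500_238
added_structural_domains = 65_351
predicted_sequence_domains = 151_000_000  # "151 million", a rounded figure

growth = percent_increase(total_structural_domains, added_structural_domains)
previous_sequences = implied_baseline(predicted_sequence_domains, 59)

print(f"structural domain growth: {growth:.1f}%")
print(f"implied previous sequence domains: {previous_sequences / 1e6:.1f} million")
```

Under that assumption, the structural domain growth comes out consistent with the quoted +15%, and the sequence domain figure implies roughly 95 million predicted domains before the release.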