Genomics 2 Proteins portal: a resource and discovery tool for linking genetic screening outputs to protein sequences and structures
Language English Country United States Medium print-electronic
Document type journal articles
Grant support
RM1 HG010461
NHGRI NIH HHS - United States
UM1 HG011969
NHGRI NIH HHS - United States
UM1HG011969
U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
RM1HG010461
U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
PubMed
39294369
PubMed Central
PMC11466821
DOI
10.1038/s41592-024-02409-0
PII: 10.1038/s41592-024-02409-0
- MeSH
- Databases, Protein * MeSH
- Genetic Variation MeSH
- Genetic Testing methods MeSH
- Genomics * methods MeSH
- Protein Conformation MeSH
- Humans MeSH
- Proteins genetics chemistry MeSH
- Proteome genetics MeSH
- Amino Acid Sequence MeSH
- Software MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Proteins MeSH
- Proteome MeSH
Recent advances in AI-based methods have revolutionized the field of structural biology. Concomitantly, high-throughput sequencing and functional genomics have generated genetic variants at an unprecedented scale. However, efficient tools and resources are needed to link these disparate data types: to 'map' variants onto protein structures, to better understand how the variation causes disease and thereby to design therapeutics. Here we present the Genomics 2 Proteins portal (https://g2p.broadinstitute.org/): a human proteome-wide resource that maps 20,076,998 genetic variants onto 42,413 protein sequences and 77,923 structures, with a comprehensive set of structural and functional features. Additionally, the Genomics 2 Proteins portal allows users to interactively upload protein residue-wise annotations (for example, variants and scores) as well as protein structures not available in databases, to establish the connection between genomics and proteins. The portal serves as an easy-to-use discovery tool for researchers to generate hypotheses about the structure-function relationship between natural or synthetic variations and their molecular phenotypes.
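To make the idea of residue-wise variant mapping concrete, the following is a minimal sketch and not the portal's own code or API. It assumes missense variants are given in three-letter HGVS protein notation (for example `p.Lys2Glu`) together with an arbitrary score; the function names `parse_missense` and `map_variants_to_residues`, the toy sequence, and the scores are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Standard three-letter to one-letter amino acid codes.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches simple missense changes only, e.g. "p.Arg175His".
HGVS_MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_missense(hgvs_p):
    """Return (ref, position, alt) in one-letter codes for a missense variant."""
    match = HGVS_MISSENSE.match(hgvs_p)
    if match is None:
        raise ValueError(f"not a simple missense variant: {hgvs_p!r}")
    ref3, pos, alt3 = match.groups()
    if ref3 not in AA3_TO_1 or alt3 not in AA3_TO_1:
        raise ValueError(f"unknown amino acid in {hgvs_p!r}")
    return AA3_TO_1[ref3], int(pos), AA3_TO_1[alt3]


def map_variants_to_residues(sequence, variants):
    """Group (alt, score) pairs by 1-based residue position.

    Variants whose reference residue disagrees with the sequence, or whose
    position falls outside it, are skipped (hypothetical policy).
    """
    per_residue = defaultdict(list)
    for hgvs_p, score in variants:
        ref, pos, alt = parse_missense(hgvs_p)
        if not 1 <= pos <= len(sequence) or sequence[pos - 1] != ref:
            continue
        per_residue[pos].append((alt, score))
    return dict(per_residue)


if __name__ == "__main__":
    toy_sequence = "MKTAY"  # made-up sequence, not a real protein
    toy_variants = [
        ("p.Lys2Glu", 0.9),
        ("p.Lys2Arg", 0.1),
        ("p.Ala4Val", 0.5),
        ("p.Gly3Ser", 0.7),  # reference mismatch: residue 3 is Thr
    ]
    print(map_variants_to_residues(toy_sequence, toy_variants))
```

Keying the result by residue position is what allows the same annotations to be laid over either a sequence or a structure, provided the structure's residue numbering has been aligned to the sequence.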
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
Cancer Data Sciences, Dana Farber Harvard Cancer Center, Boston, MA, USA
Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Department of Medical Biology, University of Melbourne, Parkville, Victoria, Australia
Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch sur Alzette, Luxembourg
PATTERN, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA