Genomics 2 Proteins portal: a resource and discovery tool for linking genetic screening outputs to protein sequences and structures

2024 Oct; 21(10): 1947-1957. [epub] 20240918

Language: English; Country: United States; Medium: print-electronic

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid39294369

Grant support
RM1 HG010461 NHGRI NIH HHS - United States
UM1 HG011969 NHGRI NIH HHS - United States
UM1HG011969 U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
RM1HG010461 U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)

Links

PubMed 39294369
PubMed Central PMC11466821
DOI 10.1038/s41592-024-02409-0
PII: 10.1038/s41592-024-02409-0
Knihovny.cz E-resources

Recent advances in AI-based methods have revolutionized the field of structural biology. Concomitantly, high-throughput sequencing and functional genomics have generated genetic variants at an unprecedented scale. However, efficient tools and resources are needed to link these disparate data types: to 'map' variants onto protein structures, to better understand how the variation causes disease, and thereby to design therapeutics. Here we present the Genomics 2 Proteins portal ( https://g2p.broadinstitute.org/ ): a human proteome-wide resource that maps 20,076,998 genetic variants onto 42,413 protein sequences and 77,923 structures, with a comprehensive set of structural and functional features. Additionally, the Genomics 2 Proteins portal allows users to interactively upload protein residue-wise annotations (for example, variants and scores), as well as protein structures beyond those in the databases, to establish the connection between genomics and proteins. The portal serves as an easy-to-use discovery tool for researchers and scientists to hypothesize the structure-function relationship between natural or synthetic variations and their molecular phenotypes.


