Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants
Language: English; Country: United States; Medium: print-electronic
Document type: journal article, grant-supported work
PubMed: 33106425
PubMed Central: PMC7668189
DOI: 10.1073/pnas.2002660117
PII: 2002660117
- Keywords
- 3D mutational hotspot, disease variation effect, machine learning, missense variant interpretation, protein structure and function
- MeSH
- PTEN Phosphohydrolase / chemistry / genetics
- Protein Conformation
- Humans
- Mutation, Missense / genetics / physiology
- Models, Molecular
- BRCA1 Protein / chemistry / genetics
- Proteins / chemistry / genetics / physiology
- Amino Acid Sequence
- Machine Learning
- Computational Biology / methods
- Check Tag
- Humans
- Publication Type
- Journal Article
- Grant-supported work
- Substance Names
- BRCA1 protein, human
- PTEN Phosphohydrolase
- BRCA1 Protein
- Proteins
- PTEN protein, human
Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.
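The "statistical burden" of variants on a 3D feature can be illustrated with a simple 2x2 enrichment calculation. The sketch below is only an assumption about how such a burden might be quantified: the abstract does not state which test was used, so an odds ratio with a Woolf (log-normal) 95% confidence interval stands in for it here. The function names and all counts are hypothetical and are not taken from the study.

```python
import math


def burden_odds_ratio(path_with, path_without, pop_with, pop_without):
    """Odds ratio and 95% CI for one 3D feature (illustrative only).

    path_with / path_without: pathogenic variants at positions with / without the feature.
    pop_with / pop_without:   population variants at positions with / without the feature.
    """
    cells = [path_with, path_without, pop_with, pop_without]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero,
    # so the ratio and its standard error stay finite.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Woolf's method: standard error of log(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper


def classify_feature(lower, upper):
    """Label a feature by whether its 95% CI excludes an odds ratio of 1."""
    if lower > 1:
        return "pathogenic-associated"
    if upper < 1:
        return "population-associated"
    return "not significant"


# Hypothetical counts for a single feature (e.g., a buried residue position).
or_, lo_, hi_ = burden_odds_ratio(30, 70, 10, 90)
label = classify_feature(lo_, hi_)
```

Repeating this calculation for each feature, first across all genes and then within one protein functional class, would give the kind of general versus class-specific comparison the abstract describes. With many features, a multiple-testing correction would also be needed.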
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114
Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142
Cologne Center for Genomics, University of Cologne, 50931 Cologne, Germany
Department of Bio and Health Informatics, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
Department of Surgery, Massachusetts General Hospital, Boston, MA 02114
Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH 44195
Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195
Institute for Molecular Medicine Finland, University of Helsinki, 00100 Helsinki, Finland
Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 4365 Esch-sur-Alzette, Luxembourg
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142
Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142