PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations

PLoS Comput Biol. 2014 Jan; 10 (1): e1003440. [epub] 20140116

Language: English; Country: United States; Medium: print-electronic

Document type: journal article, grant-supported work

Permanent link   https://www.medvik.cz/link/pmid24453961
Links

PubMed 24453961
PubMed Central PMC3894168
DOI 10.1371/journal.pcbi.1003440
PII: PCOMPBIOL-D-13-01477

Single nucleotide variants represent a prevalent form of genetic variation. Mutations in the coding regions are frequently associated with the development of various genetic diseases. Computational tools for predicting the effects of mutations on protein function are essential for the analysis of single nucleotide variants and their prioritization for experimental characterization. Many computational tools are already widely employed for this purpose. Unfortunately, their comparison and further improvement are hindered by large overlaps between the training datasets and benchmark datasets, which lead to biased and overly optimistic reported performances. In this study, we have constructed three independent datasets by removing all duplicates, inconsistencies and mutations previously used in the training of the evaluated tools. The benchmark dataset, containing over 43,000 mutations, was employed for the unbiased evaluation of eight established prediction tools: MAPP, nsSNPAnalyzer, PANTHER, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP. The six best performing tools were combined into a consensus classifier, PredictSNP, which achieved significantly improved prediction performance while returning results for all mutations. This confirms that consensus prediction represents an accurate and robust alternative to the predictions delivered by individual tools. A user-friendly web interface enables easy access to all eight prediction tools, the consensus classifier PredictSNP, and annotations from the Protein Mutant Database and the UniProt database. The web server and the datasets are freely available to the academic community at http://loschmidt.chemi.muni.cz/predictsnp.
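
The idea of a consensus classifier can be illustrated with a minimal sketch. The abstract does not specify how the six tools are combined, so the confidence-weighted vote below is an assumption chosen for illustration, not the published PredictSNP procedure. The label encoding (+1 deleterious, -1 neutral), the confidence range and the function names are likewise hypothetical. Tools that return no prediction are skipped, which is how a consensus can still produce a result when individual tools fail on a given mutation.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical encoding: each tool reports (label, confidence), or None when
# it cannot evaluate the mutation. label is +1 (deleterious) or -1 (neutral);
# confidence is a value in [0, 1].
Prediction = Optional[Tuple[int, float]]


def consensus_score(predictions: Mapping[str, Prediction]) -> Optional[float]:
    """Confidence-weighted vote in [-1, 1]; None if no tool gave a usable answer."""
    votes = [p for p in predictions.values() if p is not None]
    total_confidence = sum(conf for _, conf in votes)
    if total_confidence == 0:
        return None
    return sum(label * conf for label, conf in votes) / total_confidence


def consensus_label(predictions: Mapping[str, Prediction]) -> Optional[str]:
    """Map the score to a class; a score of exactly 0 is treated as neutral."""
    score = consensus_score(predictions)
    if score is None:
        return None
    return "deleterious" if score > 0 else "neutral"


# Illustrative values only, not taken from any real tool output.
example = {
    "SIFT": (1, 0.8),
    "PolyPhen-2": (1, 0.6),
    "MAPP": None,          # tool could not score this mutation
    "PhD-SNP": (-1, 0.5),
}
print(consensus_label(example))
```

In this sketch the two deleterious votes outweigh the single neutral vote, and the missing MAPP prediction simply drops out of the sum instead of blocking the result.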

Latest 20 citations...

Analysis of mutations in precision oncology using the automated, accurate, and user-friendly web tool PredictONCO

. 2024 Dec ; 24 () : 734-738. [epub] 20241114

A computational workflow for analysis of missense mutations in precision oncology

. 2024 Jul 29 ; 16 (1) : 86. [epub] 20240729

PredictONCO: a web tool supporting decision-making in precision oncology by extending the bioinformatics predictions with advanced computing and machine learning

. 2023 Nov 22 ; 25 (1) : .

Refinement of evolutionary medicine predictions based on clinical evidence for the manifestations of Mendelian diseases

. 2019 Dec 09 ; 9 (1) : 18577. [epub] 20191209

Structural and Functional Impact of Seven Missense Variants of Phenylalanine Hydroxylase

. 2019 Jun 15 ; 10 (6) : . [epub] 20190615

ATM mutations in major stereotyped subsets of chronic lymphocytic leukemia: enrichment in subset #2 is associated with markedly short telomeres

. 2016 Sep ; 101 (9) : e369-73. [epub] 20160616

PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions

. 2016 May ; 12 (5) : e1004962. [epub] 20160525

Alagille Syndrome Mimicking Biliary Atresia in Early Infancy

. 2015 ; 10 (11) : e0143939. [epub] 20151130
