Most cited article - PubMed ID 24453961
PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations
Next-generation sequencing technology has created many new opportunities for clinical diagnostics, but it faces the challenge of functional annotation of identified mutations. Various algorithms have been developed to predict the impact of missense variants that influence oncogenic drivers. However, computational pipelines that handle biological data must integrate multiple software tools, which can add complexity and hinder non-specialist users from accessing the pipeline. Here, we have developed an online user-friendly web server tool PredictONCO that is fully automated and has a low barrier to access. The tool models the structure of the mutant protein in the first step. Next, it calculates the protein stability change, pocket level information, evolutionary conservation, and changes in ionisation of catalytic amino acid residues, and uses them as the features in the machine-learning predictor. The XGBoost-based predictor was validated on an independent subset of held-out data, demonstrating areas under the receiver operating characteristic curve (ROC) of 0.97 and 0.94, and the average precision from the precision-recall curve of 0.99 and 0.94 for structure-based and sequence-based predictions, respectively. Finally, PredictONCO calculates the docking results of small molecules approved by regulatory authorities. We demonstrate the applicability of the tool by presenting its usage for variants in two cancer-associated proteins, cellular tumour antigen p53 and fibroblast growth factor receptor FGFR1. Our free web tool will assist with the interpretation of data from next-generation sequencing and navigate treatment strategies in clinical oncology: https://loschmidt.chemi.muni.cz/predictonco/.
- Keywords
- Automation, Machine learning, Mutation, Next-generation sequencing, Oncogenicity, Precision oncology, Prediction, Treatment, Virtual screening, Webserver
- Publication type
- Journal Article MeSH
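The PredictONCO abstract above reports two validation metrics, the area under the ROC curve and the average precision from the precision-recall curve. The following is a minimal sketch of how such metrics can be computed from labels and predicted scores. The function names `roc_auc` and `average_precision` and the toy data are hypothetical illustrations; this is not the authors' evaluation code, and no XGBoost model is involved.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values measured at the rank of each positive.

    Assumes distinct scores; tied scores would make the ranking ambiguous.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Hypothetical toy data: 1 = oncogenic, 0 = benign.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(round(roc_auc(toy_labels, toy_scores), 3))
print(round(average_precision(toy_labels, toy_scores), 3))
```

The pairwise AUC loop is quadratic in the number of examples, which is acceptable for a small held-out set but would normally be replaced by a rank-based computation for larger data.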
Every year, more than 19 million cancer cases are diagnosed, and this number continues to increase annually. Since standard treatment options have varying success rates for different types of cancer, understanding the biology of an individual's tumour becomes crucial, especially for cases that are difficult to treat. Personalised high-throughput profiling, using next-generation sequencing, allows for a comprehensive examination of biopsy specimens. Furthermore, the widespread use of this technology has generated a wealth of information on cancer-specific gene alterations. However, there exists a significant gap between identified alterations and their proven impact on protein function. Here, we present a bioinformatics pipeline that enables fast analysis of a missense mutation's effect on stability and function in known oncogenic proteins. This pipeline is coupled with a predictor that summarises the outputs of different tools used throughout the pipeline, providing a single probability score, achieving a balanced accuracy above 86%. The pipeline incorporates a virtual screening method to suggest potential FDA/EMA-approved drugs to be considered for treatment. We showcase three case studies to demonstrate the timely utility of this pipeline. To facilitate access and analysis of cancer-related mutations, we have packaged the pipeline as a web server, which is freely available at https://loschmidt.chemi.muni.cz/predictonco/.
Scientific contribution: This work presents a novel bioinformatics pipeline that integrates multiple computational tools to predict the effects of missense mutations on proteins of oncological interest. The pipeline uniquely combines fast protein modelling, stability prediction, and evolutionary analysis with virtual drug screening, while offering actionable insights for precision oncology. This comprehensive approach surpasses existing tools by automating the interpretation of mutations and suggesting potential treatments, thereby striving to bridge the gap between sequencing data and clinical application.
- Keywords
- Bioinformatics, Cancer, Function, High-performance computing, Machine learning, Molecular modelling, Oncology, Personalised medicine, Single nucleotide polymorphism, Stability, Treatment
- Publication type
- Journal Article MeSH
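The pipeline abstract above quotes a balanced accuracy above 86%. As a reminder of what that metric measures, here is a minimal sketch: balanced accuracy is the mean of sensitivity and specificity, so it is not inflated when one class dominates the data. The function name and the toy labels are hypothetical and unrelated to the authors' dataset.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 4 oncogenic (1) and 6 benign (0) variants.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(balanced_accuracy(y_true, y_pred), 4))
```

In this toy case the plain accuracy is 8/10, while the balanced accuracy averages a sensitivity of 3/4 with a specificity of 5/6.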
PredictONCO 1.0 is a unique web server that analyzes the effects of mutations on proteins frequently altered in various cancer types. The server can assess the impact of mutations on the protein's sequential and structural properties and apply virtual screening to identify potential inhibitors that could be used in a highly individualized therapeutic approach, possibly based on drug repurposing. PredictONCO integrates predictive algorithms and state-of-the-art computational tools combined with information from established databases. The user interface was carefully designed for the target specialists in precision oncology, molecular pathology, clinical genetics and clinical sciences. The tool summarizes the effect of the mutation on protein stability and function and currently covers 44 common oncological targets. The binding affinities of Food and Drug Administration/European Medicines Agency-approved drugs with the wild-type and mutant proteins are calculated to facilitate treatment decisions. The reliability of predictions was confirmed against 108 clinically validated mutations. The server provides a fast and compact output, ideal for the often time-sensitive decision-making process in oncology. Three use cases of missense mutations, (i) K22A in cyclin-dependent kinase 4 identified in melanoma, (ii) E1197K mutation in anaplastic lymphoma kinase 4 identified in lung carcinoma and (iii) V765A mutation in epidermal growth factor receptor in a patient with congenital mismatch repair deficiency, highlight how the tool can increase levels of confidence regarding the pathogenicity of the variants and identify the most effective inhibitors. The server is available at https://loschmidt.chemi.muni.cz/predictonco.
- Keywords
- cancer, oncology, personalized medicine, single-nucleotide polymorphism, targeted therapy
- MeSH
- Precision Medicine * MeSH
- Humans MeSH
- Melanoma * MeSH
- Mutation MeSH
- Proteins MeSH
- Reproducibility of Results MeSH
- Machine Learning MeSH
- Computational Biology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Proteins MeSH
Prediction methods have become an integral part of biomedical and biotechnological research. However, their clinical interpretations are largely based on biochemical or molecular data rather than clinical data. Here, we focus on improving the reliability and clinical applicability of prediction algorithms. We assembled and curated two large non-overlapping databases of clinical phenotypes. These phenotypes were caused by missense variations in 44 and 63 genes associated with Mendelian diseases. We used these databases to establish and validate the model, allowing us to improve the predictions obtained from EVmutation, SNAP2 and PoPMuSiC 2.1. The predictions of clinical effects suffered from a lack of specificity, which appears to be the common constraint of all recently used prediction methods, although predictions mediated by these methods are associated with nearly absolute sensitivity. We introduced evidence-based tailoring of the default settings of the prediction methods; this tailoring substantially improved the prediction outcomes. Additionally, the comparisons of the clinically observed and theoretical variations led to the identification of large previously unreported pools of variations that were under negative selection during molecular evolution. The evolutionary variation analysis approach described here is the first to enable the highly specific identification of likely disease-causing missense variations that have not yet been associated with any clinical phenotype.
- MeSH
- Algorithms MeSH
- Ectodysplasins genetics MeSH
- Phenotype MeSH
- Genetic Variation MeSH
- Genetic Diseases, Inborn genetics MeSH
- Genomics MeSH
- Glucosephosphate Dehydrogenase genetics MeSH
- Hemoglobins genetics MeSH
- Hepatocyte Nuclear Factor 4 genetics MeSH
- Humans MeSH
- Mutation, Missense MeSH
- Models, Genetic * MeSH
- Evolution, Molecular MeSH
- Mutation * MeSH
- Likelihood Functions MeSH
- Proteomics MeSH
- Protein Tyrosine Phosphatase, Non-Receptor Type 11 genetics MeSH
- Computational Biology methods MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- EDA protein, human MeSH Browser
- Ectodysplasins MeSH
- G6PD protein, human MeSH Browser
- Glucosephosphate Dehydrogenase MeSH
- hemoglobin B MeSH Browser
- Hemoglobins MeSH
- Hepatocyte Nuclear Factor 4 MeSH
- HNF4A protein, human MeSH Browser
- PTPN11 protein, human MeSH Browser
- Protein Tyrosine Phosphatase, Non-Receptor Type 11 MeSH
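The abstract of this record describes predictors with nearly absolute sensitivity but poor specificity, and reports that tailoring their default settings improved the outcomes. The sketch below illustrates that trade-off for a single decision threshold. The selection criterion used here, Youden's J (sensitivity + specificity - 1), is an assumption chosen for illustration, since the abstract does not state how the settings were tailored. The default threshold of 0.25, the function names and the toy scores are likewise hypothetical.

```python
from typing import Sequence, Tuple


def sens_spec(labels: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold means 'disease-causing'."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        called = s >= threshold
        if y == 1 and called:
            tp += 1
        elif y == 1:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Observed score that maximises Youden's J (assumed criterion, see lead-in)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: sum(sens_spec(labels, scores, t)) - 1)


# Hypothetical toy data: 1 = disease-causing, 0 = neutral.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.35, 0.55, 0.4, 0.3, 0.2]

default = 0.25  # hypothetical permissive default setting
print("default:", sens_spec(labels, scores, default))
tuned = best_threshold(labels, scores)
print("tuned:", tuned, sens_spec(labels, scores, tuned))
```

With these toy numbers the permissive default catches every disease-causing variant but misclassifies most neutral ones, while the tuned threshold gives up some sensitivity in exchange for much higher specificity.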
The molecular genetics of well-characterized inherited diseases, such as phenylketonuria (PKU) and hyperphenylalaninemia (HPA), which are predominantly caused by mutations in the phenylalanine hydroxylase (PAH) gene, is often complicated by the identification of many novel variants, often with no obvious impact on the associated disorder. To date, more than 1100 PAH variants have been identified, of which a substantial portion have unknown clinical significance. In this work, we study the functionality of seven as yet uncharacterized PAH missense variants, p.Asn167Tyr, p.Thr200Asn, p.Asp229Gly, p.Gly239Ala, p.Phe263Ser, p.Ala342Pro, and p.Ile406Met, first identified in Czech PKU/HPA patients. Of the tested variants, three, namely p.Asn167Tyr, p.Thr200Asn, and p.Ile406Met, exhibited residual enzymatic activity in vitro similar to wild-type (WT) PAH; however, when expressed in HepG2 cells, their protein levels reached a maximum of 72.1% ± 4.9%, 11.2% ± 4.2%, and 36.6% ± 7.3% of the WT PAH level, respectively. The remaining variants were null, with no enzyme activity and decreased protein levels in HepG2 cells. The chaperone-like effect of the applied BH4 precursor significantly increased the protein level for p.Asn167Tyr, p.Asp229Gly, p.Ala342Pro, and p.Ile406Met. Taken together, our results of functional characterization in combination with in silico prediction suggest that while the p.Asn167Tyr, p.Thr200Asn, and p.Ile406Met PAH variants have a mild impact on the protein, p.Asp229Gly, p.Gly239Ala, p.Phe263Ser, and p.Ala342Pro severely affect protein structure and function.
- Keywords
- BH4, functional studies, missense variants, phenylalanine hydroxylase, phenylketonuria
- MeSH
- Biopterins analogs & derivatives chemistry genetics MeSH
- Hep G2 Cells MeSH
- Phenylalanine Hydroxylase chemistry genetics MeSH
- Phenylketonurias genetics metabolism pathology MeSH
- Genotype MeSH
- Humans MeSH
- Mutation, Missense genetics MeSH
- Computer Simulation MeSH
- Structure-Activity Relationship MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Biopterins MeSH
- Phenylalanine Hydroxylase MeSH
- sapropterin MeSH Browser
- Keywords
- ATM mutation, antigenic selection, chronic lymphocytic leukemia, short telomere, stereotyped subset
- MeSH
- Survival Analysis MeSH
- Ataxia Telangiectasia Mutated Proteins genetics MeSH
- Leukemia, Lymphocytic, Chronic, B-Cell classification genetics mortality MeSH
- Cohort Studies MeSH
- Humans MeSH
- Mutation * MeSH
- Sequence Analysis, DNA MeSH
- Telomere ultrastructure MeSH
- Telomere Shortening * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Letter MeSH
- Multicenter Study MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- ATM protein, human MeSH Browser
- Ataxia Telangiectasia Mutated Proteins MeSH
An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. 
To enable comprehensive evaluation of variants, the predictions are complemented with annotations from eight databases. The web server is freely available to the community at http://loschmidt.chemi.muni.cz/predictsnp2.
- MeSH
- Databases, Nucleic Acid MeSH
- Databases, Protein MeSH
- Genetic Variation MeSH
- Genome, Human MeSH
- Genomics statistics & numerical data MeSH
- Polymorphism, Single Nucleotide * MeSH
- Humans MeSH
- Software * MeSH
- Computational Biology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
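The PredictSNP2 abstract above describes two ideas that a short sketch can make concrete: decision thresholds optimised separately for each variant category, and a consensus over several tools. The code below is a sketch under stated assumptions. The tool names (`tool_a`, `tool_b`, `tool_c`), all threshold values, and the simple majority vote are hypothetical; the abstract does not give the actual thresholds or the consensus formula, so this should not be read as the server's method.

```python
from typing import Dict, Tuple

# Hypothetical per-category thresholds for three placeholder tools.
# Real category-optimal values are not given in the abstract.
THRESHOLDS: Dict[str, Dict[str, float]] = {
    "missense": {"tool_a": 0.5, "tool_b": 0.7, "tool_c": 0.4},
    "synonymous": {"tool_a": 0.8, "tool_b": 0.9, "tool_c": 0.6},
}


def consensus(category: str, scores: Dict[str, float]) -> Tuple[str, float]:
    """Binarise each tool's score with its category threshold, then majority-vote.

    Returns the consensus label and the fraction of tools voting 'deleterious'.
    """
    thresholds = THRESHOLDS[category]
    votes = [scores[tool] >= cutoff for tool, cutoff in thresholds.items()]
    fraction = sum(votes) / len(votes)
    label = "deleterious" if fraction > 0.5 else "neutral"
    return label, fraction


# The same raw scores can lead to different calls in different categories.
raw_scores = {"tool_a": 0.6, "tool_b": 0.75, "tool_c": 0.3}
print(consensus("missense", raw_scores))
print(consensus("synonymous", raw_scores))
```

Keeping the thresholds in a per-category table means a single numeric score is never interpreted without knowing the variant type, which is the bias the abstract attributes to category-agnostic tools.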
Alagille syndrome may mimic biliary atresia in early infancy. Since mutations in JAG1 typical for Alagille syndrome type 1 have also been found in biliary atresia, we aimed to identify JAG1 mutations in newborns with proven biliary atresia (n = 72). Five biliary atresia patients with cholestasis, one additional characteristic feature of Alagille syndrome and ambiguous liver histology were single heterozygotes for nonsense or frameshift mutations in JAG1. No mutations were found in the remaining 67 patients. All "biliary atresia" carriers of JAG1 null mutations developed typical Alagille syndrome by the age of three years. Our data do not support an association of biliary atresia with JAG1 mutations, at least in Czech patients. Rapid testing for JAG1 mutations could prevent misdiagnosis of Alagille syndrome in early infancy and improve patient outcomes.
- MeSH
- Alagille Syndrome diagnosis genetics MeSH
- Biliary Atresia genetics MeSH
- Diagnosis, Differential MeSH
- Humans MeSH
- Membrane Proteins genetics MeSH
- Intercellular Signaling Peptides and Proteins genetics MeSH
- Mutation MeSH
- Codon, Nonsense MeSH
- Infant, Newborn MeSH
- Frameshift Mutation MeSH
- Jagged-1 Protein MeSH
- Calcium-Binding Proteins genetics MeSH
- Serrate-Jagged Proteins MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Infant, Newborn MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographic names
- Czech Republic MeSH
- Substance names
- JAG1 protein, human MeSH Browser
- Membrane Proteins MeSH
- Intercellular Signaling Peptides and Proteins MeSH
- Codon, Nonsense MeSH
- Jagged-1 Protein MeSH
- Calcium-Binding Proteins MeSH
- Serrate-Jagged Proteins MeSH