1 online resource
- MeSH
- Databases, Genetic MeSH
- Genomics * MeSH
- Publication type
- Periodical MeSH
- Conspectus
- General genetics. General cytogenetics. Evolution
- NML Fields
- medical informatics
- genetics, medical genetics
An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. 
To enable comprehensive evaluation of variants, the predictions are complemented with annotations from eight databases. The web server is freely available to the community at http://loschmidt.chemi.muni.cz/predictsnp2.
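The category-based idea described above can be illustrated with a minimal sketch. The tool names, threshold values, and the simple majority vote are assumptions chosen for illustration; the abstract does not specify how the consensus score is actually computed.

```python
from typing import Dict, Tuple

# Hypothetical per-category decision thresholds (illustrative values only,
# not taken from the published work).
THRESHOLDS: Dict[str, Dict[str, float]] = {
    "missense": {"toolA": 0.60, "toolB": 0.50, "toolC": 0.70},
    "synonymous": {"toolA": 0.30, "toolB": 0.40, "toolC": 0.35},
}


def binary_calls(scores: Dict[str, float], category: str) -> Dict[str, bool]:
    """Turn raw tool scores into deleterious (True) / neutral (False) calls
    using the thresholds of the given variant category."""
    cutoffs = THRESHOLDS[category]
    return {tool: score >= cutoffs[tool] for tool, score in scores.items()}


def consensus(scores: Dict[str, float], category: str) -> Tuple[bool, float]:
    """Majority vote over the binary calls (an assumed combination rule).

    Returns the consensus call and the fraction of tools agreeing with it.
    """
    calls = binary_calls(scores, category)
    votes = sum(calls.values())
    deleterious = votes * 2 > len(calls)
    agreement = max(votes, len(calls) - votes) / len(calls)
    return deleterious, agreement
```

With these assumed thresholds, identical raw scores can yield different binary calls depending on the variant category, which is the behaviour a single global threshold cannot provide.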
- MeSH
- Databases, Nucleic Acid MeSH
- Databases, Protein MeSH
- Genetic Variation MeSH
- Genome, Human MeSH
- Genomics statistics & numerical data MeSH
- Polymorphism, Single Nucleotide * MeSH
- Humans MeSH
- Software * MeSH
- Computational Biology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Genetic diagnosis of rare diseases requires accurate identification and interpretation of genomic variants. Clinical and molecular scientists from 37 expert centers across Europe created the Solve-Rare Diseases Consortium (Solve-RD) resource, encompassing clinical, pedigree and genomic rare-disease data (94.5% exomes, 5.5% genomes), and performed systematic reanalysis for 6,447 individuals (3,592 male, 2,855 female) with previously undiagnosed rare diseases from 6,004 families. We established a collaborative, two-level expert review infrastructure that allowed a genetic diagnosis in 506 (8.4%) families. Of 552 disease-causing variants identified, 464 (84.1%) were single-nucleotide variants or short insertions/deletions. These variants were either located in recently published novel disease genes (n = 67), recently reclassified in ClinVar (n = 187) or reclassified by consensus expert decision within Solve-RD (n = 210). Bespoke bioinformatics analyses identified the remaining 15.9% of causative variants (n = 88). Ad hoc expert review, parallel to the systematic reanalysis, diagnosed 249 (4.1%) additional families for an overall diagnostic yield of 12.6%. The infrastructure and collaborative networks set up by Solve-RD can serve as a blueprint for future scalable international efforts. The resource is open to the global rare-disease community, allowing phenotype, variant and gene queries, as well as genome-wide discoveries.
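The percentages reported above follow directly from the stated counts. A short check, using only numbers given in the abstract (the variable and function names are illustrative):

```python
# Counts as stated in the abstract.
FAMILIES = 6004
SYSTEMATIC_DIAGNOSES = 506
AD_HOC_DIAGNOSES = 249
VARIANTS_TOTAL = 552
SNV_INDEL = 67 + 187 + 210  # novel genes + ClinVar + expert reclassification
BESPOKE = 88


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


systematic_yield = pct(SYSTEMATIC_DIAGNOSES, FAMILIES)
ad_hoc_yield = pct(AD_HOC_DIAGNOSES, FAMILIES)
overall_yield = pct(SYSTEMATIC_DIAGNOSES + AD_HOC_DIAGNOSES, FAMILIES)
snv_indel_share = pct(SNV_INDEL, VARIANTS_TOTAL)
bespoke_share = pct(BESPOKE, VARIANTS_TOTAL)
```

The overall yield is computed from the summed family counts (755 of 6,004) rather than by adding the two rounded percentages.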
- MeSH
- Databases, Genetic MeSH
- Exome genetics MeSH
- Genome, Human genetics MeSH
- Genomics * methods MeSH
- Humans MeSH
- Pedigree MeSH
- Computational Biology methods MeSH
- Rare Diseases * genetics diagnosis MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Geographicals
- Europe MeSH
BACKGROUND: Recently, deep neural networks have been successfully applied in many biological fields. In 2020, the deep learning model AlphaFold won the protein folding competition with predicted structures within the error tolerance of experimental methods. However, this solution to the most prominent bioinformatic challenge of the past 50 years has been possible only thanks to a carefully curated benchmark of experimentally determined protein structures. In genomics, we have similar challenges (annotation of genomes and identification of functional elements), but we currently lack benchmarks similar to the protein folding competition. RESULTS: Here we present a collection of curated and easily accessible sequence classification datasets in the field of genomics. The proposed collection is based on a combination of novel datasets constructed from the mining of publicly available databases and existing datasets obtained from published articles. The collection currently contains nine datasets that focus on regulatory elements (promoters, enhancers, open chromatin regions) from three model organisms: human, mouse, and roundworm. A simple convolutional neural network is also included in the repository and can be used as a baseline model. Benchmarks and the baseline model are distributed as the Python package 'genomic-benchmarks', and the code is available at https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks . CONCLUSIONS: Deep learning techniques have revolutionized many biological fields, but mainly thanks to carefully curated benchmarks. For the field of genomics, we propose a collection of benchmark datasets for the classification of genomic sequences with an interface for the most commonly used deep learning libraries, an implementation of a simple neural network, and a training framework that can be used as a starting point for future research.
The main aim of this effort is to create a repository for shared datasets that will make machine learning for genomics more comparable and reproducible while reducing the overhead of researchers who want to enter the field, leading to healthy competition and new discoveries.
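Sequence classification models such as the convolutional baseline mentioned above usually take numerically encoded DNA as input. The following is a minimal, standard-library sketch of one-hot encoding; it is not the API of the 'genomic-benchmarks' package, and the function name and the handling of ambiguous bases are assumptions.

```python
from typing import List

# Column order of the encoding (an arbitrary but common choice).
ALPHABET = "ACGT"


def one_hot(sequence: str) -> List[List[int]]:
    """Encode a DNA sequence as a list of 4-element rows, one per base.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows
```

For example, `one_hot("ACN")` produces three rows, the last of which is all zeros because `N` is not in the alphabet.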
Ancient mitochondrial DNA is used for tracing past human demographic events due to its population-level variability. The number of published ancient mitochondrial genomes has increased in recent years, alongside the development of high-throughput sequencing and capture enrichment methods. Here, we present AmtDB, the first database of ancient human mitochondrial genomes. The release version contains 1107 hand-curated ancient samples, freely accessible for download, together with the individual descriptors, including geographic location, radiocarbon dating, and archaeological culture affiliation. The database also features an interactive map for sample location visualization. AmtDB is a key platform for ancient population genetic studies and is available at https://amtdb.org.
- MeSH
- Databases, Genetic * MeSH
- Genome, Mitochondrial * MeSH
- Genomics * methods MeSH
- Web Browser MeSH
- Humans MeSH
- Mitochondria genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present Phenopacket Store. Phenopacket Store v.0.1.19 includes 6,668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes and 3,834 unique pathogenic alleles curated from 959 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
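To make the notion of a phenopacket concrete, here is a minimal sketch of a phenopacket-like JSON document built with the Python standard library. The field names follow the GA4GH Phenopacket Schema v2 as far as we are aware; all identifiers, the timestamp, and the ontology version are placeholders, and the record is not taken from Phenopacket Store.

```python
import json

# Illustrative record; every value below is a placeholder.
phenopacket = {
    "id": "example-phenopacket-1",
    "subject": {"id": "proband-1", "sex": "FEMALE"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001250", "label": "Seizure"}},
    ],
    "metaData": {
        "created": "2024-01-01T00:00:00Z",
        "createdBy": "example-curator",
        "resources": [
            {
                "id": "hp",
                "name": "human phenotype ontology",
                "url": "http://purl.obolibrary.org/obo/hp.owl",
                "version": "PLACEHOLDER",
                "namespacePrefix": "HP",
                "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
            }
        ],
        "phenopacketSchemaVersion": "2.0",
    },
}


def hpo_ids(packet: dict) -> list:
    """Collect the ontology term identifiers of all phenotypic features."""
    return [f["type"]["id"] for f in packet.get("phenotypicFeatures", [])]


# Serialize as it might be stored on disk or passed to downstream software.
serialized = json.dumps(phenopacket, indent=2)
```

A pipeline under test could read such files back with `json.loads` and pass the extracted term identifiers to a phenotype-driven prioritization step.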
- MeSH
- Algorithms MeSH
- Databases, Genetic MeSH
- Phenotype * MeSH
- Genomics * methods MeSH
- Humans MeSH
- Software * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
BACKGROUND: Neurodegeneration with brain iron accumulation (NBIA) comprises a group of clinically and genetically heterogeneous diseases characterized by iron overload in basal ganglia and progressive neurodegeneration. Little is known about the epidemiology of NBIA disorders. In the absence of large-scale population-based studies, obtaining reliable epidemiological data requires innovative approaches. METHODS: All pathogenic variants were collected from the 13 genes associated with autosomal recessive NBIA (PLA2G6, PANK2, COASY, ATP13A2, CP, AP4M1, FA2H, CRAT, SCP2, C19orf12, DCAF17, GTPBP2, REPS1). The allele frequencies of these disease-causing variants were assessed in exome/genome collections: the Genome Aggregation Database (gnomAD) and our in-house database. Lifetime risks were calculated from the sum of allele frequencies in the respective genes under the assumption of Hardy-Weinberg equilibrium. FINDINGS: The combined estimated lifetime risk of all 13 investigated NBIA disorders is 0.88 (95% confidence interval 0.70-1.10) per 100,000 based on the global gnomAD dataset (n = 282,912 alleles), 0.92 (0.65-1.29) per 100,000 in the European gnomAD dataset (n = 129,206), and 0.90 (0.48-1.62) per 100,000 in our in-house database (n = 44,324). Individually, the highest lifetime risks (>0.15 per 100,000) are found for disorders caused by variants in PLA2G6, PANK2 and COASY. INTERPRETATION: This population-genetic estimation of lifetime risks of recessive NBIA disorders reveals frequencies far exceeding previous population-based numbers. Importantly, our approach represents lifetime risks from conception, thus including prenatal deaths. Understanding the true lifetime risk of NBIA disorders is important in estimating disease burden, allocating resources and targeting specific interventions. FUNDING: This work was carried out in the framework of TIRCON ("Treat Iron-Related Childhood-Onset Neurodegeneration").
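The calculation described in METHODS can be sketched as follows. For an autosomal recessive disorder under Hardy-Weinberg equilibrium, the expected frequency of affected genotypes for one gene is the square of the summed pathogenic allele frequency, which covers homozygotes and compound heterozygotes. The gene labels and allele frequencies below are hypothetical, and the confidence-interval estimation reported in the abstract is not reproduced.

```python
from typing import Dict, List


def lifetime_risk_per_100k(allele_freqs: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-gene lifetime risk per 100,000 for a recessive disorder.

    q is the summed frequency of all pathogenic alleles in a gene;
    under Hardy-Weinberg equilibrium the affected fraction is q squared.
    """
    risks = {}
    for gene, freqs in allele_freqs.items():
        q = sum(freqs)
        risks[gene] = q * q * 100_000
    return risks


# Hypothetical input: two genes with made-up allele frequencies.
example = {"GENE_A": [0.0005, 0.0005], "GENE_B": [0.002]}
per_gene = lifetime_risk_per_100k(example)
combined = sum(per_gene.values())
```

The combined risk is taken here as the sum of the per-gene risks, which assumes the disorders are distinct and rare enough that overlap is negligible.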
- MeSH
- Databases, Genetic MeSH
- Child MeSH
- Nuclear Proteins MeSH
- Ubiquitin-Protein Ligase Complexes MeSH
- Humans MeSH
- Mitochondrial Proteins genetics MeSH
- Brain pathology MeSH
- Neuroaxonal Dystrophies * epidemiology genetics pathology MeSH
- Neurodegenerative Diseases * epidemiology genetics pathology MeSH
- Iron Metabolism Disorders * genetics pathology MeSH
- Calcium-Binding Proteins MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Publication type
- Journal Article MeSH
Genetic variation occurring within conserved functional protein domains warrants special attention when examining DNA variation in the context of disease causation. Here we introduce a resource, freely available at www.prot2hg.com, that addresses the question of whether a particular variant falls onto an annotated protein domain and directly translates chromosomal coordinates onto protein residues. The tool can perform a multiple-site query in a simple way, and the whole dataset is available for download as well as incorporated into our own accessible pipeline. To create this resource, National Center for Biotechnology Information protein data were retrieved using the Entrez Programming Utilities. After processing all human protein domains, residue positions were reverse translated and mapped to the reference genome hg19 and stored in a MySQL database. In total, 760 487 protein domains from 42 371 protein models were mapped to hg19 coordinates and made publicly available for search or download (www.prot2hg.com). In addition, this annotation was implemented into the genomics research platform GENESIS in order to query nearly 8000 exomes and genomes of families with rare Mendelian disorders (tgp-foundation.org). When applied to patient genetic data, we found that rare (<1%) variants in the Genome Aggregation Database were significantly more often annotated onto a protein domain in comparison to common (>1%) variants. Similarly, variants described as pathogenic or likely pathogenic in ClinVar were more likely to be annotated onto a domain. In addition, we tested a dataset consisting of 60 causal variants in a cohort of patients with epileptic encephalopathy and found that 71% of them (43 variants) were propagated onto protein domains. In summary, we developed a resource that annotates variants in the coding part of the genome onto conserved protein domains in order to increase variant prioritization efficiency. Database URL: www.prot2hg.com.
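The reverse translation of residue positions to genomic coordinates can be illustrated with a minimal sketch. It assumes a plus-strand transcript whose coding exons are given as 1-based inclusive genomic intervals in transcript order; the function names and the example coordinates are hypothetical, and this is not the resource's actual implementation.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (genomic start, genomic end), 1-based inclusive


def cds_to_genomic(cds_pos: int, exons: List[Exon]) -> int:
    """Map a 1-based coding-sequence position to a genomic coordinate
    (plus strand only)."""
    remaining = cds_pos
    for start, end in exons:
        length = end - start + 1
        if remaining <= length:
            return start + remaining - 1
        remaining -= length
    raise ValueError("CDS position lies beyond the coding sequence")


def residue_to_genomic(residue: int, exons: List[Exon]) -> List[int]:
    """Genomic coordinates of the three codon bases of a 1-based residue."""
    first = 3 * residue - 2
    return [cds_to_genomic(pos, exons) for pos in range(first, first + 3)]


# Hypothetical coding exons: 5 bases, then 11 bases after an intron.
example_exons = [(100, 104), (200, 210)]
```

With these example exons, residue 2 is encoded by a codon split across the intron, so its three bases map to coordinates in both exons. A minus-strand transcript would need the exon order and direction reversed.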
- MeSH
- Molecular Sequence Annotation methods MeSH
- Data Mining methods MeSH
- Databases, Genetic * MeSH
- Data Curation methods MeSH
- Genetic Variation * MeSH
- Genome, Human genetics MeSH
- Genomics methods MeSH
- Internet MeSH
- Humans MeSH
- Protein Domains genetics MeSH
- Proteins chemistry genetics metabolism MeSH
- Computational Biology methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
INTRODUCTION: The study presents the results of the genomic surveillance of invasive meningococcal disease (IMD) in the Czech Republic for the period of 2015-2017. MATERIAL AND METHODS: The study set includes all available IMD isolates recovered in the Czech Republic and referred to the National Reference Laboratory for Meningococcal Infections in 2015-2017, a total of 89 Neisseria meningitidis isolates: from 2015 (n = 20), 2016 (n = 27), and 2017 (n = 42). All isolates were studied by whole genome sequencing (WGS). RESULTS: Serogroup B (MenB) was the most common, followed by serogroups C, W, and Y. Altogether 17 clonal complexes were identified, the most common of which was hypervirulent complex cc11, followed by complexes cc32, cc41/44, cc269, and cc865. Over the three study years, hypervirulent cc11 (MenC) showed an upward trend. The WGS method showed two clearly differentiated clusters of N. meningitidis C: P1.5,2:F3-3:ST-11 (cc11). The first cluster is represented by nine isolates, all of which are from 2017. The second cluster consisted of five isolates from 2016 and eight isolates from 2017. Their genetic discordance is illustrated by the changing nadA allele and subsequently by the variance in BAST type. Clonal complex cc269 (MenB) also increased over the time frame. WGS identified the presence of MenB vaccine antigen genes in all B and non-B isolates of N. meningitidis. Altogether 49 different Bexsero antigen sequence types (BAST) were identified and 10 combinations of these have not been previously described in the PubMLST database. CONCLUSIONS: The genomic surveillance of IMD in the Czech Republic provides data needed to update immunisation guidelines for this disease. WGS showed a higher discrimination power and provided more accurate data on molecular characteristics and genetic relationships among invasive N. meningitidis isolates.
- MeSH
- Antigens, Bacterial genetics MeSH
- Genome, Bacterial genetics MeSH
- Genomics MeSH
- Humans MeSH
- Meningococcal Infections epidemiology genetics microbiology MeSH
- Neisseria meningitidis, Serogroup B genetics pathogenicity MeSH
- Neisseria meningitidis genetics pathogenicity MeSH
- Whole Genome Sequencing MeSH
- Vaccination MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographicals
- Czech Republic MeSH