Molecular reference database
Molecular identification of micro- and macroorganisms based on nuclear markers has revolutionized our understanding of their taxonomy, phylogeny and ecology. Today, research on the diversity of eukaryotes in global ecosystems relies heavily on nuclear ribosomal RNA (rRNA) markers. Here, we present EUKARYOME, a research community-curated reference database for the nuclear ribosomal 18S rRNA, internal transcribed spacer (ITS) and 28S rRNA markers of all eukaryotes, including metazoans (animals), protists, fungi and plants. It is particularly useful for the identification of arbuscular mycorrhizal fungi because it bridges the four commonly used molecular markers: the ITS1, ITS2, 18S V4-V5 and 28S D1-D2 subregions. The key benefits of this database over other annotated reference sequence databases are that it is not restricted to particular taxonomic groups and that it includes all rRNA markers. EUKARYOME also offers a number of long-read reference sequences derived from (meta)genomic and (meta)barcoding data, a unique feature that can be used for taxonomic identification and chimera control of third-generation, long-read, high-throughput sequencing data. Taxonomic assignments of rRNA genes in the database are verified using phylogenetic approaches. The reference datasets are available in multiple formats from the project homepage, http://www.eukaryome.org.
- MeSH
- Databases, Genetic MeSH
- Databases, Nucleic Acid MeSH
- Eukaryota * genetics MeSH
- Phylogeny MeSH
- Genes, rRNA genetics MeSH
- RNA, Ribosomal, 18S genetics MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance Names
- RNA, Ribosomal, 18S MeSH
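Reference databases such as EUKARYOME are typically used by comparing query reads against their reference sequences. The minimal Python sketch below illustrates that general idea with a simple k-mer overlap score. It is not EUKARYOME's own assignment method, and the reference sequences, taxon labels, k = 8 and the 0.5 score cutoff are hypothetical values chosen only for illustration.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 8) -> set[str]:
    """Return the set of overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_taxon(query: str, references: dict[str, str], k: int = 8,
                 min_score: float = 0.5) -> str | None:
    """Assign the query to the reference taxon sharing the largest fraction of
    its k-mers; return None if no reference reaches min_score (hypothetical cutoff)."""
    query_kmers = kmers(query, k)
    if not query_kmers:  # query shorter than k
        return None
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in references.items():
        score = len(query_kmers & kmers(ref_seq, k)) / len(query_kmers)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_score else None


# Hypothetical toy references; real use would load the downloaded reference files.
references = {
    "taxon_A": "ACGTACGTTAGCCGATCGATCGGATCCTAGG",
    "taxon_B": "TTGACCAGTTGACCAGGTCAATGGCCAATTG",
}
print(assign_taxon("GTACGTTAGCCGATCGATCGGAT", references))  # taxon_A
```

Real classifiers add alignment, confidence estimation and the chimera checks mentioned above; the sketch only shows why a broad, well-annotated reference set matters: a query can only be assigned to taxa that are present in the references.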
Genetic variation occurring within conserved functional protein domains warrants special attention when examining DNA variation in the context of disease causation. Here we introduce a resource, freely available at www.prot2hg.com, that determines whether a particular variant falls within an annotated protein domain and directly translates chromosomal coordinates into protein residues. The tool supports simple multiple-site queries, and the whole dataset is available for download and is also incorporated into our own accessible pipeline. To create this resource, protein data from the National Center for Biotechnology Information (NCBI) were retrieved using the Entrez Programming Utilities. After all human protein domains had been processed, residue positions were reverse-translated, mapped to the hg19 reference genome and stored in a MySQL database. In total, 760 487 protein domains from 42 371 protein models were mapped to hg19 coordinates and made publicly available for search or download (www.prot2hg.com). In addition, this annotation was implemented in the genomics research platform GENESIS in order to query nearly 8000 exomes and genomes of families with rare Mendelian disorders (tgp-foundation.org). When applying it to patient genetic data, we found that variants that are rare (<1%) in the Genome Aggregation Database were significantly more often annotated onto a protein domain than common (>1%) variants. Similarly, variants described as pathogenic or likely pathogenic in ClinVar were more likely to be annotated onto a domain. We also tested a dataset of 60 causal variants from a cohort of patients with epileptic encephalopathy and found that 71% of them (43 variants) mapped onto protein domains. In summary, we developed a resource that annotates variants in the coding part of the genome onto conserved protein domains in order to improve the efficiency of variant prioritization. Database URL: www.prot2hg.com.
- MeSH
- Molecular Sequence Annotation methods MeSH
- Data Mining methods MeSH
- Databases, Genetic * MeSH
- Data Curation methods MeSH
- Genetic Variation * MeSH
- Genome, Human genetics MeSH
- Genomics methods MeSH
- Internet MeSH
- Humans MeSH
- Protein Domains genetics MeSH
- Proteins chemistry genetics metabolism MeSH
- Computational Biology methods MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance Names
- Proteins MeSH
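The core operation described for prot2hg, translating a chromosomal coordinate into a protein residue and asking whether that residue lies in an annotated domain, can be sketched in Python as below. This is an illustrative sketch, not prot2hg's code (prot2hg reverse-translates domain residues onto hg19). It assumes 1-based inclusive coordinates, coding exon intervals (CDS parts only) whose first coding base is the first base of the start codon, and hypothetical exon and domain coordinates.

```python
from __future__ import annotations


def genomic_to_residue(pos: int, cds_exons: list[tuple[int, int]],
                       strand: str = "+") -> int | None:
    """Map a 1-based genomic position to a 1-based residue number.

    cds_exons holds the coding part of each exon as 1-based inclusive
    (start, end) genomic intervals. Returns None for non-coding positions.
    """
    # Walk the exons in transcription order: ascending on '+', descending on '-'.
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    offset = 0  # number of coding bases in the exons already passed
    for start, end in ordered:
        if start <= pos <= end:
            # 0-based index of this base within the coding sequence
            cds_index = offset + (pos - start if strand == "+" else end - pos)
            return cds_index // 3 + 1
        offset += end - start + 1
    return None


def domains_hit(residue: int, domains: dict[str, tuple[int, int]]) -> list[str]:
    """Return the names of all domains whose residue range contains the residue."""
    return [name for name, (first, last) in domains.items() if first <= residue <= last]


# Hypothetical two-exon transcript (30 + 60 coding bases) and domain annotation.
exons = [(1000, 1029), (2000, 2059)]
domains = {"domain_X": (5, 12)}
residue = genomic_to_residue(2000, exons, "+")
print(residue, domains_hit(residue, domains))  # 11 ['domain_X']
```

Working per transcript keeps the arithmetic simple; a production tool would additionally have to handle alternative transcripts, incomplete CDS annotations and variants that span exon boundaries.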
Molecular identification is increasingly used to speed up biodiversity surveys and laboratory experiments. However, many groups of organisms cannot be reliably identified using standard databases such as GenBank or BOLD due to a lack of sequenced voucher specimens identified by experts. Sometimes a large number of sequences are available, but with too many errors to allow identification. Here, we address this problem for parasitoids of Drosophila by introducing a curated open-access molecular reference database, DROP (Drosophila parasitoids). Identifying Drosophila parasitoids is challenging and is a major impediment to realizing the full potential of this model system in studies ranging from molecular mechanisms to food webs, and in the biological control of Drosophila suzukii. In DROP, genetic data are linked to voucher specimens and, where possible, the voucher specimens are identified by taxonomists and vetted through direct comparison with primary type material. To initiate DROP, we curated 154 laboratory strains, 856 vouchers, 554 DNA sequences, 16 genomes, 14 transcriptomes and six proteomes drawn from a total of 183 operational taxonomic units (OTUs): 114 described Drosophila parasitoid species and 69 provisional species. We found species richness of Drosophila parasitoids to be heavily underestimated and provide an updated taxonomic catalogue for the community. DROP offers accurate molecular identification and improves cross-referencing between individual studies, which we hope will catalyse research on this diverse and fascinating model system. Our effort should also serve as an example for researchers facing similar molecular identification problems in other groups of organisms.
- Keywords
- DNA sequences, biodiversity, biological control, genomes, integrative taxonomy, molecular diagnostics
- MeSH
- Biodiversity * MeSH
- Drosophila * genetics MeSH
- Food Chain MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication Type
- Journal Article MeSH
Type B trichothecenes, which pose a serious hazard to consumer health, occur in grains worldwide. These mycotoxins are produced mainly by three different trichothecene genotypes/chemotypes: 3ADON (3-acetyldeoxynivalenol), 15ADON (15-acetyldeoxynivalenol) and NIV (nivalenol), named after these three major mycotoxin compounds. Correct identification of these genotypes is essential for all studies relating to population surveys, fungal ecology and mycotoxicology. Trichothecene producers exhibit enormous strain-dependent chemical diversity, which may result in variable levels of the genotype-determining toxin and in the production of low to high amounts of atypical compounds. New high-throughput DNA-sequencing technologies promise to boost the diagnostics of mycotoxin genotypes. However, this requires a reference database with satisfactory taxonomic sampling of sequences that correlate well with the chemotypes actually produced. We believe that one of the most pressing current challenges for such a database is linking molecular identification with the chemical diversity of the strains, as well as with other metadata. In this study, we use the Tri12 gene, which is involved in mycotoxin biosynthesis, to identify Tri genotypes through sequence comparison. Tri12 sequences from a range of geographically diverse fungal strains comprising 22 Fusarium species were stored in the ToxGen database, which provides descriptive and up-to-date annotations such as the Tri genotype and chemotype of the strains, their chemical diversity, the trichothecene-inducing host, substrate or medium, geographical locality, and the most recent taxonomic affiliations. The present initiative bridges the gap between the demands of comprehensive studies on trichothecene producers and the existing nucleotide sequence databases, which lack toxicological and other auxiliary data. 
We invite researchers working in the fields of fungal taxonomy, epidemiology and mycotoxicology to join the freely available annotation effort.
- Keywords
- Annotation, Chemotypes, Fusarium, Molecular identification, Trichothecene genotypes
- Publication Type
- Journal Article MeSH
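ToxGen's central idea, keeping the Tri genotype inferred from Tri12 sequences next to the chemically measured toxin profile of the same strain, can be modelled with a small record type. The Python sketch below is hypothetical: the field names, units and comparison rule are assumptions, not ToxGen's schema. It flags strains whose most abundant genotype-naming compound (3ADON, 15ADON or NIV) differs from the sequence-based genotype, the kind of discrepancy that the strain-dependent chemical diversity described above can produce.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Compounds that give the three Tri genotypes/chemotypes their names.
MARKER_COMPOUNDS = ("3ADON", "15ADON", "NIV")


@dataclass
class StrainRecord:
    strain_id: str
    species: str
    tri_genotype: str  # genotype inferred from the Tri12 sequence
    # Measured toxin amounts keyed by compound name (hypothetical units).
    toxin_levels: dict[str, float] = field(default_factory=dict)

    def dominant_marker(self) -> str | None:
        """Most abundant marker compound; other toxins (e.g. DON) are ignored."""
        measured = {c: v for c, v in self.toxin_levels.items()
                    if c in MARKER_COMPOUNDS and v > 0}
        if not measured:
            return None
        return max(measured, key=measured.get)

    def chemotype_matches_genotype(self) -> bool | None:
        """True/False if a chemotype can be inferred, None if no marker was detected."""
        marker = self.dominant_marker()
        return None if marker is None else marker == self.tri_genotype


# Hypothetical strain record.
strain = StrainRecord("strain-1", "Fusarium sp.", "15ADON",
                      {"DON": 1200.0, "15ADON": 300.0, "3ADON": 20.0})
print(strain.chemotype_matches_genotype())  # True
```

Keeping genotype and chemistry in one record is what makes such consistency checks possible; a plain sequence database entry carries only the sequence and taxonomy.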
IRESite is an exhaustive, manually annotated, non-redundant relational database focused on IRES elements (internal ribosome entry sites) that contains information not available in the primary public databases. IRES elements were originally found in eukaryotic viruses that hijack the translation initiation machinery of their host. Later, they were also discovered in the 5'-untranslated regions of some eukaryotic mRNA molecules. Currently, IRESite presents up to 92 biologically relevant aspects of every experiment, e.g. the nature of an IRES element, its functionality/defectivity, origin, size, sequence, structure, its position relative to surrounding protein-coding regions, the positive/negative controls used in the experiment, the reporter genes used to monitor IRES activity, the measured reporter protein yields/activities, references to original publications, cross-references to other databases, and comments from submitters and our curators. Furthermore, the site presents known similarities to rRNA sequences as well as RNA-protein interactions. Special care is given to the annotation of promoter-like regions. The annotated data in IRESite are linked to mostly complete, full-length mRNAs and, whenever possible, accompanied by the original plasmid vector sequences. New data can be submitted through the publicly available web-based interface at http://www.iresite.org and are curated by a team of lab-experienced biologists.
- MeSH
- 5' Untranslated Regions chemistry MeSH
- Databases, Nucleic Acid * MeSH
- Peptide Chain Initiation, Translational * MeSH
- Peptide Initiation Factors metabolism MeSH
- Internet MeSH
- RNA, Messenger chemistry MeSH
- Plasmids chemistry MeSH
- Promoter Regions, Genetic MeSH
- Regulatory Sequences, Ribonucleic Acid MeSH
- RNA, Viral chemistry MeSH
- User-Computer Interface MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance Names
- 5' Untranslated Regions MeSH
- Peptide Initiation Factors MeSH
- RNA, Messenger MeSH
- Regulatory Sequences, Ribonucleic Acid MeSH
- RNA, Viral MeSH
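IRESite is described as a relational database in which each experiment is annotated with many attributes of a given IRES element. A minimal sketch of that kind of layout, using Python's built-in sqlite3 module, is shown below. The two tables and their columns are hypothetical and cover only a handful of the up to 92 aspects IRESite records, so this should not be read as IRESite's actual schema.

```python
from __future__ import annotations

import sqlite3

# Hypothetical, heavily simplified schema: one row per IRES element,
# one row per experiment testing that element.
SCHEMA = """
CREATE TABLE ires_element (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    origin   TEXT NOT NULL CHECK (origin IN ('viral', 'cellular')),
    sequence TEXT NOT NULL
);
CREATE TABLE experiment (
    id            INTEGER PRIMARY KEY,
    ires_id       INTEGER NOT NULL REFERENCES ires_element(id),
    reporter_gene TEXT NOT NULL,
    functional    INTEGER NOT NULL CHECK (functional IN (0, 1)),
    publication   TEXT
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce experiment -> element links
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ires_element VALUES (?, ?, ?, ?)",
        [(1, "viral_ires_demo", "viral", "GGAUCCUAGC"),
         (2, "cellular_ires_demo", "cellular", "CCGAUAGGCU")],
    )
    conn.executemany(
        "INSERT INTO experiment VALUES (?, ?, ?, ?, ?)",
        [(1, 1, "luciferase", 1, None),
         (2, 1, "CAT", 1, None),
         (3, 2, "luciferase", 0, None),
         (4, 2, "luciferase", 1, None)],
    )
    conn.commit()
    return conn


def functional_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Count experiments reporting a functional IRES, grouped by element origin."""
    rows = conn.execute(
        """
        SELECT e.origin, COUNT(*)
        FROM experiment AS x
        JOIN ires_element AS e ON e.id = x.ires_id
        WHERE x.functional = 1
        GROUP BY e.origin
        ORDER BY e.origin
        """
    ).fetchall()
    return dict(rows)


print(functional_counts(build_demo_db()))  # {'cellular': 1, 'viral': 2}
```

Separating elements from experiments lets one IRES carry many experiments (different reporters, controls, outcomes) without duplicating its sequence, which matches the non-redundant, per-experiment annotation the abstract describes.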
BACKGROUND: Rapid, accurate and high-throughput identification of vector arthropods is of paramount importance in surveillance programmes, which are becoming more common because the geographic occurrence and extent of many arthropod-borne diseases are changing. Protein profiling by MALDI-TOF mass spectrometry fulfils these requirements for identification, and reference databases have recently been established for several vector taxa, mostly with specimens from laboratory colonies. METHODS: We established and validated a reference database containing 20 phlebotomine sand fly (Diptera: Psychodidae, Phlebotominae) species using specimens from colonies or field collections that had been stored for various periods of time. RESULTS: Identical biomarker mass patterns ('superspectra') were obtained with colony- and field-derived specimens of the same species. In the validation study, high-quality spectra (i.e. more than 30 evaluable masses) were obtained with all fresh insects from colonies and with 55/59 insects deep-frozen (liquid nitrogen/-80 °C) for up to 25 years. In contrast, only 36/52 specimens stored in ethanol could be identified. This resulted in an overall sensitivity of 87% (140/161); specificity was 100%. Longer storage impaired data counts in the high mass range, so cluster analyses of closely related specimens might reflect their storage conditions rather than phenotypic distinctness. A major drawback of MALDI-TOF MS is the restricted availability of in-house databases and the fact that mass spectrometers from two companies (Bruker, Shimadzu) are in wide use. We analysed fingerprints of phlebotomine sand flies obtained by an automatic routine procedure on a Bruker instrument, using our database and the software established on a Shimadzu system. With 312 specimens from 8 sand fly species from laboratory colonies, the sensitivity when evaluating only high-quality spectra was 98.3%; the specificity was 100%. 
The corresponding diagnostic values with 55 field-collected specimens from 4 species were 94.7% and 97.4%, respectively. CONCLUSIONS: A centralized high-quality database (created by expert taxonomists and experienced users of mass spectrometers) that is easily amenable to customer-oriented identification services is a highly desirable resource. As shown in the present work, spectra obtained from different specimens with different instruments can be analysed using a centralized database, which should be available in the near future via an online platform in a cost-efficient manner.
- MeSH
- Entomology methods MeSH
- Insect Proteins analysis MeSH
- Molecular Sequence Data MeSH
- Psychodidae chemistry classification MeSH
- Electron Transport Complex IV genetics MeSH
- Sequence Analysis, DNA MeSH
- Sensitivity and Specificity MeSH
- Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization methods MeSH
- Temperature MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Validation Study MeSH
- Substance Names
- Insect Proteins MeSH
- Electron Transport Complex IV MeSH
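The diagnostic values reported in the sand fly study follow the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The short Python sketch below reproduces the reported fractions from the counts given in the abstract, treating every specimen that could not be identified as a false negative (an assumption consistent with 140/161). The true-negative counts behind the specificity values are not given, so specificity is included only as a definition.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of specimens of a species that were correctly identified."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of specimens not belonging to a species that were not assigned to it."""
    return true_negatives / (true_negatives + false_positives)


# Counts taken from the validation study described above.
print(f"{sensitivity(140, 161 - 140):.1%}")  # 87.0%  (overall)
print(f"{sensitivity(55, 59 - 55):.1%}")     # 93.2%  (deep-frozen specimens)
print(f"{sensitivity(36, 52 - 36):.1%}")     # 69.2%  (ethanol-stored specimens)
```

Splitting the overall figure by storage method makes the abstract's point concrete: the ethanol-stored specimens account for most of the missed identifications.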
Molecular techniques such as metabarcoding, while promising for exploring the diversity of communities, are often impeded by the lack of reference DNA sequences available for taxonomic annotation. Our study explores the benefits of combining targeted DNA barcoding and morphological taxonomy to improve metabarcoding efficiency, using beach meiofauna as a case study. Beaches are globally important ecosystems inhabited by meiofauna, microscopic animals living in the interstitial spaces between sand grains, which play a key role in coastal biodiversity and ecosystem dynamics. However, research on meiofauna faces challenges due to limited taxonomic expertise and sparse sampling. We generated 775 new cytochrome c oxidase I DNA barcodes from meiofauna specimens collected along the west coast of the Netherlands and combined them with the NCBI GenBank database. We analysed alpha and beta diversity in 561 metabarcoding samples from 24 North Sea beaches, a region extensively studied for meiofauna, using both the enriched reference database and the NCBI database without the additional reference barcodes. Our results show a 2.5-fold increase in sequence annotation and a doubling of the number of Operational Taxonomic Units (OTUs) identified to species level when the metabarcoding data were annotated with the enhanced database. Additionally, our analyses revealed a bell-shaped curve of OTU richness across the intertidal zone, aligning more closely with patterns from morphological analysis, and more clearly defined community dissimilarity patterns between supralittoral and intertidal sites. Our research highlights the importance of expanding molecular reference databases and of combining morphological taxonomy with molecular techniques for biodiversity assessments, ultimately improving our understanding of coastal ecosystems.
- Keywords
- DNA barcoding, Molecular reference database, community ecology, invertebrates
- MeSH
- Invertebrates genetics classification MeSH
- Biodiversity MeSH
- Ecosystem MeSH
- Bathing Beaches MeSH
- Metagenomics methods MeSH
- Electron Transport Complex IV * genetics MeSH
- DNA Barcoding, Taxonomic * methods MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Geographic Names
- Netherlands MeSH
- North Sea MeSH
- Substance Names
- Electron Transport Complex IV * MeSH
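The meiofauna abstract does not state which alpha- and beta-diversity measures were used. As a hedged illustration only, the Python sketch below computes two common choices from per-sample OTU read counts: observed OTU richness (alpha) and presence/absence Jaccard dissimilarity (beta). The sample names and counts are hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping


def richness(otu_counts: Mapping[str, int]) -> int:
    """Observed OTU richness: number of OTUs with at least one read in the sample."""
    return sum(1 for reads in otu_counts.values() if reads > 0)


def jaccard_dissimilarity(a: Mapping[str, int], b: Mapping[str, int]) -> float:
    """1 - (OTUs present in both samples / OTUs present in either sample)."""
    present_a = {otu for otu, reads in a.items() if reads > 0}
    present_b = {otu for otu, reads in b.items() if reads > 0}
    union = present_a | present_b
    if not union:  # two empty samples are treated as identical
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


# Hypothetical read counts per OTU for one supralittoral and one intertidal sample.
supralittoral = {"OTU1": 5, "OTU2": 0, "OTU3": 12}
intertidal = {"OTU3": 4, "OTU4": 9, "OTU5": 1}
print(richness(supralittoral), richness(intertidal))    # 2 3
print(jaccard_dissimilarity(supralittoral, intertidal))  # 0.75
```

Both measures depend on how many reads could be annotated at all, which is why enriching the reference database changes the observed richness and dissimilarity patterns.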
Twelve Y-chromosomal short tandem repeats (Y-STRs) (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a, DYS385b, DYS437, DYS438 and DYS439) included in the PowerPlex Y Kit (Promega Corporation, Madison, USA) were studied in 1750 unrelated males living in 14 regions of the Czech Republic. A total of 1148 different haplotypes were found, and the overall haplotype diversity (HD) was 0.998. Analysis of Molecular Variance (AMOVA) revealed non-significant distances between regions with respect to their haplotype distributions, which allows the whole sample to be used as a representative reference database for the Czech Republic. Median network analysis shows a remarkable bipartite composition of the Czech haplotypes, which fall into distinct clusters with Eastern and Western European roots.
- MeSH
- Databases, Nucleic Acid * MeSH
- DNA Fingerprinting MeSH
- Haplotypes * MeSH
- Humans MeSH
- Chromosomes, Human, Y * MeSH
- Polymerase Chain Reaction MeSH
- Genetics, Population * MeSH
- Tandem Repeat Sequences * MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Publication Type
- Journal Article MeSH
- Geographic Names
- Czech Republic MeSH
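Haplotype diversity is commonly estimated with Nei's unbiased formula, HD = n / (n - 1) × (1 - Σ p_i²), where n is the number of sampled males and p_i is the frequency of the i-th distinct haplotype. The Y-STR abstract does not name the estimator, so treat the Python sketch below as an assumption. It encodes each haplotype as a tuple of repeat counts at the twelve loci, and the allele values are made up.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Sequence


def haplotype_diversity(haplotypes: Sequence[tuple[int, ...]]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are needed")
    counts = Counter(haplotypes)  # identical allele profiles = same haplotype
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq_freq)


# Hypothetical 12-locus profiles (repeat counts in the locus order listed above).
h1 = (14, 13, 29, 24, 10, 11, 13, 14, 15, 14, 10, 11)
h2 = (15, 13, 30, 23, 10, 11, 13, 13, 14, 14, 11, 12)
h3 = (17, 14, 31, 25, 11, 11, 13, 16, 17, 15, 12, 12)
sample = [h1, h1, h2, h3]
print(len(set(sample)), round(haplotype_diversity(sample), 4))  # 3 0.8333
```

A value as high as 0.998 means that two randomly drawn males almost never share a 12-locus haplotype, which is what makes such a database useful as a forensic reference.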
Over the years, DNA barcoding has gained importance in forensic entomology because it enables fast and reliable species determination. High-quality results, however, can only be achieved with a comprehensive DNA barcode reference database at hand. In collaboration with the Bavarian State Criminal Police Office, we have initiated, at the Bavarian State Collection of Zoology, the establishment of a reference library of arthropods of potential forensic relevance for use in DNA barcoding applications. CO1-5P' DNA barcode sequences of hundreds of arthropods were obtained via DNA extraction, PCR and Sanger sequencing, leading to the establishment of a database containing 502 high-quality sequences that provide coverage for 88 arthropod species. Furthermore, we demonstrate an application of this library by using it as a backbone for a high-throughput sequencing analysis of arthropod bulk samples collected from human corpses, which enabled the identification of 31 different arthropod Barcode Index Numbers.
- Keywords
- Cytochrome C Oxidase 1, DNA barcoding, DNA reference library, bulk sample analysis, forensic entomology, forensic science, high throughput sequencing, next generation sequencing
- MeSH
- Arthropods genetics MeSH
- Databases, Nucleic Acid * MeSH
- Entomology MeSH
- Polymerase Chain Reaction MeSH
- Electron Transport Complex IV genetics MeSH
- Sequence Analysis, DNA MeSH
- Forensic Sciences * MeSH
- DNA Barcoding, Taxonomic * MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Substance Names
- Electron Transport Complex IV MeSH
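To illustrate how a reference library can serve as a backbone for identifying high-throughput sequencing reads from bulk samples, the Python sketch below assigns each read to the library entry with the highest identity and tallies the assignments. It is a simplification under stated assumptions: reads and references are taken to be already aligned and trimmed to the same length, identity is a plain fraction of matching positions, the 97% threshold and the placeholder species labels are hypothetical, and assignments are made to species labels rather than to Barcode Index Numbers.

```python
from __future__ import annotations

from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_reads(reads: list[str], library: list[tuple[str, str]],
                 threshold: float = 0.97) -> Counter[str]:
    """Tally reads by best-matching library species; weak matches are 'unassigned'."""
    tally: Counter[str] = Counter()
    for read in reads:
        best_label, best_identity = "unassigned", 0.0
        for label, reference in library:  # a species may have several entries
            score = identity(read, reference)
            if score > best_identity:
                best_label, best_identity = label, score
        tally[best_label if best_identity >= threshold else "unassigned"] += 1
    return tally


# Hypothetical 20 bp "barcodes" and reads from one bulk sample.
library = [("species_A", "ACGTTGCAACGTTGCAACGT"),
           ("species_B", "TTTTGGGGCCCCAAAATTTT")]
reads = ["ACGTTGCAACGTTGCAACGT",   # exact match to species_A
         "ACGTTGCAACGTTGCAACGA",   # one mismatch (95% identity)
         "TTTTGGGGCCCCAAAATTTT",   # exact match to species_B
         "G" * 20]                 # matches nothing well
print(sorted(assign_reads(reads, library).items()))
```

The threshold controls the trade-off the abstract alludes to: a strict cutoff leaves reads from species missing in the library unassigned instead of forcing them onto the nearest, wrong reference.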
Following the discovery of serious errors in the structures of biomacromolecules, structure validation has become a key research topic, especially for ligands and non-standard residues. ValidatorDB (freely available at http://ncbr.muni.cz/ValidatorDB) represents a new step in this direction: a database of validation results for all ligands and non-standard residues in the Protein Data Bank (all molecules with seven or more heavy atoms). Model molecules from the wwPDB Chemical Component Dictionary are used as the reference during validation. ValidatorDB covers the main aspects of annotation validation and additionally introduces several useful validation analyses. The most significant is the classification of chirality errors, which allows the user to distinguish between serious issues and minor inconsistencies. Other analyses report, for example, completely erroneous ligands, alternate conformations or complete identity with the model molecules. All results are systematically classified into categories, and statistical evaluations are performed. In addition to detailed validation reports for each molecule, ValidatorDB provides summaries of the validation results for the entire PDB, for sets of molecules sharing the same annotation (three-letter code) or the same PDB entry, and for user-defined selections of annotations or PDB entries.
- MeSH
- Amino Acids chemistry MeSH
- Molecular Sequence Annotation MeSH
- Databases, Protein * MeSH
- Internet MeSH
- Protein Conformation MeSH
- Ligands MeSH
- Models, Molecular MeSH
- Proteins chemistry MeSH
- Reproducibility of Results MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance Names
- Amino Acids MeSH
- Ligands MeSH
- Proteins MeSH
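Two of the ValidatorDB checks can be illustrated with a short Python sketch: selecting molecules with seven or more heavy (non-hydrogen) atoms, and comparing the handedness of a stereocentre with the corresponding centre in a model molecule via the sign of the signed volume (scalar triple product) spanned by three neighbouring atoms. This is a generic technique, not ValidatorDB's documented algorithm. It assumes the atom correspondence between the observed residue and the model molecule is already known, and all coordinates are hypothetical.

```python
from __future__ import annotations

from collections.abc import Sequence


def heavy_atom_count(elements: Sequence[str]) -> int:
    """Count non-hydrogen atoms; deuterium ('D') is treated as hydrogen."""
    return sum(1 for el in elements if el.strip().upper() not in {"H", "D"})


def signed_volume(center: Sequence[float], a: Sequence[float],
                  b: Sequence[float], c: Sequence[float]) -> float:
    """Scalar triple product (a - center) . ((b - center) x (c - center))."""
    ax, ay, az = (a[i] - center[i] for i in range(3))
    bx, by, bz = (b[i] - center[i] for i in range(3))
    cx, cy, cz = (c[i] - center[i] for i in range(3))
    return ax * (by * cz - bz * cy) + ay * (bz * cx - bx * cz) + az * (bx * cy - by * cx)


def same_handedness(model, observed, tol: float = 1e-6) -> bool | None:
    """Compare matched centres, each given as (center, a, b, c) coordinates.

    Returns None when either arrangement is (near-)planar and handedness is undefined.
    """
    v_model = signed_volume(*model)
    v_observed = signed_volume(*observed)
    if abs(v_model) < tol or abs(v_observed) < tol:
        return None
    return (v_model > 0) == (v_observed > 0)


# Hypothetical coordinates: the observed centre is the mirror image of the model.
model = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
mirrored = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, -1.0))
print(heavy_atom_count(["C", "C", "O", "N", "C", "C", "S", "H", "H"]) >= 7)  # True
print(same_handedness(model, mirrored))  # False
```

Because the sign of the triple product is unchanged by translation and rotation but flips under reflection, it distinguishes a genuinely inverted centre from a merely displaced one; a near-zero volume (the None case) corresponds to the kind of flattened geometry that a validation tool would report separately rather than as a clear chirality error.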