Molecular identification of micro- and macroorganisms based on nuclear markers has revolutionized our understanding of their taxonomy, phylogeny and ecology. Today, research on the diversity of eukaryotes in global ecosystems relies heavily on nuclear ribosomal RNA (rRNA) markers. Here, we present EUKARYOME, a research community-curated reference database of nuclear ribosomal 18S rRNA, internal transcribed spacer (ITS) and 28S rRNA markers for all eukaryotes, including metazoans (animals), protists, fungi and plants. It is particularly useful for the identification of arbuscular mycorrhizal fungi because it bridges the four commonly used molecular markers: the ITS1, ITS2, 18S V4-V5 and 28S D1-D2 subregions. The key benefits of this database over other annotated reference sequence databases are that it is not restricted to certain taxonomic groups and that it includes all rRNA markers. EUKARYOME also offers a number of reference long-read sequences derived from (meta)genomics and (meta)barcoding, a unique feature that can be used for taxonomic identification and chimera control of third-generation, long-read, high-throughput sequencing data. Taxonomic assignments of rRNA genes in the database are verified using phylogenetic approaches. The reference datasets are available in multiple formats from the project homepage, http://www.eukaryome.org.
- MeSH
- Databases, Genetic MeSH
- Databases, Nucleic Acid MeSH
- Eukaryota * genetics MeSH
- Phylogeny MeSH
- Genes, rRNA genetics MeSH
- RNA, Ribosomal, 18S genetics MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- MeSH
- Anthropology methods MeSH
- Databases, Nucleic Acid MeSH
- Genome, Human genetics MeSH
- Humans MeSH
- Metadata MeSH
- DNA, Mitochondrial analysis genetics MeSH
- DNA, Ancient * analysis isolation & purification MeSH
- Human Development MeSH
- Check Tag
- Humans MeSH
- Publication type
- Review MeSH
Non-coding RNAs (ncRNAs) are essential for all life, and their functions often depend on their secondary (2D) and tertiary structure. Despite the abundance of software for the visualisation of ncRNAs, few tools automatically generate consistent and recognisable 2D layouts, which makes it challenging for users to construct, compare and analyse structures. Here, we present R2DT, a method for predicting and visualising a wide range of RNA structures in standardised layouts. R2DT is based on a library of 3,647 templates representing the majority of known structured RNAs. R2DT has been applied to ncRNA sequences from the RNAcentral database and produced >13 million diagrams, creating the world's largest RNA 2D structure dataset. The software is amenable to community expansion and is freely available at https://github.com/rnacentral/R2DT; a web server is available at https://rnacentral.org/r2dt .
- MeSH
- Databases, Nucleic Acid MeSH
- Nucleic Acid Conformation MeSH
- RNA, Untranslated chemistry MeSH
- Reproducibility of Results MeSH
- RNA chemistry MeSH
- Sequence Analysis, RNA MeSH
- Software MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Intramural MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
G-quadruplexes contribute to the regulation of key molecular processes. Their utilization for antiviral therapy is an emerging field of contemporary research. Here we present comprehensive analyses of the presence and localization of putative G-quadruplex forming sequences (PQS) in all viral genomes currently available in the NCBI database (including subviral agents). The G4Hunter algorithm was applied to a pool of 11,000 accessible viral genomes representing 350 Mbp in total. PQS frequencies differ across evolutionary groups of viruses, and PQS are enriched in repeats, replication origins, 5'UTRs and 3'UTRs. Importantly, PQS presence and localization are connected to viral lifecycles and correspond to the type of viral infection rather than to nucleic acid type: viruses routinely causing persistent infections in Metazoa hosts are enriched for PQS, whereas viruses causing acute infections are significantly depleted of PQS. The unique localization of PQS highlights the importance of G-quadruplex-based regulation of viral replication and life cycle, providing a tool for potential therapeutic targeting.
- MeSH
- Databases, Nucleic Acid * MeSH
- DNA, Viral genetics metabolism MeSH
- G-Quadruplexes * MeSH
- Genome, Viral * MeSH
- Humans MeSH
- Virus Diseases * genetics metabolism MeSH
- Viruses * genetics metabolism MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
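The G4Hunter-based PQS scan mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly described G4Hunter scoring scheme (each G in a run of n guanines scores min(n, 4), cytosines score the corresponding negative value, other bases score 0, and a window is scored by its mean). The function names, the window of 25 nt and the threshold of 1.2 are illustrative assumptions, not parameters reported in the abstract.

```python
def g4hunter_base_scores(seq):
    """Per-base scores under the assumed G4Hunter scheme:
    a G in a run of n Gs scores min(n, 4), a C scores -min(n, 4),
    every other base scores 0."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        # Extend j to the end of the current run of identical bases.
        while j < len(seq) and seq[j] == base:
            j += 1
        if base in "GC":
            value = min(j - i, 4)
            if base == "C":
                value = -value
            for k in range(i, j):
                scores[k] = value
        i = j
    return scores


def window_scores(seq, window=25):
    """Mean per-base score of every sliding window (hypothetical default of 25 nt)."""
    scores = g4hunter_base_scores(seq)
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    out = [total / window]
    for start in range(1, len(scores) - window + 1):
        # Slide the window by one base: add the new base, drop the old one.
        total += scores[start + window - 1] - scores[start - 1]
        out.append(total / window)
    return out


def putative_quadruplex_windows(seq, window=25, threshold=1.2):
    """Return (start, score) for windows whose absolute score reaches the
    threshold; negative scores point to a C-rich (complementary) strand."""
    return [
        (start, score)
        for start, score in enumerate(window_scores(seq, window))
        if abs(score) >= threshold
    ]
```

Calling `putative_quadruplex_windows(genome_sequence)` on a plain nucleotide string returns candidate window positions; merging overlapping windows and mapping them onto annotated features (repeats, UTRs, replication origins) would be separate steps not shown here.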
Deregulation of microRNA (miRNA) expression plays a critical role in the transition from a physiological to a pathological state. Accurate identification of miRNA promoters in multiple cell types is fundamental to understanding and characterizing the mechanisms underlying both physiological and pathological conditions. DIANA-miRGen v4 (www.microrna.gr/mirgenv4) provides cell type-specific miRNA transcription start sites (TSSs) for over 1500 miRNAs, retrieved from the analysis of >1000 cap analysis of gene expression (CAGE) samples corresponding to 133 tissues, cell lines and primary cells available in the FANTOM repository. miRNA TSS locations were associated with transcription factor binding site (TFBS) annotations for >280 TFs, derived from analyzing the majority of ENCODE ChIP-Seq datasets. For the first time, clusters of cell types having common miRNA TSSs are characterized and provided through a user-friendly interface with multiple layers of customization. DIANA-miRGen v4 significantly improves our understanding of miRNA biogenesis regulation at the transcriptional level by providing a unique integration of high-quality annotations for hundreds of cell-specific miRNA promoters with experimentally derived TFBSs.
- MeSH
- Molecular Sequence Annotation MeSH
- Cell Line MeSH
- Databases, Nucleic Acid * MeSH
- Transcription, Genetic MeSH
- Genome * MeSH
- Internet MeSH
- Humans MeSH
- MicroRNAs genetics metabolism MeSH
- Transcription Initiation Site MeSH
- Primary Cell Culture MeSH
- Promoter Regions, Genetic * MeSH
- Base Sequence MeSH
- Software * MeSH
- Transcription Factors genetics metabolism MeSH
- Protein Binding MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
The genomic characteristics of human cytomegalovirus (HCMV) strains sequenced directly from clinical pathology samples were investigated, focusing on variation, multiple-strain infection, recombination, and gene loss. A total of 207 datasets generated in this and previous studies using target enrichment and high-throughput sequencing were analyzed, in the process enabling the determination of genome sequences for 91 strains. Key findings were that (i) it is important to monitor the quality of sequencing libraries in investigating variation; (ii) many recombinant strains have been transmitted during HCMV evolution, and some have apparently survived for thousands of years without further recombination; (iii) mutants with nonfunctional genes (pseudogenes) have been circulating and recombining for long periods and can cause congenital infection and resulting clinical sequelae; and (iv) intrahost variation in single-strain infections is much less than that in multiple-strain infections. Future population-based studies are likely to continue illuminating the evolution, epidemiology, and pathogenesis of HCMV.
- MeSH
- Cytomegalovirus Infections virology MeSH
- Cytomegalovirus genetics MeSH
- Databases, Nucleic Acid MeSH
- Datasets as Topic MeSH
- DNA, Viral genetics MeSH
- Genetic Variation MeSH
- Genome, Viral * genetics MeSH
- Genotype MeSH
- Humans MeSH
- Evolution, Molecular MeSH
- Mutation MeSH
- Recombination, Genetic * MeSH
- Base Sequence * MeSH
- Sequence Analysis, DNA MeSH
- Whole Genome Sequencing MeSH
- Genes, Viral MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Over the years, DNA barcoding has gained importance in forensic entomology because it leads to fast and reliable species determination. High-quality results, however, can only be achieved with a comprehensive DNA barcode reference database at hand. In collaboration with the Bavarian State Criminal Police Office, we have initiated, at the Bavarian State Collection of Zoology, the establishment of a reference library of arthropods of potential forensic relevance for use in DNA barcoding applications. CO1-5P' DNA barcode sequences of hundreds of arthropods were obtained via DNA extraction, PCR and Sanger sequencing, leading to the establishment of a database containing 502 high-quality sequences, which provide coverage for 88 arthropod species. Furthermore, we demonstrate an application of this library by using it as a backbone for a high-throughput sequencing analysis of arthropod bulk samples collected from human corpses, which enabled the identification of 31 different arthropod Barcode Index Numbers.
- MeSH
- Arthropods genetics MeSH
- Databases, Nucleic Acid * MeSH
- Entomology MeSH
- Polymerase Chain Reaction MeSH
- Electron Transport Complex IV genetics MeSH
- Sequence Analysis, DNA MeSH
- Forensic Sciences * MeSH
- DNA Barcoding, Taxonomic * MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
The concept of operational taxonomic units (OTUs), which constructs "mathematically" defined taxa, is widely accepted and applied to describe bacterial communities using amplicon sequencing of the 16S rRNA gene. OTUs are often used to infer functional traits since they are considered to fairly represent community members. However, the link between molecular taxa, real taxa, and OTUs seems to be much more complicated. Strains of the same bacterial species (ideally belonging to the same OTU) typically share only some genes (the core genome), while other genes are strain-specific and unique. It is thus unclear to what extent important functional traits are homogeneous within an OTU and how correctly functional traits can be inferred for individual OTU members. Here, we have tested in silico the similarity of all genes and, more specifically, of the set of genes encoding glycoside hydrolases (GH) in pairs of bacterial genomes that belong to the same OTU. Genome similarity varied among OTUs, but as many as 5-78% of genes were not shared between the two bacterial genomes in a pair. The complement of GH families (the presence of gene families and the number of genes per family) differed in 95% of OTUs. On average, 43% of GH families either differed in gene counts or were present in one genome and absent in the other. These results show a serious limitation of OTU-based approaches when used to infer the functional traits of bacterial communities and raise the question of how to link environmental sequencing data and microbial functions.
- MeSH
- Bacteria classification genetics MeSH
- Genes, Bacterial genetics MeSH
- Databases, Nucleic Acid MeSH
- DNA, Bacterial genetics MeSH
- Phylogeny MeSH
- Genetic Variation MeSH
- Genome, Bacterial genetics MeSH
- Glycoside Hydrolases genetics MeSH
- Metagenomics * MeSH
- Microbiota MeSH
- RNA, Ribosomal, 16S genetics MeSH
- Sequence Analysis, DNA MeSH
- Publication type
- Journal Article MeSH
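The pairwise comparison of GH family complements described in the abstract above (families differing in gene counts, or present in one genome and absent in the other) can be sketched as follows. This is a minimal illustration under stated assumptions: each genome is represented by a hypothetical dictionary mapping GH family names to gene counts, and the function name and return layout are invented for the example rather than taken from the study.

```python
def gh_family_differences(counts_a, counts_b):
    """Compare the GH family complements of two genomes from one OTU.

    counts_a, counts_b: dicts mapping a GH family name to its gene count
    (hypothetical input format). Returns the families that differ and the
    fraction of all observed families that differ in any way.
    """
    # Consider only families actually present in at least one genome.
    families = {
        fam
        for fam in set(counts_a) | set(counts_b)
        if counts_a.get(fam, 0) > 0 or counts_b.get(fam, 0) > 0
    }
    # Present in one genome, absent in the other.
    presence_diff = {
        fam
        for fam in families
        if (counts_a.get(fam, 0) > 0) != (counts_b.get(fam, 0) > 0)
    }
    # Present in both genomes, but with different gene counts.
    count_diff = {
        fam
        for fam in families - presence_diff
        if counts_a.get(fam, 0) != counts_b.get(fam, 0)
    }
    differing = len(presence_diff) + len(count_diff)
    fraction = differing / len(families) if families else 0.0
    return {
        "presence_diff": sorted(presence_diff),
        "count_diff": sorted(count_diff),
        "fraction_differing": fraction,
    }


# Toy example with made-up counts for two genomes of the same OTU.
genome_a = {"GH1": 3, "GH3": 2, "GH13": 1}
genome_b = {"GH1": 3, "GH3": 4, "GH5": 1}
result = gh_family_differences(genome_a, genome_b)
```

In this toy pair, GH13 and GH5 are each present in only one genome and GH3 differs in gene count, so three of the four observed families differ.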
The secondary structure of RNA molecules provides insights into the identity and function of RNAs. With RNAs readily sequenced, the question of their structural characterization is increasingly important. However, RNA structure is difficult to acquire. Its experimental identification is technically extremely demanding, while computational prediction is not accurate enough, especially for large structures of long sequences. We address this difficult situation with rPredictorDB, a predictive database of RNA secondary structures that aims to form a middle ground between experimentally identified structures in PDB and predicted consensus secondary structures in Rfam. The database contains individual secondary structures predicted using a tool for template-based prediction of RNA secondary structure for the homologs of RNA families that have at least one homolog with an experimentally solved structure. Experimentally identified structures are used as the structural templates, and thus the prediction has higher reliability than de novo predictions in Rfam. The sequences are downloaded from public resources. So far, rPredictorDB covers 7365 RNAs with their secondary structures. Plots of the secondary structures use the Traveler package for readable display of RNAs with long sequences and complex structures, such as ribosomal RNAs. The RNAs in the output of rPredictorDB are extensively annotated and can be viewed, browsed, searched and downloaded according to taxonomic, sequence and structure data. Additionally, the structure of user-provided sequences can be predicted using the templates stored in rPredictorDB.
A collaborative effort was carried out by the Spanish and Portuguese Speaking Working Group of the International Society for Forensic Genetics (GHEP-ISFG) to promote knowledge exchange between associate laboratories interested in the implementation of indel-based methodologies and to build allele frequency databases of 38 indels for forensic applications. These databases include populations from different countries that are relevant for identification and kinship investigations undertaken by the participating laboratories. Before compiling population data, participants were asked to type the 38 indels in blind samples from annual GHEP-ISFG proficiency tests, using a previously described amplification protocol. Only laboratories that reported correct results contributed population data to this study. A total of 5839 samples were genotyped from 45 different populations from Africa, America, East Asia, Europe and the Middle East. Population differentiation analysis showed significant differences between most populations studied from Africa and America, as well as between two Asian populations from China and East Timor. Low FST values were detected among most European populations. Overall diversities and parameters of forensic efficiency were high in populations from all continents.
- MeSH
- Databases, Nucleic Acid MeSH
- DNA Fingerprinting MeSH
- Ethnicity genetics MeSH
- Gene Frequency MeSH
- Genotype MeSH
- Polymorphism, Single Nucleotide * MeSH
- Laboratories statistics & numerical data MeSH
- Humans MeSH
- Microsatellite Repeats MeSH
- INDEL Mutation * MeSH
- Genetics, Population * MeSH
- Racial Groups genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
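The allele frequencies and FST values mentioned in the indel abstract above can be illustrated with a small sketch for a single biallelic indel. The abstract does not state which FST estimator was used, so this example assumes a simple heterozygosity-based form, FST = (HT - HS) / HT, weighted by sample size. The genotype coding (0, 1 or 2 copies of the insertion allele per individual) and the function names are hypothetical.

```python
def allele_frequency(genotypes):
    """Frequency of the insertion allele in one population.

    genotypes: list of per-individual insertion-allele counts (0, 1 or 2),
    an assumed coding for a biallelic indel.
    """
    if not genotypes:
        raise ValueError("population has no genotyped samples")
    return sum(genotypes) / (2 * len(genotypes))


def fst_biallelic(populations):
    """Heterozygosity-based FST across populations for one biallelic marker.

    populations: list of genotype lists, one per population.
    Uses FST = (HT - HS) / HT with sample-size weighting (an assumed,
    simplified estimator without small-sample correction).
    """
    sizes = [len(pop) for pop in populations]
    freqs = [allele_frequency(pop) for pop in populations]
    total = sum(sizes)
    # Pooled allele frequency and expected heterozygosity of the total.
    p_bar = sum(n * p for n, p in zip(sizes, freqs)) / total
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # marker is monomorphic across all populations
    # Weighted mean expected heterozygosity within populations.
    h_within = sum(n * 2 * p * (1 - p) for n, p in zip(sizes, freqs)) / total
    return (h_total - h_within) / h_total


# Toy data: two populations fixed for opposite alleles, and two identical ones.
fixed_apart = fst_biallelic([[0, 0, 0, 0], [2, 2, 2, 2]])
identical = fst_biallelic([[0, 1, 2, 1], [0, 1, 2, 1]])
```

With populations fixed for opposite alleles the sketch gives an FST of 1, and with identical allele frequencies it gives 0, matching the intuition that low FST values indicate little differentiation.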