Recent technological advances have made next-generation sequencing (NGS) a popular and financially accessible technique that allows a broad range of analyses to be done simultaneously. The large volume of newly generated NGS data, however, requires advanced software support both for analyzing the data and for interpreting the results biologically. In this article, we describe SATrans (Software for Annotation of Transcriptome), a software package providing fast and robust functional annotation of novel sequences obtained from transcriptome sequencing. Moreover, it performs advanced gene ontology analysis of differentially expressed genes, thereby helping to interpret the quantitative changes in gene expression biologically and in a user-friendly form. The software is freely available and makes it possible to work with thousands of sequences on a standard personal computer or notebook running the Linux operating system.
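The gene ontology analysis of differentially expressed genes mentioned above is commonly framed as an over-representation test. The following is a minimal sketch of a one-sided hypergeometric test, a widely used approach for this kind of analysis; the abstract does not state which test SATrans uses, so the function name, parameters, and the toy numbers are assumptions made for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided over-representation p-value, P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     number of differentially expressed genes
    hits:       differentially expressed genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for hits, hits + 1, ..., upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical, not taken from the article).
p = hypergeom_enrichment_p(population=1000, annotated=40, sample=50, hits=8)
print(f"enrichment p-value: {p:.3g}")
```

In practice the test is repeated for every GO term, so the resulting p-values would normally be corrected for multiple testing.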
- Keywords
- differentially expressed genes, functional annotation, transcriptome
- MeSH
- Molecular Sequence Annotation methods MeSH
- Humans MeSH
- Sequence Analysis, RNA methods MeSH
- Software * MeSH
- Gene Expression Profiling methods MeSH
- Transcriptome * MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Circular RNAs (circRNAs) play a crucial role in cell development and serve as biomarkers in many diseases. Nevertheless, the function of many circRNAs remains unknown. This function can be inferred from sponging and silencing interactions with microRNAs and messenger RNAs. We recently proposed a network-based circRNA functional annotation tool, circGPA. However, validation data for RNA interactions are often sparse, and predicted interactions contain many false positives. To address this issue, we propose an extended algorithm named circGPAcorr, which uses expression data to weight the interactions, resulting in more precise functional annotation. To assess the significance of the results, the p-value is calculated using a reduction to circGPA, a generating-polynomial-based method. We show that the problem is #P-hard, and thus computationally difficult. The circGPAcorr algorithm is tested on publicly available myelodysplastic syndromes expression data, providing gene ontology annotations that align with the literature on myelodysplastic syndromes. We also demonstrate its performance in the circRNA-disease annotation task.
- Keywords
- CircRNA, Functional annotation, Gene expression, Generating polynomial
- Publication type
- Journal Article MeSH
Recent research has shown that circular RNAs (circRNAs) are functional in gene expression regulation and potentially related to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. However, the function of most circRNAs remains unknown, and discovering it through biological experiments is expensive and time-consuming. In this paper, we predict circRNA annotations from the knowledge of their interactions with miRNAs and the subsequent miRNA-mRNA interactions. First, we construct an interaction network for a target circRNA; second, we spread the information from the network nodes with known function to the root circRNA node. This idea itself is not new; our main contribution lies in proposing an efficient and exact deterministic procedure, based on the principle of probability-generating functions, to calculate the p-value of the association test between a circRNA and an annotation term. We show that our publicly available algorithm is both more effective and more efficient than the commonly used Monte-Carlo sampling approach, which may suffer from difficult quantification of sampling convergence and the resulting sampling inefficiency. We experimentally demonstrate that the new approach is two orders of magnitude faster than Monte-Carlo sampling, which makes summary annotation of large circRNA files feasible; this includes, for example, their reannotation after periodic interaction network updates. We provide a summary annotation of a current circRNA database as one of our outputs. The proposed algorithm could be generalized to other types of RNA in a straightforward way.
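The idea of replacing Monte-Carlo sampling with an exact computation based on generating functions can be illustrated on a simplified problem. The sketch below is not the circGPA algorithm; it assumes a hypothetical setting in which each network node has a non-negative integer weight, exactly k nodes carry the annotation term, the score is the sum of the weights of the annotated nodes, and the null model picks the k annotated nodes uniformly at random. Under these assumptions the exact null distribution follows from the coefficients of the polynomial prod_i (1 + x * y^(w_i)).

```python
from itertools import combinations
from math import comb


def exact_score_distribution(weights, k):
    """Return {score: number of k-subsets whose weights sum to score}.

    Expands prod_i (1 + x * y**w_i), keeping only terms with x-degree <= k.
    poly[j] maps a score to the number of j-subsets reaching that score.
    """
    poly = [dict() for _ in range(k + 1)]
    poly[0][0] = 1
    for w in weights:
        # Go from high to low subset size so each node is used at most once.
        for j in range(k - 1, -1, -1):
            for score, count in poly[j].items():
                new_score = score + w
                poly[j + 1][new_score] = poly[j + 1].get(new_score, 0) + count
    return poly[k]


def exact_p_value(weights, k, observed):
    """Exact P(score >= observed) for a uniformly random k-subset of nodes."""
    dist = exact_score_distribution(weights, k)
    extreme = sum(count for score, count in dist.items() if score >= observed)
    return extreme / comb(len(weights), k)


def brute_force_p_value(weights, k, observed):
    """Reference implementation by full enumeration; only for tiny inputs."""
    subsets = list(combinations(weights, k))
    return sum(1 for s in subsets if sum(s) >= observed) / len(subsets)


# Toy weights (hypothetical); both routes give the same p-value.
toy_weights = [3, 1, 4, 1, 5, 9, 2, 6]
print(exact_p_value(toy_weights, k=3, observed=15))
print(brute_force_p_value(toy_weights, k=3, observed=15))
```

Unlike sampling, this computation has no convergence criterion to tune, and its cost depends on the number of nodes, k, and the range of achievable scores rather than on how small the p-value is.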
- Keywords
- Annotation term, Circular RNA, Interaction network
- MeSH
- Biomarkers MeSH
- Gene Regulatory Networks MeSH
- RNA, Circular * MeSH
- RNA, Messenger genetics metabolism MeSH
- MicroRNAs * genetics metabolism MeSH
- Probability MeSH
- Gene Expression Profiling methods MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Biomarkers MeSH
- RNA, Circular * MeSH
- RNA, Messenger MeSH
- MicroRNAs * MeSH
Secondary structure elements (SSEs) are inherent parts of protein structures, and their arrangement is characteristic for each protein family. Therefore, annotation of SSEs can facilitate orientation in the vast number of homologous structures that are now available for many protein families. It also provides a way to identify and annotate key regions, such as active sites and channels, and subsequently to answer key research questions, such as understanding molecular function and its variability. This chapter introduces the concept of SSE annotation and describes the workflow for obtaining SSE annotation for the members of a selected protein family using the program SecStrAnnotator.
- Keywords
- Annotation, Protein domain, Protein family, SecStrAnnotator, Secondary structure, Secondary structure assignment, Secondary structure elements, Structural alignment
- MeSH
- Algorithms MeSH
- Amino Acid Motifs * MeSH
- Molecular Sequence Annotation methods MeSH
- Catalytic Domain genetics MeSH
- Proteins chemistry genetics MeSH
- Software MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Proteins MeSH
Accurate annotation of genomic variants in human diseases is essential for personalized medicine. Assessment of somatic and germline TP53 alterations has now reached the clinic and is required in several circumstances, such as the identification of the most effective cancer therapy for patients with chronic lymphocytic leukemia (CLL). Here, we present Seshat, a Web service for annotating TP53 information derived from sequencing data. A flexible framework allows the use of standard file formats such as Mutation Annotation Format (MAF) or Variant Call Format (VCF), as well as common TXT files. Seshat performs accurate variant annotations using the Human Genome Variation Society (HGVS) nomenclature and the stable TP53 genomic reference provided by the Locus Reference Genomic (LRG). In addition, using the 2017 release of the UMD_TP53 database, Seshat provides several kinds of statistical information for each TP53 variant, including database frequency, functional activity, and pathogenicity. The information is delivered in standardized output tables that minimize errors and facilitate comparison of mutational data across studies. Seshat is a useful tool for interpreting the ever-growing TP53 sequencing data generated by multiple sequencing platforms, and it is freely available via the TP53 Website, http://p53.fr, or directly at http://vps338341.ovh.net/.
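To make the step from a VCF record to HGVS nomenclature concrete, the following is a minimal sketch that formats a single-nucleotide substitution in HGVS genomic notation (`reference:g.POSREF>ALT`). It is not Seshat's implementation and does not use the LRG coordinate system; the function name, the reference label passed by the caller, and the example record are assumptions made for illustration, and insertions, deletions, and multi-allelic records are deliberately rejected.

```python
def vcf_snv_to_hgvs_g(vcf_line, reference):
    """Format a single-nucleotide VCF data line as an HGVS genomic substitution.

    vcf_line:  one tab-separated VCF data line (CHROM, POS, ID, REF, ALT, ...)
    reference: label of the reference sequence the position refers to
               (supplied by the caller; not checked here)
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least 5 tab-separated VCF columns")
    pos, ref, alt = int(fields[1]), fields[3].upper(), fields[4].upper()
    bases = {"A", "C", "G", "T"}
    if ref not in bases or alt not in bases:
        raise ValueError("only single-nucleotide substitutions are handled")
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    # VCF positions and HGVS g. positions are both 1-based.
    return f"{reference}:g.{pos}{ref}>{alt}"


# Illustrative record; the reference label is an assumption of this example.
record = "17\t7674220\t.\tC\tT\t.\t.\t."
print(vcf_snv_to_hgvs_g(record, "NC_000017.11"))  # NC_000017.11:g.7674220C>T
```

A full annotation service would additionally map the genomic change onto coding (c.) and protein (p.) descriptions, which requires a transcript model and is outside the scope of this sketch.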
- Keywords
- HGVS variant nomenclature, TP53 variants, database, variant annotation
- MeSH
- Molecular Sequence Annotation MeSH
- Databases, Genetic * MeSH
- Genetic Variation genetics MeSH
- Genomics trends MeSH
- Internet MeSH
- Humans MeSH
- Mutation MeSH
- Tumor Suppressor Protein p53 genetics MeSH
- Software * MeSH
- Computational Biology trends MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Tumor Suppressor Protein p53 MeSH
- TP53 protein, human MeSH Browser
BACKGROUND: The ATP-binding cassette (ABC) transporter superfamily is comprised predominantly of proteins which directly utilize energy from ATP to move molecules across the plasma membrane. Although ABC transporters have been the subject of frequent investigation across many taxa, arthropod ABCs have been less well studied. While manual annotation of ABC transporters has been performed in many arthropods, there has so far been no systematic comparison of the superfamily within this phylum using the increasing number of sequenced genomes. Furthermore, functional work on these genes is limited. RESULTS: Here, we developed a standardized pipeline to annotate ABCs from predicted proteomes and used it to perform comparative genomics on ABC families across arthropod lineages. Using Kruskal-Wallis tests and the Computational Analysis of gene Family Evolution (CAFE), we observed significant expansions of the ABC-B full transporters (P-glycoproteins) in Lepidoptera and of the ABC-H transporters in Hemiptera. RNA-sequencing of epithelial tissues in the lepidopteran Helicoverpa armigera showed that the 7 P-glycoprotein paralogues differ substantially in their tissue distribution, suggesting a spatial division of labor. Functional redundancy also appears to be a feature of these transporters: RNAi knockdown showed that most transporters are dispensable, with the exception of the highly conserved gene Snu, which is probably indispensable because of its role in cuticular formation. CONCLUSIONS: We have performed an annotation of the ABC superfamily across > 150 arthropod species for which good quality protein annotations exist. Our findings highlight specific expansions of ABC transporter families which suggest evolutionary adaptation. Future work will be able to use this analysis as a resource to provide a better understanding of the ABC superfamily in arthropods.
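A Kruskal-Wallis test of the kind mentioned above compares a count, such as the number of genes in one ABC family per species, across several lineages without assuming normality. The sketch below computes the H statistic with average ranks and tie correction in plain Python; the per-species counts are hypothetical toy values and not data from the study, and the p-value helper is only valid for exactly three groups, where the chi-square approximation with 2 degrees of freedom has the closed-form survival function exp(-H/2).

```python
from math import exp


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks and tie correction."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)
    rank_of = {}   # distinct value -> average rank
    tie_term = 0   # sum of (t**3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ties = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += ties**3 - ties
        i = j
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    rank_sum_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    return h / correction


def p_value_three_groups(h):
    """Chi-square approximation for exactly 3 groups (2 degrees of freedom)."""
    return exp(-h / 2)


# Hypothetical gene counts per species in three lineages (toy data).
lineage_a = [4, 5, 5, 6, 4]
lineage_b = [9, 11, 10, 12, 9]
lineage_c = [5, 6, 4, 5, 7]
h = kruskal_wallis_h([lineage_a, lineage_b, lineage_c])
print(f"H = {h:.3f}, approximate p = {p_value_three_groups(h):.4f}")
```

With more than three groups the p-value would come from the chi-square distribution with (number of groups - 1) degrees of freedom, for which a statistics library is the more practical choice.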
- Keywords
- ABC transporters, Arthropod, Comparative genomics, Gene family evolution, RNAi
- MeSH
- ATP-Binding Cassette Transporters genetics MeSH
- Molecular Sequence Annotation MeSH
- Arthropods * genetics MeSH
- Genome MeSH
- Genomics MeSH
- Humans MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- ATP-Binding Cassette Transporters MeSH
- MeSH
- Genome-Wide Association Study MeSH
- Genetic Predisposition to Disease MeSH
- Genetic Loci MeSH
- Polymorphism, Single Nucleotide MeSH
- Humans MeSH
- Multiple Myeloma * genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Letter MeSH
- Research Support, N.I.H., Extramural MeSH
- Research Support, N.I.H., Intramural MeSH
The biological role of oxidized glycerophosphocholines (oxPCs) is a current topic of research that contributes substantially to the understanding of health and disease. Global non-targeted metabolomics offers an interesting approach to expand current knowledge and link oxPCs to new biological functions. Although this strategy is successful, it also has limitations that become clearly noticeable during the identification process. For this reason, clear rules for the identification of each group of metabolites are needed. This work attempts to provide the reader with a guideline for the recognition and annotation of oxidation among phosphocholines (PCs). Using several chromatographic characteristics and spectral information from tandem mass spectrometry, rapid and reliable annotation of long- and short-chain oxPCs can be performed. Some of this knowledge has been implemented in the publicly available annotation tool 'CEU Mass Mediator' (CMM) for semi-automated assignment of oxidation. Additionally, this tool was supplemented with accurate monoisotopic masses of oxPCs, expanding the information currently available in other databases. Moreover, the oxidation products of PC(16:0/20:4), known as PAPC, have been characterized, providing a list of accurate-mass product ions and neutral losses.
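Annotation by accurate monoisotopic mass usually comes down to comparing an observed m/z with a theoretical value within a tolerance expressed in parts per million (ppm). The sketch below is not the CMM implementation; it assumes protonated [M+H]+ ions, the elemental formula C44H80NO8P for PAPC, and a deliberately simplified model in which long-chain oxidation is represented only as the addition of one to three oxygen atoms. Function names, the tolerance, and the example m/z are hypothetical.

```python
# Monoisotopic masses of the relevant elements and of the proton (in Da).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
}
PROTON = 1.00727646

# Assumed elemental composition of PAPC, PC(16:0/20:4).
PAPC = {"C": 44, "H": 80, "N": 1, "O": 8, "P": 1}


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidate_oxidations(observed_mz, base_formula, max_oxygens=3, tol_ppm=10.0):
    """List (added oxygens, ppm error) pairs whose [M+H]+ m/z fits the tolerance."""
    base = monoisotopic_mass(base_formula)
    hits = []
    for n in range(1, max_oxygens + 1):
        theoretical_mz = base + n * MONOISOTOPIC["O"] + PROTON
        error = ppm_error(observed_mz, theoretical_mz)
        if abs(error) <= tol_ppm:
            hits.append((n, round(error, 2)))
    return hits


# Hypothetical observed m/z, constructed here to sit close to PAPC + 2 O.
observed = monoisotopic_mass(PAPC) + 2 * MONOISOTOPIC["O"] + PROTON + 0.002
print(candidate_oxidations(observed, PAPC))
```

A mass match alone only proposes candidates; as the abstract notes, chromatographic behaviour and tandem mass spectra are needed to support an annotation, and short-chain products formed by chain cleavage are not covered by this oxygen-addition model.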
- Keywords
- Annotation, Identification, LC-ESI-QTOF, Non-targeted metabolomics, Oxidized glycerophosphocholines, oxPAPC
- MeSH
- Chromatography, Liquid MeSH
- Databases, Factual MeSH
- Diabetes Mellitus, Type 2 blood diagnosis metabolism MeSH
- Phosphatidylcholines blood chemistry metabolism MeSH
- Mass Spectrometry MeSH
- Humans MeSH
- Metabolomics * MeSH
- Molecular Structure MeSH
- Oxidation-Reduction MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Phosphatidylcholines MeSH
In this study, electrogenic microbial communities originating from a single source were multiplied using our custom-made, 96-well-plate-based microbial fuel cell (MFC) array. Developed communities operated under different pH conditions and produced currents up to 19.4 A/m³ (0.6 A/m²) within 2 days of inoculation. Microscopic observations [combined scanning electron microscopy (SEM) and energy dispersive spectroscopy (EDS)] revealed that some species present in the anodic biofilm adsorbed copper on their surface because of the bioleaching of the printed circuit board (PCB), yielding Cu²⁺ ions up to 600 mg/L. Beta-diversity indicates taxonomic divergence among all communities, but functional clustering is based on reactor pH. Annotated metagenomes showed the high presence of multicopper oxidases and Cu-resistance genes, as well as genes encoding aliphatic and aromatic hydrocarbon-degrading enzymes, corresponding to PCB bioleaching. Metagenome analysis revealed a high abundance of Dietzia spp., previously characterized in MFCs, which did not grow at pH 4. Binning metagenomes allowed us to identify novel species, one belonging to Actinotalea, not yet associated with electrogenicity and enriched only in the pH 7 anode. Furthermore, we identified 854 unique protein-coding genes in Actinotalea that lacked sequence homology with other metagenomes. The function of some genes was predicted with high accuracy through deep functional residue identification (DeepFRI), with several of these genes potentially related to electrogenic capacity. Our results demonstrate the feasibility of using MFC arrays for the enrichment of functional electrogenic microbial consortia and data mining for the comparative analysis of either consortia or their members.
- Keywords
- bioleaching, copper, function prediction, metagenome, microbial fuel cell, printed circuit board (PCB)
- Publication type
- Journal Article MeSH
PREMISE OF THE STUDY: Red clover (Trifolium pratense) is a forage plant from the legume family with great importance in agronomy and livestock nourishment. Nevertheless, assembling its medium-sized genome presents a challenge, given current hardware and software possibilities. Next-generation sequencing technologies enable us to generate large amounts of sequence data at low cost. In this study, the genome assembly and red clover genome features are presented. METHODS: First, assembly software was assessed using data sets from a closely related species to find the best possible combination of assembler and error correction program for assembling the red clover genome. The newly sequenced genome was characterized by repetitive content, number of protein-coding and nonprotein-coding genes, and gene families and functions. Genome features were also compared with those of other sequenced plant species. KEY RESULTS: Abyss with Echo correction was used for de novo assembly of the red clover genome. The presented assembly comprises ∼314.6 Mbp. In contrast to leguminous species with comparable genome sizes, the genome of T. pratense contains a larger repetitive portion and more abundant retrotransposons and DNA transposons. Overall, 47 398 protein-coding genes were annotated from 64 761 predicted genes. Comparative analysis revealed several gene families that are characteristic of T. pratense. Resistance genes, leghemoglobins, and nodule-specific cysteine-rich peptides were identified and compared with those of other sequenced species. CONCLUSIONS: The presented red clover genomic data constitute a resource for improvement through molecular breeding and for comparison with other sequenced plant species.
- Keywords
- Fabaceae, Trifolium pratense, assessment of assembly software, de novo assembly, genome annotation, red clover
- MeSH
- DNA, Plant analysis MeSH
- Genome, Plant * MeSH
- Chromosome Mapping MeSH
- Genes, Plant * MeSH
- Plant Proteins genetics MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Trifolium genetics MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH
- Plant Proteins MeSH