Recent research has shown that circular RNAs (circRNAs) play a role in gene expression regulation and are potentially related to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. However, the function of most circRNAs remains unknown, and discovering it through biological experiments is expensive and time-consuming. In this paper, we predict circRNA annotations from knowledge of their interactions with miRNAs and the subsequent miRNA-mRNA interactions. First, we construct an interaction network for a target circRNA; second, we propagate information from the network nodes with known function to the root circRNA node. This idea itself is not new; our main contribution lies in proposing an efficient and exact deterministic procedure, based on the principle of probability-generating functions, to calculate the p-value of the association test between a circRNA and an annotation term. We show that our publicly available algorithm is both more effective and more efficient than the commonly used Monte-Carlo sampling approach, which may suffer from sampling convergence that is difficult to quantify and from the resulting sampling inefficiency. We experimentally demonstrate that the new approach is two orders of magnitude faster than Monte-Carlo sampling, which makes summary annotation of large circRNA files feasible, including their reannotation after periodic interaction network updates. We provide a summary annotation of a current circRNA database as one of our outputs. The proposed algorithm could be generalized to other types of RNA in a straightforward way.
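To illustrate the general principle of an exact p-value obtained from a probability-generating function (PGF), the following minimal sketch may help. It is not the authors' algorithm: the abstract does not specify the test statistic, so the sketch assumes a simplified statistic S that counts how many of n independent network nodes carry the annotation term, with node i annotated with a hypothetical null probability p_i. Under that assumption the PGF of S is the product of (1 - p_i + p_i z), its coefficients give the exact distribution of S, and the p-value is an upper-tail sum. All function names and the example probabilities are illustrative assumptions.

```python
import random


def pgf_coefficients(probs):
    """Expand prod_i (1 - p_i + p_i * z); coeffs[k] equals P(S = k).

    Assumption: S is a sum of independent Bernoulli variables
    (a simplification chosen for illustration only).
    """
    coeffs = [1.0]  # PGF of the empty sum: P(S = 0) = 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # node not annotated: degree unchanged
            nxt[k + 1] += c * p       # node annotated: degree raised by one
        coeffs = nxt
    return coeffs


def exact_p_value(probs, observed):
    """Exact upper-tail p-value P(S >= observed) from the PGF coefficients."""
    return sum(pgf_coefficients(probs)[observed:])


def monte_carlo_p_value(probs, observed, n_samples=20000, seed=0):
    """Sampling estimate of the same tail probability, for comparison."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        s = sum(rng.random() < p for p in probs)
        if s >= observed:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    null_probs = [0.1, 0.2, 0.05, 0.3]  # hypothetical values, not from the paper
    print("exact:      ", exact_p_value(null_probs, 2))
    print("Monte-Carlo:", monte_carlo_p_value(null_probs, 2))
```

In this toy setting the exact computation costs on the order of n² multiplications and has no sampling error, whereas the accuracy of the Monte-Carlo estimate depends on the number of samples, which is the kind of convergence concern the abstract raises.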
BACKGROUND: The ATP-binding cassette (ABC) transporter superfamily consists predominantly of proteins that directly use energy from ATP to move molecules across the plasma membrane. Although ABC transporters have been investigated frequently across many taxa, arthropod ABCs have been less well studied. While ABC transporters have been manually annotated in many arthropods, there has so far been no systematic comparison of the superfamily within this phylum using the increasing number of sequenced genomes. Furthermore, functional work on these genes is limited. RESULTS: Here, we developed a standardized pipeline to annotate ABCs from predicted proteomes and used it to perform comparative genomics on ABC families across arthropod lineages. Using Kruskal-Wallis tests and the Computational Analysis of gene Family Evolution (CAFE), we observed significant expansions of the ABC-B full transporters (P-glycoproteins) in Lepidoptera and the ABC-H transporters in Hemiptera. RNA-sequencing of epithelial tissues in the lepidopteran Helicoverpa armigera showed that the 7 P-glycoprotein paralogues differ substantially in their tissue distribution, suggesting a spatial division of labor. Functional redundancy also appears to be a feature of these transporters, as RNAi knockdown showed that most transporters are dispensable, with the exception of the highly conserved gene Snu, probably because of its role in cuticle formation. CONCLUSIONS: We have annotated the ABC superfamily across > 150 arthropod species for which good-quality protein annotations exist. Our findings highlight specific expansions of ABC transporter families that suggest evolutionary adaptation. Future work will be able to use this analysis as a resource to better understand the ABC superfamily in arthropods.
Accurate annotation of genomic variants in human diseases is essential for personalized medicine. Assessment of somatic and germline TP53 alterations has now reached the clinic and is required in several circumstances, such as identifying the most effective cancer therapy for patients with chronic lymphocytic leukemia (CLL). Here, we present Seshat, a Web service for annotating TP53 information derived from sequencing data. A flexible framework allows the use of standard file formats such as Mutation Annotation Format (MAF) or Variant Call Format (VCF), as well as common TXT files. Seshat performs accurate variant annotation using the Human Genome Variation Society (HGVS) nomenclature and the stable TP53 genomic reference provided by the Locus Reference Genomic (LRG). In addition, using the 2017 release of the UMD_TP53 database, Seshat provides several statistics for each TP53 variant, including database frequency, functional activity, and pathogenicity. The information is delivered in standardized output tables that minimize errors and facilitate comparison of mutational data across studies. Seshat is a useful tool for interpreting the ever-growing TP53 sequencing data generated by multiple sequencing platforms, and it is freely available via the TP53 Website, http://p53.fr, or directly at http://vps338341.ovh.net/.
MeSH terms:
- Molecular Sequence Annotation
- Databases, Genetic *
- Genetic Variation / genetics
- Genomics / trends
- Internet
- Humans
- Mutation
- Tumor Suppressor Protein p53 / genetics
- Software *
- Computational Biology / trends
- High-Throughput Nucleotide Sequencing
Check tag:
- Humans
Publication type:
- Journal Article
- Research Support, Non-U.S. Gov't
Recent technological advances have made next-generation sequencing (NGS) a popular and financially accessible technique that allows a broad range of analyses to be done simultaneously. The huge amount of newly generated NGS data, however, requires advanced software support to help both in analyzing the data and in interpreting the results biologically. In this article, we describe SATrans (Software for Annotation of Transcriptome), a software package providing fast and robust functional annotation of novel sequences obtained from transcriptome sequencing. Moreover, it performs advanced gene ontology analysis of differentially expressed genes, thereby helping to interpret, biologically and in a user-friendly form, the quantitative changes in gene expression. The software is freely available and can process thousands of sequences on a standard personal computer or notebook running the Linux operating system.
Secondary structure elements (SSEs) are inherent parts of protein structures, and their arrangement is characteristic of each protein family. Therefore, annotation of SSEs can facilitate orientation in the vast number of homologous structures that are now available for many protein families. It also provides a way to identify and annotate key regions, such as active sites and channels, and subsequently to answer key research questions, such as understanding molecular function and its variability. This chapter introduces the concept of SSE annotation and describes the workflow for obtaining SSE annotation for the members of a selected protein family using the program SecStrAnnotator.
MeSH terms:
- Genome-Wide Association Study
- Genetic Predisposition to Disease
- Genetic Loci
- Polymorphism, Single Nucleotide
- Humans
- Multiple Myeloma * / genetics
Check tag:
- Humans
Publication type:
- Letter
BACKGROUND: Despite increasing interest in γδ T cells and their non-classical behaviour, most studies focus on animals with low numbers of circulating γδ T cells, such as mice and humans. Arguably, γδ T cell functions might be more prominent in chickens, where these cells form a higher proportion of the circulatory T cell compartment. The TCR repertoire defines different subsets of γδ T cells, and such analysis is facilitated by well-annotated TCR loci. γδ T cells are considered to be at the cusp of innate and adaptive immunity, but most functions have been identified in γδ low species. A deeper understanding of TCR repertoire biology in γδ high and γδ low animals is critical for defining the evolution of the function of γδ T cells. Repertoire dynamics will reveal populations that can be classified as innate-like or adaptive-like as well as those that straddle this definition. RESULTS: Here, a recent discrepancy in the structure of the chicken TCR gamma locus is resolved, demonstrating that tandem duplication events have shaped the evolution of this locus. Importantly, repertoire sequencing revealed large differences in the usage of individual TRGV genes, a pattern conserved across multiple tissues, including thymus, spleen and the gut. A single TRGV gene, TRGV3.3, with a highly diverse private CDR3 repertoire, dominated every tissue in all birds. TRGV usage patterns were partly explained by the TRGV-associated recombination signal sequences. Public CDR3 clonotypes represented varying proportions of the repertoire of TCRs utilising different TRGVs, with one TRGV dominated by super-public clones present in all birds. CONCLUSIONS: The application of repertoire analysis enabled functional annotation of the TCRG locus in a species with a high circulating γδ phenotype. This revealed variable usage of TRGV genes across multiple tissues, a pattern quite different from that found in γδ low species (human and mouse). Defining the repertoire biology of avian γδ T cells will be key to understanding the evolution and functional diversity of these enigmatic lymphocytes in an animal that is numerically more reliant on them. Practically, this will reveal novel ways in which these cells can be exploited to improve health in medical and veterinary contexts.
MeSH terms:
- Genome *
- Genomics
- Chickens * / genetics
- Receptors, Antigen, T-Cell, gamma-delta * / genetics
- T-Lymphocytes
- Animals
Check tag:
- Animals
Publication type:
- Journal Article
CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with the addition of 65,351 fully classified domain structures (+15%), providing 500,238 structural domains and 151 million predicted sequence domains (+59%) assigned to 5,481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.
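The reported growth in structural domains can be checked with a short calculation. This is only a reader's sanity check under one interpretation: it assumes that 500,238 is the total number of structural domains after the release and that "+15%" is measured relative to the count before the 65,351 domains were added. The function name is illustrative.

```python
def percent_increase(added, total_after):
    """Relative growth in percent, measured against the count before the addition."""
    total_before = total_after - added
    return 100.0 * added / total_before


if __name__ == "__main__":
    # Figures quoted in the abstract; their interpretation is an assumption.
    growth = percent_increase(added=65_351, total_after=500_238)
    print(f"{growth:.1f}% increase in structural domains")
```

Under this reading the implied previous total is 434,887 domains, and the computed growth is consistent with the rounded "+15%" quoted in the abstract.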
MeSH terms:
- Molecular Sequence Annotation
- COVID-19 / epidemiology, prevention & control, virology
- Databases, Protein / statistics & numerical data
- Epidemics
- Internet
- Humans
- Protein Domains *
- Proteins / chemistry, genetics, metabolism
- SARS-CoV-2 / genetics, metabolism, physiology
- Amino Acid Sequence
- Sequence Analysis, Protein / methods
- Sequence Homology, Amino Acid
- Viral Proteins / chemistry, genetics, metabolism
- Computational Biology / methods, statistics & numerical data
Check tag:
- Humans
Publication type:
- Journal Article
- Research Support, Non-U.S. Gov't
Objectives: This work aims to create a portable tool with decision support that provides suitable methods for evaluating physical activity in real time. Methods: We used an accelerometer-equipped ez430Chronos watch in conjunction with a preconfigured RaspberryPi-based setup. Accelerometer data are transmitted wirelessly via the WebSocket protocol to a running web application instance, which serves as the user frontend. Decision support is based on a Weka classifier. Results: The proposed framework is ready to be used for the annotation and basic evaluation of physical activity data in Wi-Fi-covered areas. Minor issues relate to the occasional instability of data transmission, which must be handled accordingly. Conclusions: We found the overall framework architecture robust enough to serve its purpose. The next steps in development will expand the outlined functionality.
Voxel-based morphometry (VBM) is a fully automated, objective method for processing structural MRI data. The paper outlines the main principles, advantages and disadvantages of this technique compared with classical volumetry based on semimanual segmentation and measurement of regions of interest. Despite the increasing popularity of VBM, only a few papers have evaluated the quality and limitations of this technique. In the second part of the paper, we present a synopsis of our study assessing the test-retest reliability of VBM.