Gene annotation
Recent technological advances have made next-generation sequencing (NGS) a popular and financially accessible technique that allows a broad range of analyses to be done simultaneously. The huge amount of newly generated NGS data, however, requires advanced software support for both analyzing the data and biologically interpreting the results. In this article, we describe SATrans (Software for Annotation of Transcriptome), a software package providing fast and robust functional annotation of novel sequences obtained from transcriptome sequencing. Moreover, it performs advanced gene ontology analysis of differentially expressed genes, thereby helping to interpret the quantitative changes in gene expression biologically and in a user-friendly form. The software is freely available and makes it possible to work with thousands of sequences on a standard personal computer or notebook running the Linux operating system.
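The abstract does not say which statistic SATrans uses for its gene ontology analysis. As a generic illustration only, the sketch below shows a one-sided hypergeometric over-representation test, a common way to ask whether a GO term is enriched among differentially expressed genes. The function name and all counts are hypothetical and are not taken from SATrans.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, term_genes, de_genes, de_with_term):
    """One-sided hypergeometric p-value P(X >= de_with_term).

    total_genes  -- number of annotated genes in the background
    term_genes   -- background genes carrying the GO term
    de_genes     -- number of differentially expressed (DE) genes
    de_with_term -- DE genes carrying the GO term
    """
    upper = min(term_genes, de_genes)
    denominator = comb(total_genes, de_genes)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, de_genes - k)
        for k in range(de_with_term, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical counts: 2000 background genes, 40 with the term,
    # 100 DE genes, 8 of which carry the term.
    print(hypergeom_enrichment_p(2000, 40, 100, 8))
```

In practice such a test is repeated for every GO term, so the resulting p-values normally need a multiple-testing correction before interpretation.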
Secondary structure elements (SSEs) are inherent parts of protein structures, and their arrangement is characteristic of each protein family. Therefore, annotation of SSEs can facilitate orientation in the vast number of homologous structures that are now available for many protein families. It also provides a way to identify and annotate key regions, such as active sites and channels, and subsequently to answer key research questions, such as understanding molecular function and its variability. This chapter introduces the concept of SSE annotation and describes the workflow for obtaining SSE annotation for the members of a selected protein family using the program SecStrAnnotator.
Recent research has already shown that circular RNAs (circRNAs) are functional in gene expression regulation and potentially related to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. However, the function of most circRNAs remains unknown, and it is expensive and time-consuming to discover it through biological experiments. In this paper, we predict circRNA annotations from the knowledge of their interaction with miRNAs and subsequent miRNA-mRNA interactions. First, we construct an interaction network for a target circRNA; second, we propagate the information from the network nodes with known function to the root circRNA node. This idea itself is not new; our main contribution lies in proposing an efficient and exact deterministic procedure based on the principle of probability-generating functions to calculate the p-value of the association test between a circRNA and an annotation term. We show that our publicly available algorithm is both more effective and more efficient than the commonly used Monte Carlo sampling approach, which may suffer from difficult quantification of sampling convergence and subsequent sampling inefficiency. We experimentally demonstrate that the new approach is two orders of magnitude faster than Monte Carlo sampling, which makes summary annotation of large circRNA files feasible; this includes, for example, their reannotation after periodic interaction network updates. We provide a summary annotation of a current circRNA database as one of our outputs. The proposed algorithm could be generalized to other types of RNA in a straightforward way.
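The abstract does not define the test statistic, so the following is only a minimal sketch of the general idea of replacing sampling with a probability-generating function (PGF). It assumes, purely for illustration, that the statistic is a count of independent Bernoulli events with known probabilities; the PGF is then a product of linear factors, and expanding it gives the exact null distribution. All function names and probabilities are hypothetical and do not reproduce the authors' algorithm.

```python
import random


def pgf_coefficients(probs):
    """Expand G(z) = prod_i (1 - p_i + p_i * z) into polynomial coefficients.

    coeffs[k] is the exact probability that exactly k of the events occur.
    """
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # event i does not occur
            nxt[k + 1] += c * p       # event i occurs
        coeffs = nxt
    return coeffs


def exact_p_value(probs, observed):
    """Exact upper-tail p-value P(S >= observed), read off the PGF."""
    return sum(pgf_coefficients(probs)[observed:])


def monte_carlo_p_value(probs, observed, n_samples=100_000, seed=0):
    """Sampling estimate of the same tail probability, for comparison."""
    rng = random.Random(seed)
    hits = sum(
        sum(rng.random() < p for p in probs) >= observed
        for _ in range(n_samples)
    )
    return hits / n_samples


if __name__ == "__main__":
    probs = [0.05, 0.10, 0.02, 0.30, 0.15]  # hypothetical null probabilities
    print(exact_p_value(probs, 3))
    print(monte_carlo_p_value(probs, 3))
```

The exact calculation is deterministic and needs no convergence check, whereas the accuracy of the sampling estimate depends on the number of samples, especially for small p-values.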
Accurate annotation of genomic variants in human diseases is essential to allow personalized medicine. Assessment of somatic and germline TP53 alterations has now reached the clinic and is required in several circumstances, such as the identification of the most effective cancer therapy for patients with chronic lymphocytic leukemia (CLL). Here, we present Seshat, a Web service for annotating TP53 information derived from sequencing data. A flexible framework allows the use of standard file formats such as Mutation Annotation Format (MAF) or Variant Call Format (VCF), as well as common TXT files. Seshat performs accurate variant annotations using the Human Genome Variation Society (HGVS) nomenclature and the stable TP53 genomic reference provided by the Locus Reference Genomic (LRG). In addition, using the 2017 release of the UMD_TP53 database, Seshat provides several kinds of statistical information for each TP53 variant, including database frequency, functional activity, and pathogenicity. The information is delivered in standardized output tables that minimize errors and facilitate comparison of mutational data across studies. Seshat is a beneficial tool for interpreting the ever-growing TP53 sequencing data generated by multiple sequencing platforms, and it is freely available via the TP53 Website, http://p53.fr, or directly at http://vps338341.ovh.net/.
- MeSH
- Molecular Sequence Annotation MeSH
- Databases, Genetic * MeSH
- Genetic Variation genetics MeSH
- Genomics trends MeSH
- Internet MeSH
- Humans MeSH
- Mutation MeSH
- Tumor Suppressor Protein p53 genetics MeSH
- Software * MeSH
- Computational Biology trends MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
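To illustrate the step from a VCF record to an HGVS-style description mentioned in the Seshat abstract, here is a minimal sketch that handles only single-nucleotide substitutions. It is not Seshat's implementation: the function name, the reference identifier, and the example record are hypothetical. The position is reported in the coordinate system of the VCF file, so mapping onto the LRG reference (including strand and offset) is deliberately left out.

```python
def vcf_snv_to_hgvs(vcf_line, reference_id):
    """Turn one VCF data line into an HGVS-style genomic substitution.

    Only single-nucleotide substitutions are handled; indels and
    multi-allelic records need normalization and are rejected here.
    reference_id is supplied by the caller (hypothetical in this sketch).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("a VCF data line needs at least CHROM, POS, ID, REF, ALT")
    pos = int(fields[1])          # VCF positions are 1-based
    ref = fields[3].upper()
    alt = fields[4].upper()
    bases = set("ACGT")
    if len(ref) != 1 or len(alt) != 1 or ref not in bases or alt not in bases:
        raise ValueError("only single-nucleotide substitutions are handled here")
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    return f"{reference_id}:g.{pos}{ref}>{alt}"


if __name__ == "__main__":
    # Illustrative record, not taken from the article.
    line = "17\t1000\t.\tC\tT\t.\tPASS\t."
    print(vcf_snv_to_hgvs(line, "REF1"))
```

Rejecting anything other than a simple substitution keeps the sketch honest: VCF encodes indels with an anchoring base, so they cannot be converted by string formatting alone.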
Banana Fusarium wilt caused by Fusarium oxysporum f. sp. cubense (Foc) is one of the most destructive soil-borne diseases. In this study, young tissue-cultured plantlets of banana (Musa spp. AAA) cultivars differing in Foc susceptibility were used to reveal their differential responses to this pathogen using digital gene expression (DGE). Data were evaluated by various bioinformatic tools (Venn diagrams, gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses) and an immunofluorescence labelling method to support the identification of candidate genes determining the resistance of banana to Foc. Interestingly, we identified MaWRKY50 as an important gene involved in both constitutive and induced resistance. We also identified new genes involved in the resistance of banana to Foc, including several other transcription factors (TFs), pathogenesis-related (PR) genes and some genes related to plant cell wall biosynthesis or degradation (e.g., pectinesterases, β-glucosidases, xyloglucan endotransglucosylase/hydrolase and endoglucanase). The resistant banana cultivar shows activation of PR-3 and PR-4 genes as well as formation of different constitutive cell barriers that restrict the spread of the pathogen. These data suggest new mechanisms of banana resistance to Foc.
- MeSH
- Molecular Sequence Annotation MeSH
- Musa genetics microbiology MeSH
- Fluorescent Antibody Technique MeSH
- Fusarium * MeSH
- Gene Ontology MeSH
- Plant Roots genetics MeSH
- Disease Susceptibility MeSH
- Plant Diseases genetics microbiology MeSH
- Disease Resistance MeSH
- Polymerase Chain Reaction MeSH
- Gene Expression Regulation, Plant * MeSH
- Gene Expression Profiling MeSH
- Transcriptome * MeSH
- Computational Biology methods MeSH
- Publication Type
- Journal Article MeSH
Summary: MolArt fills the gap between sequence and structure visualization by providing a lightweight, interactive environment for exploring sequence annotations in the context of available experimental or predicted protein structures. Given a UniProt ID, MolArt downloads and displays sequence annotations, sequence-structure mapping and relevant structures. The sequence and structure views are interlinked, so that sequence annotations can be overlaid in color on the mapped structures, which improves understanding and interpretation of the available molecular data. Availability and implementation: MolArt is released under the Apache 2 license and is available at https://github.com/davidhoksza/MolArt. The project web page https://davidhoksza.github.io/MolArt/ features examples and applications of the tool.
Specialized or secondary metabolites are small molecules of biological origin, often showing potent biological activities with applications in agriculture, engineering and medicine. Usually, the biosynthesis of these natural products is governed by sets of co-regulated and physically clustered genes known as biosynthetic gene clusters (BGCs). To share information about BGCs in a standardized and machine-readable way, the Minimum Information about a Biosynthetic Gene cluster (MIBiG) data standard and repository was initiated in 2015. Since its conception, MIBiG has been regularly updated to expand data coverage and remain up to date with innovations in natural product research. Here, we describe MIBiG version 4.0, an extensive update to the data repository and the underlying data standard. In a massive community annotation effort, 267 contributors performed 8304 edits, creating 557 new entries and modifying 590 existing entries, resulting in a new total of 3059 curated entries in MIBiG. Particular attention was paid to ensuring high data quality, with automated data validation using a newly developed custom submission portal prototype, paired with a novel peer-reviewing model. MIBiG 4.0 also takes steps towards a rolling release model and a broader involvement of the scientific community. MIBiG 4.0 is accessible online at https://mibig.secondarymetabolites.org/.
Acquisition of genes by plastid genomes (plastomes) via horizontal gene transfer (HGT) seems to be a rare phenomenon. Here, we report an interesting case of HGT revealed by sequencing the plastomes of the eustigmatophyte algae Monodopsis sp. MarTras21 and Vischeria sp. CAUP Q 202. These plastomes proved to harbour a unique cluster of six genes, most probably acquired from a bacterium of the phylum Bacteroidetes, with homologues in various bacteria, typically organized in a conserved uncharacterized putative operon. Sequence analyses of the six proteins encoded by the operon yielded the following annotation for them: (i) a novel family without discernible homologues; (ii) a new family within the superfamily of metallo-dependent hydrolases; (iii) a novel subgroup of the UbiA superfamily of prenyl transferases; (iv) a new clade within the sugar phosphate cyclase superfamily; (v) a new family within the xylose isomerase-like superfamily; and (vi) a hydrolase for a phosphate moiety-containing substrate. We suggest that the operon encodes enzymes of a pathway synthesizing an isoprenoid-cyclitol-derived compound, possibly an antimicrobial or other protective substance. To the best of our knowledge, this is the first report of an expansion of the metabolic capacity of a plastid mediated by HGT into the plastid genome.
- MeSH
- Molecular Sequence Annotation MeSH
- Genes, Bacterial MeSH
- DNA, Algal genetics MeSH
- Genome, Plastid * MeSH
- Heterokontophyta genetics MeSH
- Evolution, Molecular MeSH
- Multigene Family MeSH
- Gene Transfer, Horizontal * MeSH
- Sequence Analysis, DNA methods MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
BACKGROUND: Immune-response (IR) genes have an important role in the defense against highly variable pathogens, and therefore, diversity in these genomic regions is essential for species' survival and adaptation. Although current genome assemblies from Old World camelids are very useful for investigating genome-wide diversity, demography and population structure, they have inconsistencies and gaps that limit analyses at local genomic scales. Improved and more accurate genome assemblies and annotations are needed to study complex genomic regions like adaptive and innate IR genes. RESULTS: In this work, we improved the genome assemblies of the three Old World camel species - domestic dromedary and Bactrian camel, and the two-humped wild camel - via different computational methods. The newly annotated dromedary genome assembly CamDro3 served as reference to scaffold the NCBI RefSeq genomes of domestic Bactrian and wild camels. These upgraded assemblies were then used to assess nucleotide diversity of IR genes within and between species, and to compare the diversity found in immune genes and the rest of the genes in the genome. We detected differences in the nucleotide diversity among the three Old World camelid species and between IR gene groups, i.e., innate versus adaptive. Among the three species, domestic Bactrian camels showed the highest mean nucleotide diversity. Among the functionally different IR gene groups, the highest mean nucleotide diversity was observed in the major histocompatibility complex. CONCLUSIONS: The new camel genome assemblies were greatly improved in terms of contiguity and increased size with fewer scaffolds, which is of general value for the scientific community. This allowed us to perform in-depth studies on genetic diversity in immunity-related regions of the genome. 
Our results suggest that differences in diversity across classes of genes are compatible with a combined role of population history and differential exposure to pathogens, and consequently different selective pressures.
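The abstract compares mean nucleotide diversity between species and gene groups but does not describe the computation. As a reminder of what the quantity measures, the sketch below computes the textbook definition: the average number of differences per site over all pairs of aligned sequences. It assumes equal-length, gap-free sequences, and the function name and toy sequences are hypothetical; it is not the pipeline used in the study.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site (pi) for aligned sequences.

    Assumes all sequences have the same length and contain no gaps or
    missing data; real pipelines must handle both.
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


if __name__ == "__main__":
    toy_alignment = ["AAAA", "AAAT", "AATT"]  # hypothetical toy data
    print(nucleotide_diversity(toy_alignment))
```

Averaging this value over the genes in a group (for example innate versus adaptive IR genes) gives a mean nucleotide diversity of the kind compared in the abstract.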
Genome sequencing of the human parasite Schistosoma mansoni revealed an interesting gene superfamily, called micro-exon gene (meg), that encodes secreted MEG proteins. The genes are composed of short exons (3-81 base pairs) regularly interspersed with long introns (up to 5 kbp). This article compiles 35 S. mansoni-specific meg genes that are distributed over 7 autosomes and one pair of sex chromosomes and that code for at least 87 verified MEG proteins. We used various bioinformatics tools to produce an optimal alignment and propose a phylogenetic analysis. This work highlighted intriguing conserved patterns/motifs in the sequences of the highly variable MEG proteins. Based on the analyses, we were able to classify the verified MEG proteins into two subfamilies and to hypothesize their duplication and colonization of all the chromosomes. Together with motif identification, we also propose revisiting the common names and annotation of MEGs in order to avoid duplication, to support the reproducibility of research results and to prevent possible misunderstandings.
- MeSH
- Exons genetics MeSH
- Phylogeny MeSH
- Humans MeSH
- Chromosome Mapping MeSH
- Reproducibility of Results MeSH
- Schistosoma mansoni * genetics MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH