Query: sequence annotation
Recent technological advances have made next-generation sequencing (NGS) a popular and financially accessible technique that allows a broad range of analyses to be done simultaneously. The huge amount of newly generated NGS data, however, requires advanced software support to help both in analyzing the data and in biologically interpreting the results. In this article, we describe SATrans (Software for Annotation of Transcriptome), a software package providing fast and robust functional annotation of novel sequences obtained from transcriptome sequencing. Moreover, it performs advanced gene ontology analysis of differentially expressed genes, thereby helping to interpret quantitative changes in gene expression biologically and in a user-friendly form. The software is freely available and makes it possible to work with thousands of sequences on a standard personal computer or notebook running the Linux operating system.
Secondary structure elements (SSEs) are inherent parts of protein structures, and their arrangement is characteristic of each protein family. Therefore, annotation of SSEs can facilitate orientation in the vast number of homologous structures now available for many protein families. It also provides a way to identify and annotate key regions, such as active sites and channels, and subsequently to answer key research questions, such as understanding molecular function and its variability. This chapter introduces the concept of SSE annotation and describes the workflow for obtaining SSE annotation for the members of a selected protein family using the program SecStrAnnotator.
Accurate annotation of genomic variants in human diseases is essential for personalized medicine. Assessment of somatic and germline TP53 alterations has now reached the clinic and is required in several circumstances, such as the identification of the most effective cancer therapy for patients with chronic lymphocytic leukemia (CLL). Here, we present Seshat, a Web service for annotating TP53 information derived from sequencing data. A flexible framework allows the use of standard file formats such as Mutation Annotation Format (MAF) or Variant Call Format (VCF), as well as common TXT files. Seshat performs accurate variant annotation using the Human Genome Variation Society (HGVS) nomenclature and the stable TP53 genomic reference provided by the Locus Reference Genomic (LRG). In addition, using the 2017 release of the UMD_TP53 database, Seshat provides several types of statistical information for each TP53 variant, including database frequency, functional activity, and pathogenicity. The information is delivered in standardized output tables that minimize errors and facilitate comparison of mutational data across studies. Seshat is a useful tool for interpreting the ever-growing TP53 sequencing data generated by multiple sequencing platforms, and it is freely available via the TP53 Website, http://p53.fr, or directly at http://vps338341.ovh.net/.
- MeSH
- Molecular Sequence Annotation
- Databases, Genetic *
- Genetic Variation / genetics
- Genomics / trends
- Internet
- Humans
- Mutation
- Tumor Suppressor Protein p53 / genetics
- Software *
- Computational Biology / trends
- High-Throughput Nucleotide Sequencing
- Check Tag
- Humans
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
Bacillus thuringiensis (Bt) is efficient, highly specific, and avirulent to humans, making it one of the most popular biopesticides in the world. Bt LLP29 is a mosquitocidal strain that was first isolated from Magnolia denudata. To understand its molecular mechanism against mosquitoes, the genome of Bt LLP29 was sequenced and annotated in this study. The LLP29 genome was found to have a total length of 5.99 Mb, with an average G + C content of 35.21%. A total of 6107 coding sequences were detected, together with 42 rRNAs, 124 tRNAs, and 135 other RNAs. With the help of annotation databases, including GO, COG, KEGG, Nr and Swiss-Prot, most unigene functions were identified. In addition, a collinear analysis was performed on the genome of LLP29. Several virulence genes were also detected, including cry, chitinase, zwittermicin and vip.
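The G + C content quoted in the abstract above is a simple base-composition statistic. As a minimal sketch, assuming the genome is available as a plain string of bases and that ambiguous bases such as N are excluded from the denominator (the abstract does not say how they were treated), it could be computed as follows. The function name and the toy sequence are illustrative only and are not taken from the study.

```python
def gc_content(sequence: str) -> float:
    """Return the G + C percentage of a DNA sequence.

    Assumption: only unambiguous bases (A, C, G, T) are counted;
    N and other ambiguity codes are ignored.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


# Toy sequence (hypothetical, not from the LLP29 genome).
toy = "ATGCGATTAACNNGC"
print(round(gc_content(toy), 2))  # 46.15
```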
BACKGROUND: Recent advances in genomics indicate the functional significance of the majority of genome sequences and their long-range interactions. As a detailed examination of genome organization and function requires a very high-quality genome sequence, the objective of this study was to improve the reference genome assembly of banana (Musa acuminata). RESULTS: We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. Unlike classical automated tools that are based on global parameters, the semi-automated tools offer an expert mode in which the user can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long-insert paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5% of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. CONCLUSION: The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
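The N50 values reported above summarize assembly contiguity: N50 is the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length. The scaffold counts given in parentheses presumably correspond to the number of scaffolds needed to reach that point (often called L50). The sketch below shows the standard calculation on made-up lengths; it is not the pipeline described in the abstract.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    N50: length of the scaffold at which the running total of lengths,
         taken from longest to shortest, reaches half the assembly size.
    L50: number of scaffolds needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no scaffold lengths given")


# Toy scaffold lengths (hypothetical, arbitrary units).
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```

With the toy input the total length is 300, so the half-way point of 150 is reached after the two longest scaffolds (80 + 70).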
Soda lakes, with their high salinity and high pH, pose a very challenging environment for life. Microorganisms living in these harsh conditions have had to adapt their physiology and gene inventory. Therefore, we analyzed the complete genome of the haloalkaliphilic photoheterotrophic bacterium Rhodobaca barguzinensis strain alga05. It consists of a 3,899,419 bp circular chromosome with 3624 predicted coding sequences. In contrast to most Rhodobacterales, this strain lacks any extrachromosomal elements. To identify the genes responsible for adaptation to high pH, we compared the gene inventory in the alga05 genome with genomes of 17 reference strains belonging to the order Rhodobacterales. We found that all haloalkaliphilic strains contain the mrpB gene coding for the B subunit of the MRP Na+/H+ antiporter, while this gene is absent in all non-alkaliphilic strains, which indicates its importance for adaptation to high pH. Further analysis showed that alga05 requires organic carbon sources for growth, but it also contains genes encoding the ethylmalonyl-CoA pathway for CO2 fixation. Also remarkable is its genetic potential to utilize organophosphorus compounds as a source of phosphorus. In summary, its genetic inventory indicates a large flexibility of the alga05 metabolism, which is advantageous in the rapidly changing environmental conditions of soda lakes.
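The comparison described above, which looks for genes present in every haloalkaliphilic strain and absent from every other strain, can be expressed with set operations. The following is a minimal sketch under the assumption that each genome has already been reduced to a set of gene names; the strain labels and all gene names other than mrpB are hypothetical placeholders, and the abstract does not describe how the authors implemented the comparison.

```python
def discriminating_genes(inventories, group):
    """Return genes found in every strain of `group` and in no other strain.

    inventories: dict mapping strain name -> iterable of gene names
    group:       set of strain names forming the group of interest
    """
    in_group = [set(g) for s, g in inventories.items() if s in group]
    out_group = [set(g) for s, g in inventories.items() if s not in group]
    if not in_group:
        raise ValueError("no strains from the group are in the inventories")
    shared = set.intersection(*in_group)   # present in all group members
    elsewhere = set().union(*out_group)    # present in any other strain
    return shared - elsewhere


# Hypothetical toy inventories; only the gene name mrpB comes from the abstract.
inventories = {
    "alkaliphile_1": {"mrpB", "geneA", "geneB"},
    "alkaliphile_2": {"mrpB", "geneA", "geneC"},
    "neutrophile_1": {"geneA", "geneB"},
    "neutrophile_2": {"geneA", "geneC"},
}
print(discriminating_genes(inventories, {"alkaliphile_1", "alkaliphile_2"}))
# {'mrpB'}
```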
Satellite DNA, a class of repetitive sequences forming long arrays of tandemly repeated units, represents substantial portions of many plant genomes yet remains poorly characterized due to various methodological obstacles. Here we show that the genome of the field bean (Vicia faba, 2n = 12), a long-established model for cytogenetic studies in plants, contains a diverse set of satellite repeats, most of which remained concealed until the present investigation. Using next-generation sequencing combined with novel bioinformatics tools, we reconstructed consensus sequences of 23 novel satellite repeats representing 0.008-2.700% of the genome and mapped their distribution on chromosomes. We found that in addition to typical satellites with monomers hundreds of nucleotides long, V. faba contains a large number of satellite repeats with unusually long monomers (687-2033 bp), which are predominantly localized in pericentromeric regions. Using chromatin immunoprecipitation with CenH3 antibody, we revealed an extraordinary diversity of centromeric satellites, consisting of seven repeats with chromosome-specific distribution. We also found that in spite of their different nucleotide sequences, all centromeric repeats are replicated during mid-S phase, while most other satellites are replicated in the first part of late S phase, followed by a single family of FokI repeats representing the latest replicating chromatin.
- MeSH
- Molecular Sequence Annotation
- Centromere / metabolism
- Chromatin Immunoprecipitation
- DNA, Plant / genetics / metabolism
- Genome, Plant / genetics
- Chromosome Mapping / methods
- Evolution, Molecular
- DNA Replication Timing / genetics
- DNA, Satellite / genetics
- Sequence Analysis, DNA
- Vicia faba / genetics / metabolism
- Computational Biology
- High-Throughput Nucleotide Sequencing
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
The yeast Magnusiomyces capitatus is an opportunistic human pathogen causing rare yet severe infections, especially in patients with hematological malignancies. Here, we report the 20.2 megabase genome sequence of an environmental strain of this species as well as the genome sequences of eight additional isolates from human and animal sources providing an insight into intraspecies variation. The distribution of single-nucleotide variants is indicative of genetic recombination events, supporting evidence for sexual reproduction in this heterothallic yeast. Using RNAseq-aided annotation, we identified genes for 6518 proteins including several expanded families such as kexin proteases and Hsp70 molecular chaperones. Several of these families are potentially associated with the ability of M. capitatus to infect and colonize humans. For the purpose of comparative analysis, we also determined the genome sequence of a closely related yeast, Magnusiomyces ingens. The genome sequences of M. capitatus and M. ingens exhibit many distinct features and represent a basis for further comparative and functional studies.
- MeSH
- Molecular Sequence Annotation
- Antifungal Agents / pharmacology
- Virulence Factors
- Phenotype
- Phylogeny
- Genome, Fungal *
- Genomics * / methods
- Humans
- Microbial Sensitivity Tests
- Multigene Family
- Mycoses / microbiology
- Opportunistic Infections / microbiology
- Recombination, Genetic
- Saccharomycetales / classification / genetics / growth & development / pathogenicity
- Computational Biology / methods
- Check Tag
- Humans
- Publication type
- Journal Article
RepeatExplorer2 is a novel version of a computational pipeline that uses graph-based clustering of next-generation sequencing reads for characterization of repetitive DNA in eukaryotes. The clustering algorithm facilitates repeat identification in any genome by using relatively small quantities of short sequence reads, and additional tools within the pipeline perform automatic annotation and quantification of the identified repeats. The pipeline is integrated into the Galaxy platform, which provides a user-friendly web interface for script execution and documentation of the results. Compared to the original version of the pipeline, RepeatExplorer2 provides automated annotation of transposable elements, identification of tandem repeats and enhanced visualization of analysis results. Here, we present an overview of the RepeatExplorer2 workflow and provide procedures for its application to (i) de novo repeat identification in a single species, (ii) comparative repeat analysis in a set of species, (iii) development of satellite DNA probes for cytogenetic experiments and (iv) identification of centromeric repeats based on ChIP-seq data. Each procedure takes approximately 2 days to complete. RepeatExplorer2 is available at https://repeatexplorer-elixir.cerit-sc.cz.
- MeSH
- DNA Probes / chemistry / genetics
- DNA / chemistry / genetics
- Genomics / methods
- Humans
- Repetitive Sequences, Nucleic Acid
- Sequence Analysis, DNA / methods
- Cluster Analysis
- Software
- DNA Transposable Elements
- High-Throughput Nucleotide Sequencing / methods
- Animals
- Check Tag
- Humans
- Animals
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
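To illustrate the idea of graph-based read clustering mentioned in the RepeatExplorer2 record above, the sketch below treats reads as graph vertices and pairwise similarity hits as edges, then groups reads into connected components with a union-find structure. This is a deliberately simplified stand-in: it is not the clustering algorithm used by RepeatExplorer2, and the read identifiers, the similarity pairs and the function name are all hypothetical. The share of reads in each cluster is shown only as a rough proxy for the abundance of a repeat.

```python
from collections import defaultdict


def cluster_reads(read_ids, similar_pairs):
    """Group reads into connected components of a similarity graph.

    read_ids:      list of read identifiers (graph vertices)
    similar_pairs: iterable of (read_a, read_b) similarity hits (graph edges)
    Returns a list of clusters (sets of reads), largest first.
    """
    parent = {r: r for r in read_ids}

    def find(r):
        # Follow parent links to the root, halving the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    clusters = defaultdict(set)
    for r in read_ids:
        clusters[find(r)].add(r)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical reads and similarity hits.
reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
hits = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
for cluster in cluster_reads(reads, hits):
    share = 100.0 * len(cluster) / len(reads)
    print(sorted(cluster), f"{share:.1f}% of reads")
```

Connected components were chosen here only because they are easy to verify by hand; a real repeat-analysis pipeline would need a method that can separate distinct repeats joined by a few spurious similarity hits.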
Summary: MolArt fills the gap between sequence and structure visualization by providing a lightweight, interactive environment for exploring sequence annotations in the context of available experimental or predicted protein structures. Given a UniProt ID, MolArt downloads and displays sequence annotations, sequence-structure mapping and relevant structures. The sequence and structure views are interlinked, allowing sequence annotations to be overlaid in color on the mapped structures, thus providing an enhanced understanding and interpretation of the available molecular data. Availability and implementation: MolArt is released under the Apache 2 license and is available at https://github.com/davidhoksza/MolArt. The project web page https://davidhoksza.github.io/MolArt/ features examples and applications of the tool.