BACKGROUND: RNA-seq followed by de novo transcriptome assembly has been a transformative technique in biological research of non-model organisms, but the computational processing of RNA-seq data entails many different software tools. The complexity of these de novo transcriptomics workflows therefore presents a major barrier for researchers to adopt best-practice methods and up-to-date versions of software. RESULTS: Here we present a streamlined and universal de novo transcriptome assembly and annotation pipeline, transXpress, implemented in Snakemake. transXpress supports two popular assembly programs, Trinity and rnaSPAdes, and allows parallel execution on heterogeneous cluster computing hardware. CONCLUSIONS: transXpress simplifies the use of best-practice methods and up-to-date software for de novo transcriptome assembly, and produces standardized output files that can be mined using SequenceServer to facilitate rapid discovery of new genes and proteins in non-model organisms.
- Keywords
- De novo transcriptome assembly, Differential expression analysis, High-performance computing, Non-model organisms, RNA-seq, Reproducible software, Transcriptome annotation,
- MeSH
- Molecular Sequence Annotation MeSH
- Sequence Analysis, RNA methods MeSH
- RNA-Seq MeSH
- Software * MeSH
- Gene Expression Profiling MeSH
- Transcriptome * MeSH
- Publication type
- Journal Article MeSH
Recent technological advances have made next-generation sequencing (NGS) a popular and financially accessible technique that allows a broad range of analyses to be done simultaneously. The huge amount of newly generated NGS data, however, requires advanced software support to help both in analyzing the data and in biologically interpreting the results. In this article, we describe SATrans (Software for Annotation of Transcriptome), a software package providing fast and robust functional annotation of novel sequences obtained from transcriptome sequencing. Moreover, it performs advanced gene ontology analysis of differentially expressed genes, thereby helping to interpret the quantitative changes in gene expression biologically and in a user-friendly form. The software is freely available and can process thousands of sequences on a standard personal computer or notebook running the Linux operating system.
- Keywords
- differentially expressed genes, functional annotation, transcriptome,
- MeSH
- Molecular Sequence Annotation methods MeSH
- Humans MeSH
- Sequence Analysis, RNA methods MeSH
- Software * MeSH
- Gene Expression Profiling methods MeSH
- Transcriptome * MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
BACKGROUND: Ectoparasites from the family Diplozoidae (Platyhelminthes, Monogenea) are obligate haematophagous helminths of cyprinid fish. Current knowledge of these worms is for the most part limited to their morphological, phylogenetic, and population features. Information concerning the biochemical and molecular nature of physiological processes involved in host-parasite interaction, such as evasion of the immune system and its regulation, digestion of macromolecules, suppression of blood coagulation and inflammation, and effect on host tissue and physiology, is lacking. In this study, we report for the first time a comprehensive transcriptomic/secretome description of expressed genes and proteins secreted by the adult stage of Eudiplozoon nipponicum (Goto, 1891) Khotenovsky, 1985, an obligate sanguivorous monogenean which parasitises the gills of the common carp (Cyprinus carpio). RESULTS: RNA-seq raw reads (324,941 Roche 454 and 149,697,864 Illumina) were generated, de novo assembled, and filtered into 37,062 protein-coding transcripts. For 19,644 (53.0%) of them, we determined their sequence homologues. In silico functional analysis of E. nipponicum RNA-seq data revealed numerous transcripts, pathways, and GO terms responsible for immunomodulation (inhibitors of proteolytic enzymes, CD59-like proteins, fatty acid binding proteins), feeding (proteolytic enzymes cathepsins B, D, L1, and L3), and development (fructose 1,6-bisphosphatase, ferritin, and annexin). LC-MS/MS spectrometry analysis identified 721 proteins secreted by E. nipponicum with predominantly immunomodulatory and anti-inflammatory functions (peptidyl-prolyl cis-trans isomerase, homolog to SmKK7, tetraspanin) and ability to digest host macromolecules (cathepsins B, D, L1).
CONCLUSIONS: In this study, we integrated two high-throughput sequencing techniques, mass spectrometry analysis, and a comprehensive bioinformatics approach in order to arrive at the first comprehensive description of a monogenean transcriptome and secretome. Exploration of E. nipponicum transcriptome-related nucleotide sequences and translated and secreted proteins offers a better understanding of the molecular biology and biochemistry of these often neglected organisms. It enabled us to report the essential physiological pathways and protein molecules involved in their interactions with the fish hosts.
- Keywords
- Annotation, Assembly, Eudiplozoon nipponicum, Mass spectrometry, Monogenea, NGS, Secretome, Transcriptome,
- MeSH
- Molecular Sequence Annotation MeSH
- Chromatography, Liquid MeSH
- Phylogeny MeSH
- Carps * genetics MeSH
- Gene Expression Profiling MeSH
- Tandem Mass Spectrometry MeSH
- Transcriptome MeSH
- Trematoda * genetics MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
The tarnished plant bug (TPB), Lygus lineolaris (Palisot de Beauvois), is a polyphagous, phytophagous insect that has emerged as a major pest of cotton, alfalfa, fruits, and vegetable crops in the eastern United States and Canada. Using its piercing-sucking mouthparts, TPB employs a "lacerate and flush" feeding strategy in which saliva injected into plant tissue degrades cell wall components and lyses cells whose contents are subsequently imbibed by the TPB. It is known that a major component of TPB saliva is the polygalacturonase enzymes that degrade the pectin in the cell walls. However, not much is known about the other components of the saliva of this important pest. In this study, we explored the salivary gland transcriptome of TPB using Illumina sequencing. After in silico conversion of RNA sequences into corresponding polypeptides, 25,767 putative proteins were discovered. Of these, 19,540 (75.83%) showed significant similarity to known proteins in either the NCBI nr or Uniprot databases. Gene ontology (GO) terms were assigned to 7,512 proteins, and 791 proteins in the sialotranscriptome of TPB were found to collectively map to 107 Kyoto Encyclopedia of Genes and Genomes (KEGG) database pathways. A total of 3,653 Pfam domains were identified in 10,421 sialotranscriptome predicted proteins, resulting in 12,814 Pfam annotations; some proteins had more than one Pfam domain. Functional annotation revealed a number of salivary gland proteins that potentially facilitate degradation of host plant tissues and mitigation of the host plant defense response. These transcripts/proteins and their potential roles in TPB establishment are described.
- MeSH
- Molecular Sequence Annotation MeSH
- Gene Ontology MeSH
- Heteroptera genetics growth & development metabolism MeSH
- Genes, Insect genetics MeSH
- Salivary Glands metabolism MeSH
- Gene Expression Profiling * MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
BACKGROUND: RNA-sequencing analysis is increasingly utilized to study gene expression in non-model organisms without sequenced genomes. Aethionema arabicum (Brassicaceae) exhibits seed dimorphism as a bet-hedging strategy, producing both a less dormant mucilaginous (M+) seed morph and a more dormant non-mucilaginous (NM) seed morph. Here, we compared de novo and reference-genome based transcriptome assemblies to investigate Ae. arabicum seed dimorphism and to evaluate the reference-free versus -dependent approach for identifying differentially expressed genes (DEGs). RESULTS: A de novo transcriptome assembly was generated using sequences from M+ and NM Ae. arabicum dry seed morphs. The transcripts of the de novo assembly contained 63.1% complete Benchmarking Universal Single-Copy Orthologs (BUSCO) compared to 90.9% for the transcripts of the reference genome. DEG detection used the strict consensus of three methods (DESeq2, edgeR and NOISeq). Only 37% of 1533 differentially expressed de novo assembled transcripts paired with 1876 genome-derived DEGs. Gene Ontology (GO) terms distinguished the seed morphs: the terms translation and nucleosome assembly were overrepresented in DEGs higher in abundance in M+ dry seeds, whereas terms related to mRNA processing and transcription were overrepresented in DEGs higher in abundance in NM dry seeds. DEGs amongst these GO terms included ribosomal proteins and histones (higher in M+), and RNA polymerase II subunits and related transcription and elongation factors (higher in NM). Expression of the inferred DEGs and other genes associated with seed maturation (e.g. those encoding late embryogenesis abundant proteins and transcription factors regulating seed development and maturation such as ABI3, FUS3, LEC1 and WRI1 homologs) was put in context with Arabidopsis thaliana seed maturation and indicated that M+ seeds may desiccate and mature faster than NM.
The 1901 transcriptomic DEG set GO-terms had almost 90% overlap with the 2191 genome-derived DEG GO-terms. CONCLUSIONS: Whilst there was only modest overlap of DEGs identified in reference-free versus -dependent approaches, the resulting GO analysis was concordant in both approaches. The identified differences in dry seed transcriptomes suggest mechanisms underpinning previously identified contrasts between morphology and germination behaviour of M+ and NM seeds.
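The "strict consensus of three methods" used for DEG detection above can be read as a set intersection: a gene is kept only if every method calls it. The following is a minimal Python sketch under that assumption; the function name and the gene identifiers are hypothetical and are not taken from the study.

```python
def strict_consensus(*call_sets):
    """Return the genes that every method called differentially expressed."""
    sets = [set(s) for s in call_sets]
    return set.intersection(*sets) if sets else set()


# Hypothetical per-method DEG calls, for illustration only.
deseq2_calls = {"g1", "g2", "g3", "g5"}
edger_calls = {"g2", "g3", "g4", "g5"}
noiseq_calls = {"g2", "g5", "g6"}

consensus = strict_consensus(deseq2_calls, edger_calls, noiseq_calls)
print(sorted(consensus))  # prints ['g2', 'g5']
```

A strict intersection trades sensitivity for specificity: a gene missed by any one method is dropped, which is one plausible reason consensus DEG lists are shorter than the list from any single tool.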
- Keywords
- Aethionema arabicum, Dimorphic seeds, RNA-seq, Reference and reference-free, Transcriptome,
- MeSH
- Molecular Sequence Annotation MeSH
- Brassicaceae genetics growth & development MeSH
- Genome, Plant MeSH
- Gene Ontology MeSH
- Germination MeSH
- Gene Expression Regulation, Plant * MeSH
- Plant Proteins genetics MeSH
- Seeds genetics growth & development MeSH
- Gene Expression Profiling MeSH
- Transcriptome * MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Plant Proteins MeSH
BACKGROUND: The hop plant (Humulus lupulus L.) is a valuable source of several secondary metabolites, such as flavonoids, bitter acids, and essential oils. These compounds are widely used in the beer brewing industry and have potential biomedical applications. Several independent breeding programs around the world have been initiated to develop new cultivars with enriched lupulin and secondary metabolite contents but have met with limited success due to several constraints. In the present work, a pioneering attempt has been made to overexpress the master regulator binary transcription factor complex formed by HlWRKY1 and HlWDR1 using a plant expression vector to enhance the level of prenylflavonoid and bitter acid content in the hop. Subsequently, we performed transcriptional profiling using high-throughput RNA-Seq technology in leaves of the resultant transformants and wild-type hop to gain in-depth information about the genome-wide functional changes induced by HlWRKY1 and HlWDR1 overexpression. RESULTS: The transgenic WW-lines exhibited an elevated expression of structural and regulatory genes involved in prenylflavonoid and bitter acid biosynthesis pathways. In addition, the comparative transcriptome analysis revealed that a total of 522 transcripts involved in 30 pathways, including lipid and amino acid biosynthesis, primary carbon metabolism, phytohormone signaling and stress responses, were differentially expressed in WW-transformants. It was apparent from the whole transcriptome sequencing that modulation of primary carbon metabolism and other pathways by HlWRKY1 and HlWDR1 overexpression resulted in enhanced substrate flux towards secondary metabolite pathways. The detailed analyses suggested that none of the pathways or genes that have a detrimental effect on physiology, growth and development processes were induced on a genome-wide scale in WW-transgenic lines.
CONCLUSIONS: Taken together, our results suggest that simultaneous overexpression of HlWRKY1 and HlWDR1 positively regulates the prenylflavonoid and bitter acid biosynthesis pathways in the hop, and thus these transgenes are presented as prospective candidates for achieving enhanced secondary metabolite content in the hop.
- Keywords
- Bitter acids, Flavonoids, Genetic transformation, Humulus lupulus, Secondary metabolite, Transcription factors, Transcriptome analysis,
- MeSH
- Molecular Sequence Annotation MeSH
- Gene Expression MeSH
- Plants, Genetically Modified MeSH
- Genomics * MeSH
- Humulus genetics MeSH
- Plant Proteins genetics MeSH
- Gene Expression Profiling * MeSH
- Transcription Factors genetics MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Plant Proteins MeSH
- Transcription Factors MeSH
BACKGROUND: Photosynthetic euglenids are major contributors to freshwater ecosystems. Euglena gracilis in particular has noted metabolic flexibility, reflected by an ability to thrive in a range of harsh environments. E. gracilis has been a popular model organism and of considerable biotechnological interest, but the absence of a gene catalogue has hampered both basic research and translational efforts. RESULTS: We report a detailed transcriptome and partial genome for E. gracilis Z1. The nuclear genome is estimated to be around 500 Mb in size; the transcriptome encodes over 36,000 proteins, and less than 1% of the genome is coding sequence. Annotation of coding sequences indicates a highly sophisticated endomembrane system, RNA processing mechanisms and nuclear genome contributions from several photosynthetic lineages. Multiple gene families, including likely signal transduction components, have been massively expanded. Alterations in protein abundance between light and dark conditions are controlled post-transcriptionally, surprisingly similar to trypanosomatids. CONCLUSIONS: Our data provide evidence that a range of photosynthetic eukaryotes contributed to the Euglena nuclear genome, evidence in support of the 'shopping bag' hypothesis for plastid acquisition. We also suggest that euglenids possess unique regulatory mechanisms for achieving extreme adaptability, through mechanisms of paralog expansion and gene acquisition.
- Keywords
- Cellular evolution, Euglena gracilis, Excavata, Gene architecture, Horizontal gene transfer, Plastid, Secondary endosymbiosis, Splicing, Transcriptome,
- MeSH
- Cell Nucleus MeSH
- Euglena gracilis genetics metabolism MeSH
- Genome * MeSH
- Plastids MeSH
- Proteome * MeSH
- Transcriptome * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Proteome * MeSH
Recent research has already shown that circular RNAs (circRNAs) are functional in gene expression regulation and potentially related to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. However, the function of most circRNAs remains unknown, and it is expensive and time-consuming to discover it through biological experiments. In this paper, we predict circRNA annotations from the knowledge of their interaction with miRNAs and subsequent miRNA-mRNA interactions. First, we construct an interaction network for a target circRNA, and then we spread the information from the network nodes with known function to the root circRNA node. This idea itself is not new; our main contribution lies in proposing an efficient and exact deterministic procedure based on the principle of probability-generating functions to calculate the p-value of the association test between a circRNA and an annotation term. We show that our publicly available algorithm is both more effective and more efficient than the commonly used Monte-Carlo sampling approach, which may suffer from difficult quantification of sampling convergence and subsequent sampling inefficiency. We experimentally demonstrate that the new approach is two orders of magnitude faster than Monte-Carlo sampling, which makes summary annotation of large circRNA files feasible; this includes their reannotation after periodical interaction network updates, for example. We provide a summary annotation of a current circRNA database as one of our outputs. The proposed algorithm could be generalized towards other types of RNA in a straightforward way.
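To illustrate why a probability-generating function (PGF) can replace sampling, here is a minimal Python sketch. It is not the authors' algorithm: it assumes, purely for illustration, that the test statistic S is a sum of independent 0/1 indicators with known probabilities p_i, so that its PGF is the product of the factors (1 - p_i + p_i z) and the coefficient of z^k is exactly P(S = k). All function names and numbers are hypothetical.

```python
import random


def exact_pmf(probs):
    """Expand the PGF prod_i (1 - p_i + p_i * z); pmf[k] equals P(S = k)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # indicator i contributes 0
            nxt[k + 1] += mass * p      # indicator i contributes 1
        pmf = nxt
    return pmf


def exact_p_value(probs, observed):
    """Upper-tail p-value P(S >= observed), computed without any sampling."""
    return sum(exact_pmf(probs)[observed:])


def monte_carlo_p_value(probs, observed, n_samples=20000, seed=0):
    """Sampling-based estimate of the same tail probability, for comparison."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        s = sum(rng.random() < p for p in probs)
        hits += s >= observed
    return hits / n_samples


# Hypothetical indicator probabilities and observed statistic.
probs = [0.1, 0.2, 0.05, 0.3, 0.15]
observed = 3
p_exact = exact_p_value(probs, observed)
p_mc = monte_carlo_p_value(probs, observed)
print(f"exact p-value:        {p_exact:.6f}")
print(f"Monte Carlo estimate: {p_mc:.6f}")
```

The exact expansion costs on the order of n^2 arithmetic operations for n indicators and has no sampling error, whereas the Monte Carlo estimate needs more and more samples as the p-value gets smaller, which is the convergence problem the abstract refers to.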
- Keywords
- Annotation term, Circular RNA, Interaction network,
- MeSH
- Biomarkers MeSH
- Gene Regulatory Networks MeSH
- RNA, Circular * MeSH
- RNA, Messenger genetics metabolism MeSH
- MicroRNAs * genetics metabolism MeSH
- Probability MeSH
- Gene Expression Profiling methods MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Biomarkers MeSH
- RNA, Circular * MeSH
- RNA, Messenger MeSH
- MicroRNAs * MeSH
Maize (Zea mays L.) is one of the most important crops and a recognised model for biological research, with some individuals having supernumerary B chromosomes. The maize B chromosome has been studied for decades, yet its gene expression across different plant tissues has not been thoroughly described. Here, we present a comprehensive transcriptomic atlas of the maize plant with and without the B chromosome. By analysing eleven tissues/organs, we showed that B chromosome-encoded genes contribute to the transcriptome throughout plant growth, with the highest activity observed in reproductive organs. Co-expression analysis revealed a cluster of 30 genes expressed specifically in tassel and indicated Shortage in chiasmata 1 as a promising candidate for regulation of crossover frequency mediated by the B chromosome. In addition to its own transcriptional activity, our data also demonstrated that the B chromosome influences the expression of A chromosome-located genes in all analysed tissues. Besides providing new insights into the expression and regulatory effects of the B chromosome, our work has also generated resources fundamental to exploring its wider biological role.
- Keywords
- B chromosome, gene annotation, gene expression, maize, sequence, transcriptome,
- Publication type
- Journal Article MeSH
Hazelnut (Corylus), which has high commercial and nutritional value, is an important tree for producing nuts and nut oil consumed as an ingredient, especially in chocolate. While Corylus avellana L. (European hazelnut, Betulaceae) and Corylus colurna L. (Turkish hazelnut, Betulaceae) are the two common hazelnut species in Europe, C. avellana L. (Tombul hazelnut) is the most widely grown hazelnut species in Turkey, and C. colurna L., which is the most important genetic resource for hazelnut breeding, exists naturally in Anatolia. We generated the transcriptome data of these two Corylus species and used these data for gene discovery and gene expression profiling. Total RNA from young leaves, flowers (male and female), buds, and husk shoots of C. avellana and C. colurna was used to prepare two different libraries, which were sequenced using Illumina HiSeq4000 with 100 bp paired-end reads. The transcriptome data, 10.48 Gb for C. avellana and 10.30 Gb for C. colurna, were assembled into 70,265 and 88,343 unigenes, respectively. These unigenes were functionally annotated using the TRAPID platform. We identified 25,312 and 27,051 simple sequence repeats (SSRs) for C. avellana and C. colurna, respectively. The TL1, GMPM1, N, 2MMP, At1g29670, and CHIB1 unigenes were selected for validation with qPCR. The first de novo transcriptome data of C. colurna were compared with the data of the commercially important C. avellana. These data constitute a valuable extension of the publicly available transcriptomic resource aimed at breeding, medicinal, and industrial research studies.
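The abstract does not state how the SSRs were identified. As a rough illustration of what SSR detection involves, the following Python sketch scans a nucleotide sequence for perfect tandem repeats with a regular expression. The minimum repeat counts, the function name, and the test sequence are assumptions made for this example, not parameters from the study.

```python
import re

# Assumed minimum number of tandem copies per motif length (hypothetical).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect tandem repeat found."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # since the shorter motif length already reports them.
            if any(
                motif == motif[:d] * (motif_len // d)
                for d in range(1, motif_len)
                if motif_len % d == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


# Hypothetical sequence: an (AT)7 repeat and an (AGC)5 repeat.
example = "GGC" + "AT" * 7 + "CCGT" + "AGC" * 5 + "TT"
print(find_ssrs(example))  # prints [(3, 'AT', 7), (21, 'AGC', 5)]
```

Positions are 0-based. The sketch reports only perfect repeats; compound or interrupted SSRs would need additional logic.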
- Keywords
- Corylus spp., RNA-seq, de novo, hazelnut, transcriptome,
- MeSH
- Corylus * genetics metabolism MeSH
- Nuts MeSH
- Gene Expression Profiling MeSH
- Publication type
- Journal Article MeSH
- Geographicals
- Turkey MeSH