BACKGROUND: RNA-seq followed by de novo transcriptome assembly has been a transformative technique in biological research of non-model organisms, but the computational processing of RNA-seq data entails many different software tools. The complexity of these de novo transcriptomics workflows therefore presents a major barrier for researchers to adopt best-practice methods and up-to-date versions of software. RESULTS: Here we present a streamlined and universal de novo transcriptome assembly and annotation pipeline, transXpress, implemented in Snakemake. transXpress supports two popular assembly programs, Trinity and rnaSPAdes, and allows parallel execution on heterogeneous cluster computing hardware. CONCLUSIONS: transXpress simplifies the use of best-practice methods and up-to-date software for de novo transcriptome assembly, and produces standardized output files that can be mined using SequenceServer to facilitate rapid discovery of new genes and proteins in non-model organisms.
- Keywords
- De novo transcriptome assembly, Differential expression analysis, High-performance computing, Non-model organisms, RNA-seq, Reproducible software, Transcriptome annotation
- MeSH
- Molecular Sequence Annotation MeSH
- Sequence Analysis, RNA methods MeSH
- RNA-Seq MeSH
- Software * MeSH
- Gene Expression Profiling MeSH
- Transcriptome * MeSH
- Publication type
- Journal Article MeSH
BACKGROUND: Photosynthetic euglenids are major contributors to freshwater ecosystems. Euglena gracilis in particular has noted metabolic flexibility, reflected by an ability to thrive in a range of harsh environments. E. gracilis has been a popular model organism and of considerable biotechnological interest, but the absence of a gene catalogue has hampered both basic research and translational efforts. RESULTS: We report a detailed transcriptome and partial genome for E. gracilis Z1. The nuclear genome is estimated to be around 500 Mb in size, the transcriptome encodes over 36,000 proteins, and less than 1% of the genome is coding sequence. Annotation of coding sequences indicates a highly sophisticated endomembrane system, RNA processing mechanisms and nuclear genome contributions from several photosynthetic lineages. Multiple gene families, including likely signal transduction components, have been massively expanded. Alterations in protein abundance are controlled post-transcriptionally between light and dark conditions, surprisingly similar to trypanosomatids. CONCLUSIONS: Our data provide evidence that a range of photosynthetic eukaryotes contributed to the Euglena nuclear genome, evidence in support of the 'shopping bag' hypothesis for plastid acquisition. We also suggest that euglenids possess unique regulatory mechanisms for achieving extreme adaptability, through paralog expansion and gene acquisition.
- Keywords
- Cellular evolution, Euglena gracilis, Excavata, Gene architecture, Horizontal gene transfer, Plastid, Secondary endosymbiosis, Splicing, Transcriptome
- MeSH
- Cell Nucleus MeSH
- Euglena gracilis genetics metabolism MeSH
- Genome * MeSH
- Plastids MeSH
- Proteome * MeSH
- Transcriptome * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Proteome * MeSH
With the rise of next-generation sequencing methods, it has become increasingly possible to obtain genomewide sequence data even for nonmodel species. Such data are often used for the development of single nucleotide polymorphism (SNP) markers, which can subsequently be screened in a larger population sample using a variety of genotyping techniques. Many of these techniques require appropriate locus-specific PCR and genotyping primers. Currently, there is no publicly available software for the automated design of suitable PCR and genotyping primers from next-generation sequence data. Here we present a pipeline called Scrimer that automates multiple steps, including adaptor removal, read mapping, selection of SNPs and multiple primer design from transcriptome data. The designed primers can be used in conjunction with several widely used genotyping methods such as SNaPshot or MALDI-TOF genotyping. Scrimer is composed of several reusable modules and an interactive bash workflow that connects these modules. Even the basic steps are presented, so the workflow can be executed in a step-by-step manner. The use of standard formats throughout the pipeline allows data from various sources to be plugged in, as well as easy inspection of intermediate results with visualization tools of the user's choice.
- Keywords
- SNP genotyping, SNaPshot, next-generation sequencing, primer design, transcriptome
- MeSH
- DNA Primers genetics MeSH
- Genotyping Techniques methods MeSH
- Polymerase Chain Reaction methods MeSH
- Sequence Analysis, DNA methods MeSH
- Transcriptome * MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA Primers MeSH
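The SNP selection step that the Scrimer abstract mentions can be illustrated with a minimal sketch. This is not Scrimer's code or its actual selection criteria; it only assumes, for illustration, that a SNP is a usable primer target when it has a fixed-length flank on both sides that contains no other SNP. The names `Snp`, `snps_with_clean_flanks` and the `flank` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    contig: str
    pos: int  # 0-based position on the contig


def snps_with_clean_flanks(snps, contig_lengths, flank=50):
    """Keep SNPs with `flank` bases on each side that hold no other SNP.

    Assumed criterion for illustration only: primers need variant-free
    sequence around the target SNP.
    """
    # Group SNP positions by contig so neighbours can be looked up.
    positions = {}
    for snp in snps:
        positions.setdefault(snp.contig, []).append(snp.pos)

    kept = []
    for snp in snps:
        length = contig_lengths[snp.contig]
        # Drop SNPs too close to either end of the contig.
        if snp.pos - flank < 0 or snp.pos + flank >= length:
            continue
        # Drop SNPs with another SNP inside the flanking window.
        crowded = any(
            other != snp.pos and abs(other - snp.pos) <= flank
            for other in positions[snp.contig]
        )
        if not crowded:
            kept.append(snp)
    return kept


candidates = [Snp("c1", 100), Snp("c1", 120), Snp("c1", 300), Snp("c2", 10)]
lengths = {"c1": 500, "c2": 200}
selected = snps_with_clean_flanks(candidates, lengths, flank=50)
```

In this toy input, the two neighbouring SNPs on `c1` and the SNP near the start of `c2` are filtered out, leaving only the isolated SNP at position 300.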
Recent technological advances have made next-generation sequencing (NGS) a popular and financially accessible technique allowing a broad range of analyses to be done simultaneously. The huge amount of newly generated NGS data, however, requires advanced software support both for analyzing the data and for interpreting the results biologically. In this article, we describe SATrans (Software for Annotation of Transcriptome), a software package providing fast and robust functional annotation of novel sequences obtained from transcriptome sequencing. Moreover, it performs advanced gene ontology analysis of differentially expressed genes, thereby helping to interpret the quantitative changes in gene expression biologically and in a user-friendly form. The software is freely available and makes it possible to work with thousands of sequences using a standard personal computer or notebook running the Linux operating system.
- Keywords
- differentially expressed genes, functional annotation, transcriptome
- MeSH
- Molecular Sequence Annotation methods MeSH
- Humans MeSH
- Sequence Analysis, RNA methods MeSH
- Software * MeSH
- Gene Expression Profiling methods MeSH
- Transcriptome * MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
The early evolution of eukaryotes and their adaptations to low-oxygen environments are fascinating open questions in biology. Genome-scale data from novel eukaryotes, and particularly from free-living lineages, are the key to answering these questions. The Parabasalia are a major group of anaerobic eukaryotes that form the most speciose lineage of Metamonada. The most well-studied are parasitic parabasalids, including Trichomonas vaginalis and Tritrichomonas foetus, but very little genome-scale data are available for free-living members of the group. Here, we sequenced the transcriptome of Pseudotrichomonas keilini, a free-living parabasalian. Comparative genomic analysis indicated that P. keilini possesses a metabolism and gene complement that are in many respects similar to its parasitic relative T. vaginalis and that in the time since their most recent common ancestor, it is the T. vaginalis lineage that has experienced more genomic change, likely due to the transition to a parasitic lifestyle. Features shared between P. keilini and T. vaginalis include a hydrogenosome (anaerobic mitochondrial homolog) that we predict to function much as in T. vaginalis and a complete glycolytic pathway that is likely to represent one of the primary means by which P. keilini obtains ATP. Phylogenomic analysis indicates that P. keilini branches within a clade of endobiotic parabasalids, consistent with the hypothesis that different parabasalid lineages evolved toward parasitic or free-living lifestyles from an endobiotic, anaerobic, or microaerophilic common ancestor.
Hazelnut (Corylus), which has high commercial and nutritional value, is an important tree for producing nuts and nut oil consumed as an ingredient, especially in chocolate. While Corylus avellana L. (European hazelnut, Betulaceae) and Corylus colurna L. (Turkish hazelnut, Betulaceae) are the two common hazelnut species in Europe, C. avellana L. (Tombul hazelnut) is the most widely grown hazelnut species in Turkey, and C. colurna L., which is the most important genetic resource for hazelnut breeding, occurs naturally in Anatolia. We generated transcriptome data for these two Corylus species and used these data for gene discovery and gene expression profiling. Total RNA from young leaves, flowers (male and female), buds, and husk shoots of C. avellana and C. colurna was used to prepare two different libraries, which were sequenced using Illumina HiSeq4000 with 100 bp paired-end reads. The transcriptome data of C. avellana and C. colurna (10.48 and 10.30 Gb, respectively) were assembled into 70,265 and 88,343 unigenes, respectively. These unigenes were functionally annotated using the TRAPID platform. We identified 25,312 and 27,051 simple sequence repeats (SSRs) for C. avellana and C. colurna, respectively. The TL1, GMPM1, N, 2MMP, At1g29670, and CHIB1 unigenes were selected for validation with qPCR. The first de novo transcriptome data of C. colurna were compared with data of the commercially important C. avellana. These data constitute a valuable extension of the publicly available transcriptomic resource aimed at breeding, medicinal, and industrial research studies.
- Keywords
- Corylus spp., RNA-seq, de novo, hazelnut, transcriptome
- MeSH
- Corylus * genetics metabolism MeSH
- Nuts MeSH
- Gene Expression Profiling MeSH
- Publication type
- Journal Article MeSH
- Geographicals
- Turkey MeSH
Chromium (Cr) can interfere with plant gene expression, change the content of metabolites and affect plant growth. However, the molecular response mechanism of wetland plants at different time points under Cr stress has yet to be fully understood. In this study, Canna indica was exposed to 100 mg/kg Cr-contaminated soil for 0, 7, 14, and 21 days and analyzed using untargeted metabolomics (LC-MS) and transcriptomics. The results showed that Cr stress increased the activities of superoxide dismutase (SOD), ascorbate peroxidase (APX) and peroxidase (POD) and the contents of glutathione (GSH), malondialdehyde (MDA), and reactive oxygen species (ROS), and inhibited the biosynthesis of photosynthetic pigments, thus leading to changes in plant growth and biomass. Metabolomics analysis showed that Cr stress mainly affected 12 metabolic pathways, involving 38 differentially expressed metabolites, including amino acids, phenylpropanoids, and flavonoids. Transcriptome analysis identified a total of 16,247 differentially expressed genes (DEGs; 7710 up-regulated and 8537 down-regulated). At the early stage of stress (7 days of Cr exposure), C. indica responded to Cr toxicity mainly through galactose, starch and sucrose metabolism. As the stress continued, plant hormone signal transduction and the MAPK signaling pathway in C. indica were significantly affected in the Cr14 (14 days of Cr exposure) treatment group. Finally, in the late stage of stress (Cr21), C. indica counteracted Cr toxicity by jointly activating glutathione metabolism and phenylpropanoid biosynthesis. In conclusion, this study revealed the molecular response mechanism of C. indica to Cr stress at different times through multi-omics methods.
- Keywords
- Canna indica, Chromium, Metabolome, Physiology, Transcriptome
- MeSH
- Chromium metabolism toxicity MeSH
- Stress, Physiological * genetics MeSH
- Soil Pollutants toxicity metabolism MeSH
- Metabolome MeSH
- Metabolomics * methods MeSH
- Gene Expression Regulation, Plant * MeSH
- Gene Expression Profiling * MeSH
- Transcriptome * MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Chromium MeSH
- Soil Pollutants MeSH
The hop plant (Humulus lupulus L.) produces several valuable secondary metabolites, such as prenylflavonoids, bitter acids, and essential oils. These compounds, which have pharmacological properties and are widely used in the beer brewing industry, are biosynthesized in glandular trichomes (lupulin glands). The present study attempts to generate comprehensive information on the transcriptome dynamics and gene regulatory mechanisms involved in the biosynthesis and regulation of these compounds and in developmental changes, including trichome development, at three developmental stages: leaf, bract, and mature lupulin gland. Using high-throughput RNA-Seq technology, a total of 61.13, 50.01, and 20.18 Mb clean reads in the leaf, bract, and lupulin gland libraries, respectively, were obtained and assembled into 43,550 unigenes. Putative functions were assigned to 30,996 transcripts (71.17%) based on BLAST (basic local alignment search tool) similarity searches against public sequence databases, including GO, KEGG, NR, and COG families, which indicated that the genes are principally involved in fundamental cellular and molecular functions and in the biosynthesis of secondary metabolites. The expression levels of all unigenes were analyzed in leaf, bract, and lupulin gland tissues of hop. Transcripts encoding enzymes of BCAA metabolism and of the MEP and shikimate pathways were most strongly up-regulated in lupulin glands compared with leaves and bracts. Similarly, the expression levels of the transcription factors and structural genes that directly encode enzymes involved in the xanthohumol, bitter acid, and terpenoid biosynthesis pathways were found to be significantly enhanced in lupulin glands, suggesting that production of these metabolites increases after leaf development.
In addition, numerous genes involved in primary metabolism, lipid metabolism, photosynthesis, generation of precursor metabolites/energy, protein modification, transporter activity, and cell wall component biogenesis were differentially regulated in three developmental stages, suggesting their involvement in the dynamics of the lupulin gland development. The identification of differentially regulated trichome-related genes provided a new foundation for molecular research on trichome development and differentiation in hop. In conclusion, the reported results provide directions for future functional genomics studies for genetic engineering or molecular breeding for augmentation of secondary metabolite content in hop.
- Keywords
- Humulus lupulus, RNA sequencing, bitter acids, lupulin glands, prenylflavonoids, terpenoids, trichome
- MeSH
- Flavonoids biosynthesis chemistry metabolism MeSH
- Gene Ontology MeSH
- Humulus chemistry metabolism MeSH
- Plant Leaves genetics metabolism MeSH
- Propiophenones chemistry metabolism MeSH
- Gene Expression Regulation, Plant MeSH
- Plant Proteins genetics metabolism MeSH
- RNA-Seq MeSH
- Terpenes chemistry metabolism MeSH
- Transcription Factors metabolism MeSH
- Transcriptome genetics MeSH
- Trichomes genetics metabolism ultrastructure MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Flavonoids MeSH
- Propiophenones MeSH
- Plant Proteins MeSH
- Terpenes MeSH
- Transcription Factors MeSH
- xanthohumol MeSH Browser
The discovery of natural haploids of Ginkgo biloba L., a representative gymnosperm, has opened a new door for research on this species. Haploid germplasm has always been of interest to researchers because of its special characteristics. However, the special features and mechanisms of haploid ginkgo remain unknown following this significant discovery. In this study, we conducted a homogeneous garden experiment on haploid and diploid ginkgo to explore the differences in growth, physiology and biochemistry between the two. Additionally, a high-depth transcriptome database of both was established to reveal their transcriptional differences. The results showed that haploid ginkgo exhibited weaker growth potential and lower photosynthetic and flavonoid accumulation capacity. Although up-regulated DEGs in haploid ginkgo accounted for 46.7% of all DEGs in the transcriptome data, the gene sets of the photosynthesis, glycolysis/gluconeogenesis and flavonoid biosynthesis pathways, which were significantly related to these differences, showed a significant down-regulation trend in gene set enrichment analysis (GSEA). We further found that the major metabolic pathways in the haploid ginkgo transcriptional database were down-regulated compared to the diploid. This study reveals for the first time the phenotypic, growth and physiological differences in haploid ginkgos, and demonstrates their transcriptional patterns based on high-depth transcriptomic data, laying the foundation for subsequent in-depth studies of haploid ginkgos.
- Keywords
- gene dosage, ginkgo, haploid, mechanism, transcriptome
- MeSH
- Flavonoids metabolism MeSH
- Gene Dosage MeSH
- Ginkgo biloba * genetics MeSH
- Haploidy MeSH
- Plant Leaves metabolism MeSH
- Transcriptome * MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Flavonoids MeSH
A pan-transcriptome describes the transcriptional and post-transcriptional consequences of genome diversity from multiple individuals within a species. We developed a barley pan-transcriptome using 20 inbred genotypes representing domesticated barley diversity by generating and analyzing short- and long-read RNA-sequencing datasets from multiple tissues. To overcome single reference bias in transcript quantification, we constructed genotype-specific reference transcript datasets (RTDs) and integrated these into a linear pan-genome framework to create a pan-RTD, allowing transcript categorization as core, shell or cloud. Focusing on the core (expressed in all genotypes), we observed significant transcript abundance variation among tissues and between genotypes driven partly by RNA processing, gene copy number, structural rearrangements and conservation of promoter motifs. Network analyses revealed conserved co-expression module::tissue correlations and frequent functional diversification. To complement the pan-transcriptome, we constructed a comprehensive cultivar (cv.) Morex gene-expression atlas and illustrate how these combined datasets can be used to guide biological inquiry.
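The core, shell and cloud categorization described above can be sketched as follows. The abstract defines only the core (expressed in all genotypes); the other two thresholds here are assumptions made for illustration: cloud is taken to mean expression in exactly one genotype, and shell everything in between. The function name `categorize_transcripts` and the toy input are hypothetical and are not taken from the study.

```python
def categorize_transcripts(expressed_in, n_genotypes):
    """Label each transcript as core, shell or cloud.

    expressed_in maps a transcript ID to the genotypes in which it is expressed.
    core  = expressed in all genotypes (definition given in the abstract)
    cloud = expressed in exactly one genotype (assumed threshold)
    shell = expressed in more than one but not all genotypes (assumed threshold)
    """
    categories = {}
    for transcript, genotypes in expressed_in.items():
        count = len(set(genotypes))  # ignore duplicate genotype entries
        if count == 0:
            continue  # not expressed anywhere, so it gets no category
        if count == n_genotypes:
            categories[transcript] = "core"
        elif count == 1:
            categories[transcript] = "cloud"
        else:
            categories[transcript] = "shell"
    return categories


example = {
    "tx1": ["g1", "g2", "g3"],
    "tx2": ["g1", "g3"],
    "tx3": ["g2"],
    "tx4": [],
}
result = categorize_transcripts(example, n_genotypes=3)
```

With three genotypes in this toy input, `tx1` is core, `tx2` is shell, `tx3` is cloud, and `tx4` is left out because it is not expressed in any genotype.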