transXpress: a Snakemake pipeline for streamlined de novo transcriptome assembly and annotation

2023 Apr 04; 24(1): 133. [Epub] 20230404

Language: English; Country: United Kingdom, England; Medium: electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid37016291

Grant support
9331 Gordon and Betty Moore Foundation
2020-221485 Chan Zuckerberg Foundation
F32-ES032276 NIEHS NIH HHS - United States
21-11563M Grantová Agentura České Republiky
891397 H2020 Marie Skłodowska-Curie Actions
MCB-1818132 National Science Foundation
F32 ES032276 NIEHS NIH HHS - United States

Links

PubMed 37016291
PubMed Central PMC10074830
DOI 10.1186/s12859-023-05254-8
PII: 10.1186/s12859-023-05254-8

BACKGROUND: RNA-seq followed by de novo transcriptome assembly has been a transformative technique in biological research of non-model organisms, but the computational processing of RNA-seq data entails many different software tools. The complexity of these de novo transcriptomics workflows therefore presents a major barrier for researchers to adopt best-practice methods and up-to-date versions of software.

RESULTS: Here we present a streamlined and universal de novo transcriptome assembly and annotation pipeline, transXpress, implemented in Snakemake. transXpress supports two popular assembly programs, Trinity and rnaSPAdes, and allows parallel execution on heterogeneous cluster computing hardware.

CONCLUSIONS: transXpress simplifies the use of best-practice methods and up-to-date software for de novo transcriptome assembly, and produces standardized output files that can be mined using SequenceServer to facilitate rapid discovery of new genes and proteins in non-model organisms.

