Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis

2023; 3(1): vbad089. [Epub] 20230706

Status: PubMed-not-MEDLINE. Language: English. Country: England, United Kingdom. Medium: electronic-eCollection

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid37465398

MOTIVATION: While the workflow for primary analysis of single-cell RNA-seq (scRNA-seq) data is well established, secondary analysis of the feature-barcode matrix is usually done with custom scripts. No fully automated pipeline in the R statistical environment follows current best programming practices and meets the requirements for reproducibility.

RESULTS: We have developed scdrake, a fully automated workflow for secondary analysis of scRNA-seq data, implemented entirely in the R language and built within the drake framework. The pipeline includes quality control, cell and gene filtering, normalization, detection of highly variable genes, dimensionality reduction, clustering, cell type annotation, detection of marker genes, differential expression analysis and integration of multiple samples. The pipeline is reproducible and scalable, executes efficiently, is easy to extend, gives access to intermediate results and outputs rich HTML reports. Scdrake is distributed as a Docker image, which provides straightforward setup and enhances reproducibility.

AVAILABILITY AND IMPLEMENTATION: The source code and documentation are available under the MIT license at https://github.com/bioinfocz/scdrake and https://bioinfocz.github.io/scdrake, respectively.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
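The abstract attributes the pipeline's efficient execution and access to intermediate results to the drake framework, a make-like toolkit that rebuilds a target only when something it depends on has changed. The following is a minimal sketch of that general idea in Python, not scdrake's or drake's actual implementation: the `Pipeline` class, the three toy steps and the `min_total` parameter are all hypothetical names introduced here for illustration, and the "counts" are made-up numbers.

```python
import hashlib
import json


class Pipeline:
    """Toy make-like pipeline (illustrative only, not the drake API)."""

    def __init__(self):
        self.steps = {}   # target name -> (function, upstream targets, parameters)
        self.cache = {}   # target name -> (fingerprint of inputs, stored value)
        self.built = []   # targets recomputed during the most recent run

    def target(self, name, func, deps=(), params=None):
        """Register (or re-register) a target with its dependencies and parameters."""
        self.steps[name] = (func, tuple(deps), params or {})

    def run(self, name):
        """Bring `name` up to date and return its value."""
        self.built = []
        return self._make(name)

    def _make(self, name):
        func, deps, params = self.steps[name]
        inputs = [self._make(dep) for dep in deps]
        # Fingerprint the upstream values and parameters; an unchanged
        # fingerprint means the cached intermediate result can be reused.
        blob = json.dumps([inputs, params], sort_keys=True).encode()
        fingerprint = hashlib.sha256(blob).hexdigest()
        cached = self.cache.get(name)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]
        value = func(*inputs, **params)
        self.cache[name] = (fingerprint, value)
        self.built.append(name)
        return value


def load_counts():
    # Made-up feature-barcode matrix: cell -> counts per gene.
    return {"cell_a": [5, 0, 3], "cell_b": [0, 0, 1], "cell_c": [2, 4, 4]}


def filter_cells(counts, min_total):
    # Keep cells whose total count reaches the threshold.
    return {cell: row for cell, row in counts.items() if sum(row) >= min_total}


def normalize(counts):
    # Scale each cell's counts so they sum to 1.
    return {cell: [x / sum(row) for x in row] for cell, row in counts.items()}


pipeline = Pipeline()
pipeline.target("counts", load_counts)
pipeline.target("filtered", filter_cells, deps=["counts"], params={"min_total": 2})
pipeline.target("normalized", normalize, deps=["filtered"])

first = pipeline.run("normalized")
first_built = list(pipeline.built)    # all three targets are built

second = pipeline.run("normalized")
second_built = list(pipeline.built)   # nothing changed, so nothing is rebuilt

# Changing one parameter invalidates only that target and its downstream steps.
pipeline.target("filtered", filter_cells, deps=["counts"], params={"min_total": 9})
third = pipeline.run("normalized")
third_built = list(pipeline.built)    # "counts" is reused from the cache

print(first_built, second_built, third_built)
```

In this sketch a cached value is reused whenever its inputs hash to the same fingerprint, which is why the third run skips the loading step. A real workflow manager additionally tracks changes to the code of each step and persists the cache on disk.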

