ProcaryaSV: structural variation detection pipeline for bacterial genomes using short-read sequencing
Language: English. Country: England, United Kingdom. Medium: electronic
Document type: journal article
Grant support: GA23-05845S, Czech Science Foundation
PubMed: 38982375
PubMed Central: PMC11234778
DOI: 10.1186/s12859-024-05843-1
PII: 10.1186/s12859-024-05843-1
- Keywords: Bacteria, CNV, Copy number variation, Pipeline, SV, Structural variation
- MeSH terms:
  - Bacteria / genetics
  - Genome, Bacterial *
  - Sequence Analysis, DNA / methods
  - Software *
  - High-Throughput Nucleotide Sequencing * / methods
- Publication type:
  - Journal Article
BACKGROUND: Structural variations play an important role in bacterial genomes. They can mediate rapid genome adaptation in response to the external environment and can thus contribute to antibiotic resistance. Detecting structural variations in bacteria is challenging, and recognizing even small rearrangements can be important. Although most detection tools are designed for and benchmarked on eukaryotic genomes, they can also be applied to prokaryotic genomes. The key requirements are the ability to detect small rearrangements and support for haploid genomes. Because the performance of any single detection tool is limited, combining the detection abilities of multiple tools can lead to more robust results. Workflows aimed at bacteria already exist for structural variation detection from long-read technologies and for the detection of single-nucleotide variations and indels. However, we are not aware of a structural variation detection workflow for short-read sequencing platforms. This gap motivated us to create our workflow, with the further aims of increasing detection performance and providing more robust results.

RESULTS: We developed ProcaryaSV, an open-source bioinformatics pipeline for the detection of structural variations in bacterial isolates from paired-end short sequencing reads. It integrates multiple tools covering quality control and trimming of the sequencing data, alignment to the reference genome, and structural variation detection by several callers. All partial results are then processed and merged with an in-house merging algorithm. Compared with a single-tool detection approach, ProcaryaSV has improved detection performance and is a reproducible, easy-to-use tool.

CONCLUSIONS: The ProcaryaSV pipeline provides an integrative approach to structural variation detection from paired-end next-generation sequencing of bacterial samples. It can be easily installed and used on Linux machines and is publicly available on GitHub at https://github.com/robinjugas/ProcaryaSV.
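The abstract does not describe how the in-house merging algorithm works, so the following is only a minimal illustrative sketch of the general idea of combining calls from several detection tools. It assumes a generic consensus rule (same variant type, reciprocal overlap above a threshold, support from at least two callers). The names `SVCall`, `reciprocal_overlap`, `merge_calls`, the thresholds, and the caller labels are all hypothetical and are not taken from ProcaryaSV.

```python
from dataclasses import dataclass
from statistics import median_low
from typing import Dict, List


@dataclass(frozen=True)
class SVCall:
    """One structural variant call (hypothetical record layout)."""
    caller: str  # label of the tool that produced the call
    svtype: str  # e.g. "DEL", "DUP", "INV"
    start: int   # 1-based inclusive start coordinate
    end: int     # 1-based inclusive end coordinate


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Fraction of the longer interval that is covered by the shared region."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start + 1), shared / (b.end - b.start + 1))


def merge_calls(calls: List[SVCall],
                min_overlap: float = 0.5,
                min_callers: int = 2) -> List[Dict]:
    """Greedily cluster same-type calls and keep clusters seen by enough callers.

    Each call is compared with the first call of every existing cluster;
    this is an assumed, simplified rule, not the published algorithm.
    """
    clusters: List[List[SVCall]] = []
    for call in sorted(calls, key=lambda c: (c.svtype, c.start, c.end)):
        for cluster in clusters:
            first = cluster[0]
            if (first.svtype == call.svtype
                    and reciprocal_overlap(first, call) >= min_overlap):
                cluster.append(call)
                break
        else:
            clusters.append([call])

    merged = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        if len(callers) >= min_callers:
            merged.append({
                "svtype": cluster[0].svtype,
                # Lower median of the reported breakpoints as the consensus.
                "start": median_low([c.start for c in cluster]),
                "end": median_low([c.end for c in cluster]),
                "callers": callers,
            })
    return merged


if __name__ == "__main__":
    example = [
        SVCall("caller_a", "DEL", 1000, 2000),
        SVCall("caller_b", "DEL", 1010, 1990),
        SVCall("caller_c", "DEL", 5000, 5100),
        SVCall("caller_d", "DUP", 1000, 2000),
    ]
    print(merge_calls(example))
```

In this sketch, a call reported by only one tool is dropped, which trades some sensitivity for fewer false positives. A real pipeline would also need to handle variant-type-specific evidence and breakpoint uncertainty, which are not modeled here.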