ProcaryaSV: structural variation detection pipeline for bacterial genomes using short-read sequencing

2024 Jul 09; 25(1): 233. [epub] 2024-07-09

Language: English; Country: England, Great Britain; Medium: electronic

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid38982375

Grant support
GA23-05845S Czech Science Foundation

Links

PubMed 38982375
PubMed Central PMC11234778
DOI 10.1186/s12859-024-05843-1
PII: 10.1186/s12859-024-05843-1
Knihovny.cz E-resources

BACKGROUND: Structural variations play an important role in bacterial genomes. They can mediate rapid genome adaptation in response to the external environment and can therefore also contribute to antibiotic resistance. Detecting structural variations in bacteria is challenging, and recognizing even small rearrangements can be important. Although most detection tools are designed for and benchmarked on eukaryotic genomes, they can also be applied to prokaryotic genomes. The key requirements are the ability to detect small rearrangements and support for haploid genomes. Because the performance of any single detection tool is limited, combining the detection abilities of multiple tools can lead to more robust results. Bacteria-oriented workflows already exist for structural variation detection from long-read data and for the detection of single-nucleotide variants and indels. However, we are not aware of any structural variation detection workflow for short-read sequencing platforms. This gap motivated us to create our workflow, with the further aims of improving detection performance and providing more robust results.

RESULTS: We developed ProcaryaSV, an open-source bioinformatics pipeline for detecting structural variations in bacterial isolates from short paired-end sequencing reads. It integrates multiple tools covering quality control and trimming of the sequencing data, alignment to the reference genome, and structural variation detection with several callers. All partial results are then processed and merged by an in-house merging algorithm. Compared with a single-tool detection approach, ProcaryaSV shows improved detection performance, and it is reproducible and easy to use.

CONCLUSIONS: The ProcaryaSV pipeline provides an integrative approach to structural variation detection from paired-end next-generation sequencing of bacterial samples. It can be easily installed and used on Linux machines. It is publicly available on GitHub (https://github.com/robinjugas/ProcaryaSV).
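The abstract does not describe how ProcaryaSV's in-house merging algorithm works. As a hedged illustration of the general idea of combining calls from several detectors, the Python sketch below groups calls of the same SV type by reciprocal overlap and keeps only groups reported by at least two distinct tools. The `SVCall` record, the `merge_calls` and `reciprocal_overlap` functions, the 50% overlap threshold, the two-caller support rule and the caller labels are all hypothetical assumptions chosen for illustration; this is a generic consensus technique, not the pipeline's actual method.

```python
from dataclasses import dataclass
from statistics import median_low


@dataclass(frozen=True)
class SVCall:
    """One structural-variant call from a single tool (hypothetical record type)."""
    caller: str   # label of the detection tool that produced the call
    svtype: str   # e.g. "DEL", "DUP", "INV"
    start: int    # half-open interval [start, end) on the reference
    end: int


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Smaller of the two overlap fractions; 0.0 if the intervals do not overlap."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def merge_calls(calls, min_overlap=0.5, min_callers=2):
    """Assumed consensus rule: greedily cluster same-type calls by reciprocal
    overlap with each cluster's seed (first member), then keep clusters
    supported by at least `min_callers` distinct tools."""
    clusters = []  # each cluster is a list of SVCall; cluster[0] is the seed
    for call in sorted(calls, key=lambda c: (c.svtype, c.start, c.end)):
        for cluster in clusters:
            seed = cluster[0]
            if seed.svtype == call.svtype and reciprocal_overlap(seed, call) >= min_overlap:
                cluster.append(call)
                break
        else:  # no existing cluster matched, so this call seeds a new one
            clusters.append([call])

    consensus = []
    for cluster in clusters:
        callers = {c.caller for c in cluster}
        if len(callers) >= min_callers:
            # Report the lower median breakpoints across supporting calls.
            consensus.append((
                cluster[0].svtype,
                median_low([c.start for c in cluster]),
                median_low([c.end for c in cluster]),
                sorted(callers),
            ))
    return consensus


if __name__ == "__main__":
    example = [
        SVCall("caller_A", "DEL", 1000, 2000),
        SVCall("caller_B", "DEL", 1050, 2020),
        SVCall("caller_C", "DEL", 990, 1990),
        SVCall("caller_D", "DUP", 5000, 8000),
    ]
    print(merge_calls(example))
```

On this toy input the three overlapping deletions form one cluster supported by three tools, while the duplication is reported by only one tool and is dropped. Requiring agreement between tools trades some sensitivity for precision, which is one common way to obtain the "more robust results" that multi-tool approaches aim for.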

