• This record comes from PubMed

digIS: towards detecting distant and putative novel insertion sequence elements in prokaryotic genomes

2021 May 20;22(1):258. Epub 2021 May 20.

Language English
Country Great Britain, England
Media electronic

Document type Journal Article

Links

PubMed 34016050
PubMed Central PMC8147514
DOI 10.1186/s12859-021-04177-6
PII: 10.1186/s12859-021-04177-6

BACKGROUND: Insertion sequence elements (IS elements) are the smallest and most abundant mobile elements in prokaryotic genomes. They have been shown to play a significant role in genome organization and evolution. Understanding their function in the host genome requires an effective detection and annotation tool, a need that becomes more pressing as genomic and metagenomic data grow rapidly. Existing tools for IS element detection and annotation usually rely on sequence similarity to a database of known IS families, which limits their ability to discover distant and putative novel IS elements.

RESULTS: In this paper, we present digIS, a software tool based on profile hidden Markov models assembled from catalytic domains of transposases. It performs very well in detecting known IS elements when tested on datasets with manually curated annotation. The main contribution of digIS is its ability to detect distant and putative novel IS elements while maintaining a moderate level of false positives. In this category it outperforms existing tools, especially when tested on large datasets of archaeal and bacterial genomes.

CONCLUSION: We provide digIS, a software tool using a novel approach based on manually curated profile hidden Markov models, which is able to detect distant and putative novel IS elements. Although digIS can also find known IS elements, we expect it to be used primarily by scientists interested in finding novel IS elements. The tool is available at https://github.com/janka2012/digIS.


