Needlestack: an ultra-sensitive variant caller for multi-sample next generation sequencing data
Status: PubMed-not-MEDLINE
Language: English
Country: Great Britain, England
Medium: print-electronic
Document type: Journal Article
Grant support:
- 001, World Health Organization - International
- R21 CA175979, NCI NIH HHS - United States
PubMed: 32363341
PubMed Central: PMC7182099
DOI: 10.1093/nargab/lqaa021
PII: lqaa021
Publication type:
- Journal Article (MeSH)
The emergence of next-generation sequencing (NGS) has revolutionized the way genome sequences are obtained, with the promise of providing a comprehensive characterization of DNA variation. Nevertheless, detecting somatic mutations remains difficult, in particular when trying to identify low-abundance mutations such as subclonal mutations, tumour-derived alterations in body fluids, or somatic mutations in histologically normal tissue. The main challenge is to distinguish precisely between sequencing artefacts and true mutations, particularly when the latter are so rare that they reach abundance levels similar to those of artefacts. Here, we present needlestack, a highly sensitive variant caller that learns the level of systematic sequencing errors directly from the data in order to call mutations accurately. Needlestack is based on the idea that the sequencing error rate can be estimated dynamically by analysing multiple samples together. We show that the sequencing error rate varies across alterations, illustrating the need to estimate it precisely. We evaluate the performance of needlestack for various types of variation, and we show that needlestack is robust across positions and outperforms existing state-of-the-art methods for low-abundance mutations. Needlestack, along with its source code, is freely available on the GitHub platform: https://github.com/IARCbioinfo/needlestack.
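The core idea in the abstract, that the background error rate at a position can be learned from many samples sequenced together, can be illustrated with a deliberately simplified sketch. It assumes a plain binomial error model and takes the median per-sample alternative-allele fraction as a robust estimate of the error rate shared by all samples. A sample is then flagged when its alternative-read count is improbably high under that rate, after Benjamini-Hochberg correction across samples. This is not the statistical model used by needlestack itself. The function names, the `alpha` threshold and the `min_error` floor are all hypothetical.

```python
import math
from statistics import median


def binom_sf(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1.

    Terms are evaluated in log space so that high depths (large n)
    do not overflow floating point.
    """
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def bh_qvalues(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def call_variants(alt_counts, depths, alpha=0.01, min_error=1e-6):
    """Hypothetical multi-sample caller for one position and one alternative allele.

    The shared error rate is the median per-sample alternative-allele
    fraction, so a mutation carried by a minority of samples barely moves it.
    """
    fractions = [a / d for a, d in zip(alt_counts, depths) if d > 0]
    # Clamp away from 0 and 1 so that log(p) and log1p(-p) stay finite.
    error_rate = min(max(median(fractions), min_error), 1.0 - min_error)
    pvals = [binom_sf(a, d, error_rate) for a, d in zip(alt_counts, depths)]
    qvals = bh_qvalues(pvals)
    calls = [q < alpha for q in qvals]
    return error_rate, qvals, calls


if __name__ == "__main__":
    # Ten samples at depth 1000; only the last one carries a real variant.
    rate, qvals, calls = call_variants([2, 1, 3, 2, 0, 1, 2, 3, 1, 40], [1000] * 10)
    print(rate)   # 0.002
    print(calls)  # [False, False, False, False, False, False, False, False, False, True]
```

In this toy example the nine background samples show one to three alternative reads out of 1000, consistent with an error rate of about 0.002, while the tenth sample's 40 reads are far outside that distribution. The median is used rather than the mean because the mean would be inflated by the very sample one is trying to detect.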
Faculty of Health Sciences, Palacky University, 775 15 Olomouc, Czech Republic
International Organization for Cancer Prevention and Research, 11070 Belgrade, Serbia
N.N. Blokhin Russian Cancer Research Centre, 115478 Moscow, The Russian Federation