Validated WGS and WES protocols proved saliva-derived gDNA as an equivalent to blood-derived gDNA for clinical and population genomic analyses

2024 Feb 17;25(1):187. [epub] 20240217

Language: English. Country: United Kingdom, England. Medium: electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid38365587
Links

PubMed 38365587
PubMed Central PMC10873937
DOI 10.1186/s12864-024-10080-0
PII: 10.1186/s12864-024-10080-0
Knihovny.cz E-resources

BACKGROUND: Whole exome sequencing (WES) and whole genome sequencing (WGS) have become standard methods in human clinical diagnostics as well as in population genomics (POPGEN). Blood-derived genomic DNA (gDNA) is routinely used in the clinical environment. Conversely, many POPGEN studies and commercial tests benefit from easy saliva sampling. Here, we evaluated the quality of variant call sets and the level of genotype concordance of single nucleotide variants (SNVs) and small insertions and deletions (indels) for WES and WGS using paired blood- and saliva-derived gDNA isolates, employing genomic reference-based validated protocols.

METHODS: The genomic reference standard Coriell NA12878 was repeatedly analyzed using optimized WES and WGS protocols, and data calls were compared with the truth dataset published by the Genome in a Bottle Consortium. gDNA was extracted from the paired blood and saliva samples of 10 participants and processed using the same protocols. Paired blood-saliva call sets were then compared in the context of the WGS and WES genomic reference-based technical validation results.

RESULTS: The quality pattern of called variants obtained from genomic reference-based technical replicates correlates with the data calls of paired blood- and saliva-derived samples at all levels of the tested examinations, despite a higher rate of non-human contamination found in the saliva samples. The F1 score of the 10 blood-to-saliva comparisons ranged from 0.8030 to 0.9998 for SNVs and from 0.8883 to 0.9991 for small indels with the WGS protocol, and from 0.8643 to 0.999 for SNVs and from 0.7781 to 1.000 for small indels with the WES protocol.

CONCLUSION: Saliva may be considered an equivalent material to blood for genetic analysis for both WGS and WES under strict protocol conditions. Sequencing metrics and variant-detection accuracy are not affected by choosing saliva instead of blood as the gDNA source; they depend much more significantly on the genomic context, the variant types, and the sequencing technology used.
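
The F1 scores reported in the abstract summarize how well two variant call sets agree. As a minimal sketch of that idea (not the authors' pipeline), the Python snippet below compares two toy call sets by exact match on chromosome, position, reference allele, and alternate allele. Treating the blood-derived set as the baseline, the function name `compare_call_sets`, and the example variants are all assumptions made for illustration. Dedicated benchmarking tools additionally reconcile genotypes and differing variant representations, which this sketch does not.

```python
from typing import NamedTuple, Set, Tuple

# A variant is keyed by (chromosome, position, ref allele, alt allele).
# Exact-match keys are a simplification assumed for this sketch.
Variant = Tuple[str, int, str, str]


class Concordance(NamedTuple):
    tp: int
    fp: int
    fn: int
    precision: float
    recall: float
    f1: float


def compare_call_sets(baseline: Set[Variant], query: Set[Variant]) -> Concordance:
    """Compare a query call set against a baseline call set (hypothetical helper)."""
    tp = len(baseline & query)   # called in both sets
    fp = len(query - baseline)   # called only in the query set
    fn = len(baseline - query)   # called only in the baseline set
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall; on sets it equals
    # the Dice-Sorensen coefficient 2|A & B| / (|A| + |B|).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return Concordance(tp, fp, fn, precision, recall, f1)


# Invented toy data: blood is taken as the baseline, saliva as the query.
blood = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "GA"),
    ("chr3", 42, "T", "C"),
}
saliva = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "GA"),
    ("chr4", 7, "AT", "A"),
}

result = compare_call_sets(blood, saliva)
print(f"TP={result.tp} FP={result.fp} FN={result.fn} F1={result.f1:.4f}")
```

Because F1 is symmetric in false positives and false negatives, swapping which sample is the baseline changes precision and recall but leaves the F1 score unchanged, which makes it a convenient summary for a paired comparison with no true reference.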


