Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing
Language English Country United States Media print-electronic
Document type Journal Article, Research Support, N.I.H., Extramural, Research Support, Non-U.S. Gov't
Grant support
HHSN261201500003C
NCI NIH HHS - United States
75N910D00024
NIH HHS - United States
S10 OD019960
NIH HHS - United States
HHSN261201400008C
NCI NIH HHS - United States
HHSN261201500003I
NCI NIH HHS - United States
Z99 CA999999
Intramural NIH HHS - United States
S10OD019960
U.S. Department of Health & Human Services | National Institutes of Health (NIH)
PubMed: 34504346
PubMed Central: PMC8506910
DOI: 10.1038/s41587-021-00994-5
PII: 10.1038/s41587-021-00994-5
- MeSH
- Benchmarking * MeSH
- Cell Line MeSH
- Humans MeSH
- Mutation MeSH
- Cell Line, Tumor MeSH
- Neoplasms genetics pathology MeSH
- Reproducibility of Results MeSH
- Sequence Analysis, DNA standards MeSH
- Whole Genome Sequencing standards MeSH
- Exome Sequencing standards MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance value (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.
AstraZeneca Gaithersburg MD USA
Bioinformatics Research and Early Development Roche Sequencing Solutions Inc Belmont CA USA
Biomarker Development Novartis Institutes for Biomedical Research Basel Switzerland
Center for Genomics Loma Linda University School of Medicine Loma Linda CA USA
Center for Information Technology National Institutes of Health Bethesda MD USA
Computational Genomics Genomics Research Center AbbVie North Chicago IL USA
Department of Physiology and Biophysics Weill Cornell Medicine New York NY USA
Departments of Medicine and Pathology University of Toledo Medical Center Toledo OH USA
Estonian Genome Centre Institute of Genomics University of Tartu Tartu Estonia
European Infrastructure for Translational Medicine Amsterdam the Netherlands
Genentech South San Francisco CA USA
Illumina Inc Foster City CA USA
Immuneering Corporation Cambridge MA USA
IMTM Faculty of Medicine and Dentistry Palacky University Olomouc Olomouc Czech Republic
Institute for Molecular Medicine Finland University of Helsinki Helsinki Finland
Integrative Bioinformatics National Institute of Environmental Health Sciences Durham NC USA
National Center for Toxicological Research US Food and Drug Administration Jefferson AR USA
National Institute of Metrology Beijing China
Q2 Solutions EA Genomics Morrisville NC USA
Sentieon Inc Mountain View CA USA
Seven Bridges Genomics Inc Cambridge MA USA
The Center for Devices and Radiological Health US Food and Drug Administration Silver Spring MD USA
The Center for Drug Evaluation and Research US Food and Drug Administration Silver Spring MD USA