Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing

Nature Biotechnology. 2021 Sep; 39(9): 1141-1150. [Epub 2021-09-09]

Language English Country United States Medium print-electronic

Document type journal articles, Research Support, N.I.H., Extramural, grant-supported work

Persistent link https://www.medvik.cz/link/pmid34504346

Grant support
HHSN261201500003C NCI NIH HHS - United States
75N910D00024 NIH HHS - United States
S10 OD019960 NIH HHS - United States
HHSN261201400008C NCI NIH HHS - United States
HHSN261201500003I NCI NIH HHS - United States
Z99 CA999999 Intramural NIH HHS - United States
S10OD019960 U.S. Department of Health & Human Services | National Institutes of Health (NIH)

Links

PubMed 34504346
PubMed Central PMC8506910
DOI 10.1038/s41587-021-00994-5
PII: 10.1038/s41587-021-00994-5

Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance value (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.

Advanced Biomedical and Computational Sciences Biomedical Informatics and Data Science Directorate Frederick National Laboratory for Cancer Research Frederick MD USA

AstraZeneca Gaithersburg MD USA

ATCC Manassas VA USA

Bioinformatics and Computational Biology Core National Heart Lung and Blood Institute National Institutes of Health Bethesda MD USA

Bioinformatics Research and Early Development Roche Sequencing Solutions Inc Belmont CA USA

Biomarker Development Novartis Institutes for Biomedical Research Basel Switzerland

CCR Collaborative Bioinformatics Resource Office of Science and Technology Resources Center for Cancer Research Bethesda MD USA

Center for Genomics Loma Linda University School of Medicine Loma Linda CA USA

Center for Information Technology National Institutes of Health Bethesda MD USA

Centre for Molecular Medicine and Innovative Therapeutics Murdoch University Murdoch Perth Western Australia Australia

Centro di Riferimento Oncologico di Aviano IRCCS National Cancer Institute Unit of Oncogenetics and Functional Oncogenomics Aviano Italy

Computational Genomics and Bioinformatics Branch Center for Biomedical Informatics and Information Technology National Cancer Institute Rockville MD USA

Computational Genomics Genomics Research Center AbbVie North Chicago IL USA

Department of Biological Sciences Virginia Polytechnic Institute and State University Blacksburg VA USA

Department of Medical Sciences Molecular Medicine and Science for Life Laboratory Uppsala University Uppsala Sweden

Department of Physiology and Biophysics Weill Cornell Medicine New York NY USA

Departments of Medicine and Pathology University of Toledo Medical Center Toledo OH USA

Digicon McLean VA USA

Division of Cancer Epidemiology and Genetics National Cancer Institute National Institutes of Health Rockville MD USA

Estonian Genome Centre Institute of Genomics University of Tartu Tartu Estonia

European Infrastructure for Translational Medicine Amsterdam the Netherlands

Garvan Institute of Medical Research The Kinghorn Cancer Centre Darlinghurst New South Wales Australia

Genentech South San Francisco CA USA

Illumina Inc Foster City CA USA

Immuneering Corporation Cambridge MA USA

IMTM Faculty of Medicine and Dentistry Palacky University Olomouc Olomouc Czech Republic

Institute for Molecular Medicine Finland University of Helsinki Helsinki Finland

Integrative Bioinformatics National Institute of Environmental Health Sciences Durham NC USA

Lymphoid Malignancies Branch Center for Cancer Research National Cancer Institute National Institutes of Health Bethesda MD USA

National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda MD USA

National Center for Toxicological Research US Food and Drug Administration Jefferson AR USA

National Institute of Metrology Beijing China

Office of the Chief Scientist Office of the Commissioner US Food and Drug Administration Silver Spring MD USA

Perron Institute for Neurological and Translational Science Nedlands Perth Western Australia Australia

Q2 Solutions EA Genomics Morrisville NC USA

SAS Institute Inc Cary NC USA

Sentieon Inc Mountain View CA USA

Sequencing Facility Cancer Research Technology Program Frederick National Laboratory for Cancer Research Frederick MD USA

Seven Bridges Genomics Inc Cambridge MA USA

State Key Laboratory of Genetic Engineering Human Phenome Institute School of Life Sciences and Shanghai Cancer Center Fudan University Shanghai China

The Center for Biologics Evaluation and Research US Food and Drug Administration Silver Spring MD USA

The Center for Devices and Radiological Health US Food and Drug Administration Silver Spring MD USA

The Center for Drug Evaluation and Research US Food and Drug Administration Silver Spring MD USA

