Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing

2021 Sep; 39(9): 1151-1160. [epub] 20210909

Language: English. Country: United States. Medium: print-electronic

Document type: journal article; Research Support, N.I.H., Extramural; grant-supported research

Persistent link: https://www.medvik.cz/link/pmid34504347

Grant support
P20 GM103466 NIGMS NIH HHS - United States
HHSN261201500003C NCI NIH HHS - United States
P30 GM114737 NIGMS NIH HHS - United States
HHSN261201400008C NCI NIH HHS - United States
HHSN261201500003I NCI NIH HHS - United States
P30 CA071789 NCI NIH HHS - United States
18IPA34170301 American Heart Association (American Heart Association, Inc.)
S10 OD019960 NIH HHS - United States
Z99 CA999999 Intramural NIH HHS - United States

Links

PubMed 34504347
PubMed Central PMC8532138
DOI 10.1038/s41587-021-00993-6
PII: 10.1038/s41587-021-00993-6

The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.
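As an illustration of how a reference call set of this kind might be used for benchmarking, the following is a minimal sketch, not the authors' pipeline. It assumes variants have already been parsed into simple (chromosome, position, ref, alt) records and that the high-confidence regions are available as inclusive 1-based intervals. The names `Variant`, `Region` and `benchmark`, and all toy coordinates, are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def in_regions(v: Variant, regions: List[Region]) -> bool:
    """True if the variant position lies inside any high-confidence region."""
    return any(r.chrom == v.chrom and r.start <= v.pos <= r.end for r in regions)


def benchmark(truth: Iterable[Variant], query: Iterable[Variant],
              regions: Iterable[Region]) -> Dict[str, float]:
    """Compare a query call set with a reference call set.

    Only calls inside the high-confidence regions are scored; a call
    matches when chromosome, position, ref and alt are all identical.
    """
    region_list = list(regions)
    truth_hc = {v for v in truth if in_regions(v, region_list)}
    query_hc = {v for v in query if in_regions(v, region_list)}

    tp = len(truth_hc & query_hc)   # called and in the reference set
    fp = len(query_hc - truth_hc)   # called but not in the reference set
    fn = len(truth_hc - query_hc)   # in the reference set but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy data (hypothetical coordinates, for illustration only).
regions = [Region("chr1", 100, 200)]
truth = [Variant("chr1", 120, "A", "T"),
         Variant("chr1", 150, "G", "C"),
         Variant("chr1", 500, "C", "T")]   # outside the scored region
query = [Variant("chr1", 120, "A", "T"),
         Variant("chr1", 180, "T", "G"),
         Variant("chr1", 500, "C", "T")]   # outside the scored region

result = benchmark(truth, query, regions)
print(result)
```

Exact-match comparison is a simplification: real comparisons usually normalize variant representation first, since the same indel can be written in more than one way.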

Advanced Biomedical and Computational Sciences Biomedical Informatics and Data Science Directorate Frederick National Laboratory for Cancer Research Frederick MD USA

ATCC Manassas VA USA

Bioinformatics and Computational Biology Core National Heart Lung and Blood Institute National Institutes of Health Bethesda MD USA

Bioinformatics Research and Early Development Roche Sequencing Solutions Inc Belmont CA USA

Biomarker Development Novartis Institutes for Biomedical Research Basel Switzerland

CCR Collaborative Bioinformatics Resource Office of Science and Technology Resources Center for Cancer Research Bethesda MD USA

Center for Biologics Evaluation and Research FDA Silver Spring MD USA

Center for Devices and Radiological Health FDA Silver Spring MD USA

Center for Drug Evaluation and Research FDA Silver Spring MD USA

Center for Genomics Loma Linda University School of Medicine Loma Linda CA USA

Centro di Riferimento Oncologico di Aviano IRCCS National Cancer Institute Unit of Oncogenetics and Functional Oncogenomics Aviano Italy

Computational Genomics and Bioinformatics Branch Center for Biomedical Informatics and Information Technology National Cancer Institute Rockville MD USA

Computational Genomics Genomics Research Center AbbVie North Chicago IL USA

Department of Allergy and Clinical Immunology State Key Laboratory of Respiratory Disease National Clinical Research Center for Respiratory Disease Guangzhou Institute of Respiratory Health 1st Affiliated Hospital of Guangzhou Medical University Guangzhou China

Department of Basic Science Loma Linda University School of Medicine Loma Linda CA USA

Department of Biological Sciences Virginia Tech Blacksburg VA USA

Department of Medical Sciences Molecular Medicine and Science for Life Laboratory Uppsala University Uppsala Sweden

Department of Physiology and Biophysics Weill Cornell Medicine New York NY USA

Division of Cancer Epidemiology and Genetics National Cancer Institute National Institutes of Health Bethesda MD USA

Estonian Genome Centre Institute of Genomics University of Tartu Tartu Estonia

European Infrastructure for Translational Medicine Amsterdam the Netherlands

Genentech a member of the Roche group South San Francisco CA USA

Illumina Inc Foster City CA USA

Immuneering Corporation Boston MA USA

IMTM Faculty of Medicine and Dentistry Palacky University Olomouc Czech Republic

Institute for Molecular Medicine Finland University of Helsinki Helsinki Finland

National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda MD USA

National Center for Toxicological Research FDA Jefferson AR USA

Perron Institute for Neurological and Translational Science Nedlands Western Australia Australia

Sentieon Inc Mountain View CA USA

Sequencing Facility Cancer Research Technology Program Frederick National Laboratory for Cancer Research Frederick MD USA

State Key Laboratory of Genetic Engineering Human Phenome Institute School of Life Sciences and Shanghai Cancer Center Fudan University Shanghai China

Translational Genomics Research Institute Phoenix AZ USA

10x Genomics Pleasanton CA USA

