Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing
Language English Country United States Medium print-electronic
Document type Journal Article, Research Support, N.I.H., Extramural, Research Support, Non-U.S. Gov't
Grant support
P20 GM103466
NIGMS NIH HHS - United States
HHSN261201500003C
NCI NIH HHS - United States
P30 GM114737
NIGMS NIH HHS - United States
HHSN261201400008C
NCI NIH HHS - United States
HHSN261201500003I
NCI NIH HHS - United States
P30 CA071789
NCI NIH HHS - United States
18IPA34170301
American Heart Association (American Heart Association, Inc.)
S10 OD019960
NIH HHS - United States
Z99 CA999999
Intramural NIH HHS - United States
PubMed
34504347
PubMed Central
PMC8532138
DOI
10.1038/s41587-021-00993-6
PII: 10.1038/s41587-021-00993-6
- MeSH
- Benchmarking * MeSH
- Datasets as Topic MeSH
- Humans MeSH
- Mutation MeSH
- DNA Mutational Analysis / standards MeSH
- Cell Line, Tumor MeSH
- Breast Neoplasms / genetics MeSH
- Reference Standards MeSH
- Reproducibility of Results MeSH
- Whole Genome Sequencing / standards MeSH
- High-Throughput Nucleotide Sequencing / standards MeSH
- Germ Cells MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, has an aneuploid genome and is enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.
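As a rough illustration of how a reference call set of this kind might be used for benchmarking, the sketch below scores a caller's output against a truth set, counting only variants that fall inside high-confidence regions. It is a minimal example under stated assumptions, not the authors' pipeline: the names `Variant`, `Region`, `in_regions` and `benchmark`, the exact-match comparison on (chromosome, position, ref, alt), and the toy coordinates are all hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    """A simple variant record (hypothetical structure for illustration)."""
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


class Region(NamedTuple):
    """A high-confidence interval, 1-based and inclusive on both ends."""
    chrom: str
    start: int
    end: int


def in_regions(variant: Variant, regions: Iterable[Region]) -> bool:
    """Return True if the variant position lies inside any region."""
    return any(
        r.chrom == variant.chrom and r.start <= variant.pos <= r.end
        for r in regions
    )


def benchmark(calls, truth, regions) -> Dict[str, float]:
    """Compare calls with a truth set, restricted to the given regions."""
    regions = list(regions)
    calls_hc = {v for v in calls if in_regions(v, regions)}
    truth_hc = {v for v in truth if in_regions(v, regions)}

    tp = len(calls_hc & truth_hc)   # called and in the truth set
    fp = len(calls_hc - truth_hc)   # called but not in the truth set
    fn = len(truth_hc - calls_hc)   # in the truth set but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy data (invented): one high-confidence region on chr1.
regions = [Region("chr1", 1, 1000)]
truth = [
    Variant("chr1", 100, "A", "T"),
    Variant("chr1", 200, "G", "C"),
    Variant("chr1", 5000, "C", "G"),   # outside the region, ignored
]
calls = [
    Variant("chr1", 100, "A", "T"),    # true positive
    Variant("chr1", 300, "T", "A"),    # false positive
    Variant("chr1", 5000, "C", "G"),   # outside the region, ignored
]
result = benchmark(calls, truth, regions)
print(result)
```

Exact matching on normalized records is the simplest possible comparison; a real evaluation would likely need variant normalization and handling of equivalent indel representations, which this sketch deliberately omits.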
Bioinformatics Research and Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
Biomarker Development, Novartis Institutes for Biomedical Research, Basel, Switzerland
Center for Biologics Evaluation and Research, FDA, Silver Spring, MD, USA
Center for Devices and Radiological Health, FDA, Silver Spring, MD, USA
Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
Computational Genomics, Genomics Research Center, AbbVie, North Chicago, IL, USA
Department of Basic Science, Loma Linda University School of Medicine, Loma Linda, CA, USA
Department of Biological Sciences, Virginia Tech, Blacksburg, VA, USA
Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
Genentech, a member of the Roche group, South San Francisco, CA, USA
Illumina Inc., Foster City, CA, USA
Immuneering Corporation, Boston, MA, USA
IMTM, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
National Center for Toxicological Research, FDA, Jefferson, AR, USA
Perron Institute for Neurological and Translational Science, Nedlands, Western Australia, Australia
Sentieon Inc., Mountain View, CA, USA