Most cited article - PubMed ID 34504346
Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing
With the rapid advancement of sequencing technologies, next-generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. Therefore, there is an urgent need for best practices in cancer mutation detection using NGS and for standard reference data sets for systematically measuring accuracy and reproducibility across platforms and methods. Within the SEQC2 consortium context, we established paired tumor-normal reference samples and generated whole-genome sequencing (WGS) and whole-exome sequencing (WES) data using sixteen library protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform/site WGS and WES datasets using well-characterized reference samples will represent a powerful resource for benchmarking NGS technologies and bioinformatics pipelines, and for cancer genomics studies.
- MeSH
- Benchmarking
- Genome, Human*
- Genomics
- Precision Medicine
- Humans
- Cell Line, Tumor
- Neoplasms / genetics
- Whole Genome Sequencing*
- Exome Sequencing*
- Computational Biology
- Check Tag
- Humans
- Publication type
- Journal Article
- Dataset
- Research Support, Non-U.S. Gov't
- Research Support, N.I.H., Extramural
The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, they not only minimize potential biases from technologies, assays and informatics when setting up a sequencing pipeline but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.
- MeSH
- Benchmarking*
- Datasets as Topic
- Humans
- Mutation
- DNA Mutational Analysis / standards
- Cell Line, Tumor
- Breast Neoplasms / genetics
- Reference Standards
- Reproducibility of Results
- Whole Genome Sequencing / standards
- High-Throughput Nucleotide Sequencing / standards
- Germ Cells
- Check Tag
- Humans
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
- Research Support, N.I.H., Extramural