Most cited article - PubMed ID 34753956
Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study
The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.
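As an illustration of how a reference call set of this kind might be used to benchmark a pipeline, the sketch below compares a pipeline's somatic calls with a reference set, restricted to high-confidence regions, and reports precision, recall and F1. It is a minimal sketch under stated assumptions, not the study's own method: the `Variant` record, the region format (1-based, inclusive) and the `benchmark` function are hypothetical names, and exact-match comparison of chromosome, position and alleles is a simplification that ignores variant normalization.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a real library type)."""
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


# Assumed region format: (chrom, start, end), 1-based and inclusive.
Region = Tuple[str, int, int]


def in_regions(variant: Variant, regions: List[Region]) -> bool:
    """Return True if the variant lies inside any high-confidence region."""
    return any(
        chrom == variant.chrom and start <= variant.pos <= end
        for chrom, start, end in regions
    )


def benchmark(
    calls: Iterable[Variant],
    reference: Iterable[Variant],
    regions: List[Region],
) -> Dict[str, float]:
    """Score a call set against a reference call set by exact matching."""
    # Only variants inside the high-confidence regions are evaluated.
    calls_hc = {v for v in calls if in_regions(v, regions)}
    ref_hc = {v for v in reference if in_regions(v, regions)}

    tp = len(calls_hc & ref_hc)   # called and in the reference
    fp = len(calls_hc - ref_hc)   # called but absent from the reference
    fn = len(ref_hc - calls_hc)   # in the reference but not called

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    reference = [Variant("chr1", 100, "A", "T"), Variant("chr1", 200, "G", "C")]
    calls = [Variant("chr1", 100, "A", "T"), Variant("chr1", 300, "G", "A")]
    print(benchmark(calls, reference, [("chr1", 1, 1000)]))
```

In practice the variants would be parsed from VCF files and the regions from a BED file (which uses 0-based, half-open coordinates, so a conversion would be needed); those steps are omitted here.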
- MeSH
- Benchmarking * MeSH
- Datasets as Topic MeSH
- Humans MeSH
- Mutation MeSH
- DNA Mutational Analysis standards MeSH
- Cell Line, Tumor MeSH
- Breast Neoplasms genetics MeSH
- Reference Standards MeSH
- Reproducibility of Results MeSH
- Whole Genome Sequencing standards MeSH
- High-Throughput Nucleotide Sequencing standards MeSH
- Germ Cells MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance value (GIV; G>T/C>A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.
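One simple way to quantify cross-site reproducibility of the kind discussed above is the pairwise overlap between the call sets produced at each center. The sketch below uses the Jaccard index for this purpose. This is an assumption made for illustration, not necessarily the metric used in the study; the function names, site labels and string variant keys are all hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: shared calls divided by all calls seen in either set."""
    union = a | b
    # Two empty call sets are treated as fully concordant by convention.
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(
    call_sets: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], float]:
    """Compute the Jaccard index for every pair of sites."""
    return {
        (x, y): jaccard(call_sets[x], call_sets[y])
        for x, y in combinations(sorted(call_sets), 2)
    }


if __name__ == "__main__":
    # Toy variant keys ("chrom:pos:ref>alt"); not data from the study.
    call_sets = {
        "site_A": {"chr1:100:A>T", "chr1:200:G>C", "chr2:50:C>T"},
        "site_B": {"chr1:200:G>C", "chr2:50:C>T", "chr3:10:G>T"},
        "site_C": {"chr1:100:A>T", "chr1:200:G>C", "chr2:50:C>T"},
    }
    for pair, score in pairwise_concordance(call_sets).items():
        print(pair, round(score, 2))
```

The same function could be applied to call sets grouped by pipeline or library protocol rather than by site, to see which factor contributes most to disagreement.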
- MeSH
- Benchmarking * MeSH
- Cell Line MeSH
- Humans MeSH
- Mutation MeSH
- Cell Line, Tumor MeSH
- Neoplasms genetics pathology MeSH
- Reproducibility of Results MeSH
- Sequence Analysis, DNA standards MeSH
- Whole Genome Sequencing standards MeSH
- Exome Sequencing standards MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH