Most cited article - PubMed ID 34504347
Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing
BACKGROUND: Clinical laboratories routinely use formalin-fixed paraffin-embedded (FFPE) tissue or cell block cytology samples in oncology panel sequencing to identify mutations that can predict patient response to targeted therapy. To understand the technical error due to FFPE processing, a robustly characterized diploid cell line was used to create FFPE samples with four different pre-tissue processing formalin fixation times. A total of 96 FFPE sections were then distributed to different laboratories for targeted sequencing analysis by four oncopanels, and variants resulting from technical error were identified. RESULTS: Tissue sections that fail more frequently show low cellularity, lower than recommended library preparation DNA input, or target sequencing depth. Importantly, sections from block surfaces are more likely to show FFPE-specific errors, akin to "edge effects" seen in histology, while the inner samples display no quality degradation related to fixation time. CONCLUSIONS: To assure reliable results, we recommend avoiding the block surface portion and restricting mutation detection to genomic regions of high confidence.
- Keywords
- Cancer genomics, FFPE, Next-generation sequencing, Oncopanel sequencing, Preanalytics, Precision medicine,
- MeSH
- Tissue Fixation MeSH
- Formaldehyde * MeSH
- Humans MeSH
- Sequence Analysis, DNA MeSH
- High-Throughput Nucleotide Sequencing * MeSH
- Paraffin Embedding MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Names of Substances
- Formaldehyde * MeSH
With the rapid advancement of sequencing technologies, next generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. Therefore, there is an urgent need to develop best practices in cancer mutation detection using NGS and the need for standard reference data sets for systematically measuring accuracy and reproducibility across platforms and methods. Within the SEQC2 consortium context, we established paired tumor-normal reference samples and generated whole-genome (WGS) and whole-exome sequencing (WES) data using sixteen library protocols, seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform/site WGS and WES datasets using well-characterized reference samples will represent a powerful resource for benchmarking NGS technologies, bioinformatics pipelines, and for the cancer genomics studies.
- MeSH
- Benchmarking MeSH
- Genome, Human * MeSH
- Genomics MeSH
- Precision Medicine MeSH
- Humans MeSH
- Cell Line, Tumor MeSH
- Neoplasms genetics MeSH
- Whole Genome Sequencing * MeSH
- Exome Sequencing * MeSH
- Computational Biology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Dataset MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance score (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.
- MeSH
- Benchmarking * MeSH
- Cell Line MeSH
- Humans MeSH
- Mutation MeSH
- Cell Line, Tumor MeSH
- Neoplasms genetics pathology MeSH
- Reproducibility of Results MeSH
- Sequence Analysis, DNA standards MeSH
- Whole Genome Sequencing standards MeSH
- Exome Sequencing standards MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH