Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study
Language English Country United Kingdom, England Medium electronic
Document type dataset, journal article, Research Support, N.I.H., Extramural, Research Support, Non-U.S. Gov't
Grant support
Project No. 2014-2020.4.01.15-0012
EC | European Regional Development Fund (Europski Fond za Regionalni Razvoj)
HHSN261201500003C
NCI NIH HHS - United States
2017-00630, 2019-01976
Vetenskapsrådet (Swedish Research Council)
S10 OD019960
NIH HHS - United States
HHSN261201500003I
NCI NIH HHS - United States
HHSN261201800001C
NCI NIH HHS - United States
PubMed
34753956
PubMed Central
PMC8578599
DOI
10.1038/s41597-021-01077-5
PII: 10.1038/s41597-021-01077-5
- MeSH
- benchmarking MeSH
- Genome, Human * MeSH
- Genomics MeSH
- Precision Medicine MeSH
- Humans MeSH
- Cell Line, Tumor MeSH
- Neoplasms genetics MeSH
- Whole Genome Sequencing * MeSH
- Exome Sequencing * MeSH
- Computational Biology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Dataset MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
With the rapid advancement of sequencing technologies, next-generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. There is therefore an urgent need for best practices in cancer mutation detection using NGS and for standard reference datasets with which to measure accuracy and reproducibility systematically across platforms and methods. Within the SEQC2 consortium, we established paired tumor-normal reference samples and generated whole-genome sequencing (WGS) and whole-exome sequencing (WES) data using sixteen library protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform, cross-site WGS and WES datasets, generated from well-characterized reference samples, represent a powerful resource for benchmarking NGS technologies and bioinformatics pipelines, and for cancer genomics studies.
AbbVie Genomics Research Center, North Chicago, IL, USA
Bioinformatics Research and Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
Biomarker Development, Novartis Institutes for Biomedical Research, Basel, Switzerland
Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA, USA
Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Murdoch, Australia
Core Applications Group, Product Development, Illumina Inc., Foster City, CA, USA
Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
IMTM, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
Member of EATRIS ERIC, European Infrastructure for Translational Medicine, Amsterdam, The Netherlands
National Center for Toxicological Research, U.S. Food and Drug Administration (FDA), Jefferson, AR, USA
Perron Institute for Neurological and Translational Science, Nedlands, Australia
doi: 10.1038/s41587-021-00993-6 PubMed
Associated dataset doi: 10.1038/s41587-021-00994-5 PubMed