Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study

2021 Nov 09; 8(1): 296. [Epub] 20211109

Language: English. Country: United Kingdom, England. Medium: electronic

Document type: dataset; journal article; Research Support, N.I.H., Extramural; grant-supported work

Persistent link   https://www.medvik.cz/link/pmid34753956

Grant support
Project No. 2014-2020.4.01.15-0012 EC | European Regional Development Fund (Europski Fond za Regionalni Razvoj)
HHSN261201500003C NCI NIH HHS - United States
2017-00630, 2019-01976 Vetenskapsrådet (Swedish Research Council)
S10 OD019960 NIH HHS - United States
HHSN261201500003I NCI NIH HHS - United States
HHSN261201800001C NCI NIH HHS - United States

Links

PubMed 34753956
PubMed Central PMC8578599
DOI 10.1038/s41597-021-01077-5
PII: 10.1038/s41597-021-01077-5
Knihovny.cz E-resources

With the rapid advancement of sequencing technologies, next-generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. There is therefore an urgent need both to develop best practices for cancer mutation detection using NGS and to establish standard reference data sets for systematically measuring accuracy and reproducibility across platforms and methods. Within the SEQC2 consortium, we established paired tumor-normal reference samples and generated whole-genome sequencing (WGS) and whole-exome sequencing (WES) data using sixteen library protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform, cross-site WGS and WES datasets, generated from well-characterized reference samples, will be a powerful resource for benchmarking NGS technologies and bioinformatics pipelines and for cancer genomics studies.
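As a rough illustration of what "measuring accuracy and reproducibility" against a reference call set can mean in practice, the following minimal Python sketch scores one somatic call set against a reference set (precision, recall, F1) and compares two replicate call sets with a Jaccard index. It is not the consortium's evaluation pipeline: the function names, the `(chrom, pos, ref, alt)` variant key, and the choice of the Jaccard index as a reproducibility measure are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def accuracy(calls: Iterable[Variant], truth: Iterable[Variant]) -> Dict[str, float]:
    """Score a call set against a reference ("truth") call set."""
    call_set, truth_set = set(calls), set(truth)
    true_positives = len(call_set & truth_set)
    precision = true_positives / len(call_set) if call_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def jaccard(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> float:
    """Overlap between two call sets: |A and B| / |A or B| (1.0 if both are empty)."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    truth = [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
             ("chr2", 300, "C", "T"), ("chr3", 400, "T", "G")]
    run_a = truth[:3] + [("chr5", 999, "A", "G")]   # 3 true calls, 1 false call
    run_b = truth[:2] + [("chr7", 123, "G", "A")]   # 2 true calls, 1 false call

    print(accuracy(run_a, truth))
    print(jaccard(run_a, run_b))
```

Exact-match comparison on a variant key is the simplest possible design; real benchmarking typically also needs variant normalization and restriction to high-confidence regions, which this sketch leaves out.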

AbbVie Genomics Research Center North Chicago IL USA

Advanced Biomedical and Computational Sciences Biomedical Informatics and Data Science Directorate Frederick National Laboratory for Cancer Research Frederick MD USA

Bioinformatics Research and Early Development Roche Sequencing Solutions Inc Belmont CA USA

Biomarker Development Novartis Institutes for Biomedical Research Basel Switzerland

Center for Genomics School of Medicine Loma Linda University Loma Linda CA USA

Centre for Molecular Medicine and Innovative Therapeutics Murdoch University Murdoch Australia

Centro di Riferimento Oncologico di Aviano IRCCS National Cancer Institute Unit of Oncogenetics and Functional Oncogenomics Aviano Italy

Companion Diagnostics Development Oncology Biomarker Development Genentech South San Francisco CA USA

Computational Genomics and Bioinformatics Branch Center for Biomedical Informatics and Information Technology National Cancer Institute National Institutes of Health Bethesda MD USA

Core Applications Group Product Development Illumina Inc Foster City CA USA

Department of Medical Sciences Molecular Precision Medicine and Science for Life Laboratory Uppsala University Uppsala Sweden

Department of Physiology and Biophysics Weill Cornell Medicine New York NY USA

Division of Cancer Epidemiology and Genetics National Cancer Institute National Institutes of Health Bethesda MD USA

Estonian Genome Centre Institute of Genomics University of Tartu Tartu Estonia

IMTM Faculty of Medicine and Dentistry Palacky University Olomouc Czech Republic

Institute for Molecular Medicine Finland University of Helsinki Helsinki Finland

Member of EATRIS ERIC European Infrastructure for Translational Medicine Amsterdam The Netherlands

National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda MD USA

National Center for Toxicological Research U.S. Food and Drug Administration FDA Jefferson AR USA

Perron Institute for Neurological and Translational Science Nedlands Australia

Sequencing Facility Cancer Research Technology Program Frederick National Laboratory for Cancer Research Frederick MD USA

State Key Laboratory of Genetic Engineering School of Life Sciences and Shanghai Cancer Center Fudan University Shanghai China

The Center for Drug Evaluation and Research U.S. Food and Drug Administration FDA Silver Spring MD USA

Associated dataset

doi: 10.1038/s41587-021-00993-6

Associated dataset

doi: 10.1038/s41587-021-00994-5

Show more in PubMed

Morash M, Mitchell H, Beltran H, Elemento O, Pathak J. The Role of Next-Generation Sequencing in Precision Medicine: A Review of Outcomes in Oncology. J Pers Med. 2018;8(3):30. doi: 10.3390/jpm8030030.

Xiao W, et al. Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing. Nat Biotechnol. 2021;39:1141–1150. doi: 10.1038/s41587-021-00994-5.

Fang LT, et al. Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing. Nat Biotechnol. 2021;39:1151–1160. doi: 10.1038/s41587-021-00993-6.

Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.

Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv, https://arxiv.org/abs/1303.3997 (2013).

Picard Tools - By Broad Institute. Available at: http://broadinstitute.github.io/picard/. (Accessed: 23rd December 2017)

Andrews S. FastQC: a quality control tool for high throughput sequence data (2010). Available online at: https://www.bioinformatics.babraham.ac.uk/projects/fastqc

Okonechnikov K, Conesa A, García-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics. 2016;32:292–294.

Ewels P, et al. MultiQC: Aggregate results from bioinformatics analysis across many samples into a single report. Bioinformatics. 2016;32(19):3047–3048. doi: 10.1093/bioinformatics/btw354.

Chen L, Liu P, Evans TC, Ettwiller LM. DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification. Science. 2017;355:752–756. doi: 10.1126/science.aai8690.

Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.

Pedersen B, et al. Indexcov: fast coverage quality control for whole-genome sequencing. GigaScience. 2017;6:1–6. doi: 10.1093/gigascience/gix090.

Bishara A, et al. Read clouds uncover variation in complex regions of the human genome. Genome Research. 2015;25(10):1570–1580. doi: 10.1101/gr.191189.115.

Benjamin D, et al. Calling Somatic SNVs and Indels with Mutect2. Preprint at bioRxiv, 10.1101/861054 (2019).

Larson DE, et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics. 2012;28:311–317. doi: 10.1093/bioinformatics/btr665.

Saunders CT, et al. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics. 2012;28:1811–1817. doi: 10.1093/bioinformatics/bts271.

Narzisi G, et al. Lancet: genome-wide somatic variant calling using localized colored DeBruijn graphs. Commun. Biol. 2018;1:20. doi: 10.1038/s42003-018-0023-9.

Cameron DL, et al. GRIDSS, PURPLE, LINX: Unscrambling the tumor genome via integrated analysis of structural variation and copy number. Preprint at bioRxiv, 10.1101/781013 (2019).

Flensburg C, Sargeant T, Oshlack A, Majewski IJ. SuperFreq: Integrated mutation detection and clonal tracking in cancer. PLOS Computational Biology. 2020;16(2):e1007603. doi: 10.1371/journal.pcbi.1007603.

2021. NCBI Sequence Read Archive. SRP162370

NCBI ftp site: ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG (2021)

Gnirke A, et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol. 2009;27(2):182–189. doi: 10.1038/nbt.1523.

Costello M, et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 2013;41:e67. doi: 10.1093/nar/gks1443.

Do H, Dobrovic A. Sequence Artifacts in DNA from Formalin-Fixed Tissues: Causes and Strategies for Minimization. Clinical Chemistry. 2015;61(1):64–71. doi: 10.1373/clinchem.2014.223040.

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
