sequencing error
DNA conformation may deviate from the classical B-form in ∼13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: It decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and validate experimentally non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations.
- MeSH
- DNA chemistry MeSH
- G-quadruplexes MeSH
- genomics * methods standards MeSH
- kinetics MeSH
- nucleic acid conformation * MeSH
- humans MeSH
- mutation MeSH
- nucleotide motifs MeSH
- DNA replication MeSH
- reproducibility of results MeSH
- sequence analysis, DNA * methods MeSH
- high-throughput nucleotide sequencing * methods standards MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- Research Support, Non-U.S. Gov't MeSH
BACKGROUND: Infection with feline immunodeficiency virus (FIV) causes an immunosuppressive disease whose consequences are less severe if cats are co-infected with an attenuated FIV strain (PLV). We use virus diversity measurements, which reflect replication ability and the virus response to various conditions, to test whether diversity of virulent FIV in lymphoid tissues is altered in the presence of PLV. Our data consisted of the 3' half of the FIV genome from three tissues of animals infected with FIV alone, or with FIV and PLV, sequenced by 454 technology. RESULTS: Since rare variants dominate virus populations, we had to carefully distinguish sequence variation from errors due to experimental protocols and sequencing. We considered an exponential-normal convolution model used for background correction of microarray data, and modified it to formulate an error correction approach for minor allele frequencies derived from high-throughput sequencing. Similar to accounting for over-dispersion in counts, this accounts for error-inflated variability in frequencies - and quite effectively reproduces empirically observed distributions. After obtaining error-corrected minor allele frequencies, we applied ANalysis Of VAriance (ANOVA) based on a linear mixed model and found that conserved sites and transition frequencies in FIV genes differ among tissues of dual and single infected cats. Furthermore, analysis of minor allele frequencies at individual FIV genome sites revealed 242 sites significantly affected by infection status (dual vs. single) or infection status by tissue interaction. All together, our results demonstrated a decrease in FIV diversity in bone marrow in the presence of PLV. Importantly, these effects were weakened or undetectable when error correction was performed with other approaches (thresholding of minor allele frequencies; probabilistic clustering of reads). 
We also queried the data for cytidine deaminase activity on the viral genome, which causes an asymmetric increase in G to A substitutions, but found no evidence for this host defense strategy. CONCLUSIONS: Our error correction approach for minor allele frequencies (more sensitive and computationally efficient than other algorithms) and our statistical treatment of variation (ANOVA) were critical for effective use of high-throughput sequencing data in understanding viral diversity. We found that co-infection with PLV shifts FIV diversity from bone marrow to lymph node and spleen.
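The abstract names thresholding of minor allele frequencies as one of the simpler alternatives to its exponential-normal error correction. The following is a minimal sketch of that thresholding baseline only, not of the authors' model. The function names, the 100-read pileup column, and the cutoff values are illustrative assumptions, and "minor allele" is taken here to mean the second most common base at a site.

```python
from collections import Counter


def minor_allele_frequency(bases):
    """Return the frequency of the second most common base at one site."""
    counts = Counter(bases)
    if len(counts) < 2:
        # Only one base observed: there is no minor allele.
        return 0.0
    ranked = counts.most_common()
    return ranked[1][1] / sum(counts.values())


def threshold_maf(maf, cutoff=0.01):
    """Treat frequencies below the cutoff as sequencing error (set to 0)."""
    return maf if maf >= cutoff else 0.0


# Hypothetical pileup column: 97 reads support A, 3 reads support G.
column = "A" * 97 + "G" * 3
maf = minor_allele_frequency(column)
kept = threshold_maf(maf, cutoff=0.01)      # frequency is retained
dropped = threshold_maf(maf, cutoff=0.05)   # frequency is zeroed out
```

A hard cutoff of this kind discards every variant below the chosen frequency, which is consistent with the abstract's observation that tissue effects were weakened or undetectable under thresholding.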
- MeSH
- algorithms MeSH
- DNA, viral genetics MeSH
- data interpretation, statistical * MeSH
- feline acquired immunodeficiency syndrome genetics immunology virology MeSH
- cats MeSH
- cat diseases genetics immunology transmission virology MeSH
- models, statistical * MeSH
- immunodeficiency virus, feline classification genetics pathogenicity MeSH
- high-throughput nucleotide sequencing methods MeSH
- animals MeSH
- Check Tag
- cats MeSH
- animals MeSH
- Publication type
- journal article MeSH
- Research Support, N.I.H., Extramural MeSH
Viroids are non-coding single-stranded circular RNA molecules that replicate autonomously in infected host plants causing mild to lethal symptoms. Their genomes contain about 250-400 nucleotides, depending on viroid species. Members of the family Pospiviroidae, like the Potato spindle tuber viroid (PSTVd), replicate via an asymmetric rolling-circle mechanism using the host DNA-dependent RNA-Polymerase II in the nucleus, while members of Avsunviroidae are replicated in a symmetric rolling-circle mechanism probably by the nuclear-encoded polymerase in chloroplasts. Viroids induce the production of viroid-specific small RNAs (vsRNA) that can direct (post-)transcriptional gene silencing against host transcripts or genomic sequences. Here, we used deep-sequencing to analyze vsRNAs from plants infected with different PSTVd variants to elucidate the PSTVd quasispecies evolved during infection. We recovered several novel as well as previously known PSTVd variants that were obviously competent in replication and identified common strand-specific mutations. The calculated mean error rate per nucleotide position was less than [Formula: see text], quite comparable to the value of [Formula: see text] reported for a member of Avsunviroidae. The resulting error threshold allows the synthesis of longer-than-unit-length replication intermediates as required by the asymmetric rolling-circle mechanism of members of Pospiviroidae.
Approximately 13% of the human genome at certain motifs have the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might possess increased errors at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types but Z-DNA in Illumina and HiFi, as well as only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.
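To illustrate why read depth and error rate matter for false positives, here is a minimal sketch based on a plain binomial model. It is not the authors' probabilistic approach, whose details the abstract does not give. It assumes that errors at a site are independent across reads, that a variant is called when at least `k` reads support it, and that all sites share one depth and one error rate. The function and parameter names are hypothetical.

```python
from math import comb


def prob_at_least(k, depth, error_rate):
    """P(X >= k) for X ~ Binomial(depth, error_rate)."""
    below = sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - below


def expected_false_positives(n_sites, k, depth, error_rate):
    """Expected number of sites where errors alone reach the calling threshold."""
    return n_sites * prob_at_least(k, depth, error_rate)


# Illustrative values only: 1e6 motif sites, variant called at >= 2 supporting reads.
low_depth = expected_false_positives(1_000_000, k=2, depth=5, error_rate=0.001)
elevated = expected_false_positives(1_000_000, k=2, depth=5, error_rate=0.005)
```

Under these assumptions, raising the per-base error rate at a motif raises the expected count of spurious calls, which is the qualitative concern the abstract raises for low-read-depth studies.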
BACKGROUND: Clinical laboratories routinely use formalin-fixed paraffin-embedded (FFPE) tissue or cell block cytology samples in oncology panel sequencing to identify mutations that can predict patient response to targeted therapy. To understand the technical error due to FFPE processing, a robustly characterized diploid cell line was used to create FFPE samples with four different pre-tissue processing formalin fixation times. A total of 96 FFPE sections were then distributed to different laboratories for targeted sequencing analysis by four oncopanels, and variants resulting from technical error were identified. RESULTS: Tissue sections that fail more frequently show low cellularity, lower than recommended library preparation DNA input, or target sequencing depth. Importantly, sections from block surfaces are more likely to show FFPE-specific errors, akin to "edge effects" seen in histology, while the inner samples display no quality degradation related to fixation time. CONCLUSIONS: To assure reliable results, we recommend avoiding the block surface portion and restricting mutation detection to genomic regions of high confidence.
Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance score (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.
- MeSH
- benchmarking * MeSH
- cell line MeSH
- humans MeSH
- mutation MeSH
- cell line, tumor MeSH
- neoplasms genetics pathology MeSH
- reproducibility of results MeSH
- sequence analysis, DNA standards MeSH
- whole genome sequencing standards MeSH
- exome sequencing standards MeSH
- high-throughput nucleotide sequencing methods MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
BACKGROUND: Current next-generation sequencing technologies offer high-throughput reads at low cost, but still suffer from various sequencing errors. Although pyrosequencing and ion semiconductor sequencing both deliver long, high-quality reads, problems can occur in homopolymer-containing regions, because repeated identical bases are incorporated during the same synthesis cycle, which leads to uncertainty in base calling. The aim of this study was to evaluate the analytical performance of a pyrosequencing-based next-generation sequencing system in detecting homopolymer sequences using homopolymer-preintegrated plasmid constructs and human DNA samples originating from patients with cystic fibrosis. RESULTS: In the plasmid system, average correct genotyping was 95.8% in 4-mers, 87.4% in 5-mers and 72.1% in 6-mers. Despite the low genotyping accuracy observed in 5- and 6-mers, it was possible to generate amplicons with more than a 90% adequate detection rate in every homopolymer tract. When homopolymers in the CFTR gene were sequenced, average accuracy was 89.3%, but varied widely (52.2 - 99.1%). In all but one case, an optimal amplicon-sequencing primer combination could be identified. In that single case (7A tract in exon 14 (c.2046_2052)), none of the tested primer sets produced the required analytical performance. CONCLUSIONS: Our results show that pyrosequencing is most reliable for 4-mers and that accuracy deteriorates as homopolymer length increases. With careful primer selection, the NGS system was able to correctly genotype all but one of the homopolymers in the CFTR gene. In conclusion, we configured a plasmid test system that can be used to assess genotyping accuracy of NGS devices and developed an accurate NGS assay for the molecular diagnosis of CF using self-designed primers for amplification and sequencing.
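Because accuracy in this study depends on homopolymer length, it can help to list the homopolymer tracts in an amplicon before choosing primers. The sketch below is one simple way to do that and is not part of the published assay. The function name, the default minimum length of 4, and the example sequence are assumptions made for illustration.

```python
from itertools import groupby


def homopolymer_tracts(seq, min_len=4):
    """Return (start, base, length) for each run of identical bases >= min_len.

    Start positions are 0-based offsets into seq.
    """
    tracts = []
    pos = 0
    for base, run in groupby(seq):
        length = sum(1 for _ in run)
        if length >= min_len:
            tracts.append((pos, base, length))
        pos += length
    return tracts


# Hypothetical sequence containing a 5-mer T tract and a 7-mer A tract.
example = "ACG" + "T" * 5 + "A" * 7 + "C"
tracts = homopolymer_tracts(example)
```

Tracts of 5 or more bases flagged this way are the ones for which the abstract reports reduced genotyping accuracy.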
- MeSH
- cystic fibrosis genetics MeSH
- humans MeSH
- plasmids MeSH
- cystic fibrosis transmembrane conductance regulator genetics MeSH
- sequence analysis, DNA methods MeSH
- tandem repeat sequences * MeSH
- high-throughput nucleotide sequencing methods MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- Research Support, Non-U.S. Gov't MeSH
- validation study MeSH
BACKGROUND: Treponema pallidum ssp. pallidum (TPA), the causative agent of syphilis, is a highly clonal bacterium showing minimal genetic variability in the genome sequence of individual strains. Nevertheless, genetically characterized syphilis strains can be clearly divided into two groups, Nichols-like strains and SS14-like strains. TPA Nichols and SS14 strains were completely sequenced in 1998 and 2008, respectively. Since publication of their complete genome sequences, a number of sequencing errors in each genome have been reported. Therefore, we have resequenced TPA Nichols and SS14 strains using next-generation sequencing techniques. METHODOLOGY/PRINCIPAL FINDINGS: The genomes of TPA strains Nichols and SS14 were resequenced using the 454 and Illumina sequencing methods that have a combined average coverage higher than 90x. In the TPA strain Nichols genome, 134 errors were identified (25 substitutions and 109 indels), and 102 of them affected protein sequences. In the TPA SS14 genome, a total of 191 errors were identified (85 substitutions and 106 indels) and 136 of them affected protein sequences. A set of new intrastrain heterogenic regions in the TPA SS14 genome were identified including the tprD gene, where both tprD and tprD2 alleles were found. The resequenced genomes of both TPA Nichols and SS14 strains clustered more closely with related strains (i.e. strains belonging to same syphilis treponeme subcluster). At the same time, groups of Nichols-like and SS14-like strains were found to be more distantly related. CONCLUSION/SIGNIFICANCE: We identified errors in 11.5% of all annotated genes and, after correction, we found a significant impact on the predicted proteomes of both Nichols and SS14 strains. Corrections of these errors resulted in protein elongations, truncations, fusions and indels in more than 11% of all annotated proteins. Moreover, it became more evident that syphilis is caused by treponemes belonging to two separate genetic subclusters.
- MeSH
- phylogeny MeSH
- genetic variation MeSH
- genome genetics MeSH
- molecular sequence data MeSH
- amino acid sequence MeSH
- base sequence MeSH
- sequence analysis, DNA methods MeSH
- sequence alignment MeSH
- syphilis genetics parasitology MeSH
- Treponema pallidum genetics MeSH
- Publication type
- journal article MeSH
- Research Support, Non-U.S. Gov't MeSH
New technologies such as next-generation sequencing (NGS) make the diagnosis of genetically heterogeneous diseases such as Lynch syndrome (LS) more accessible. LS is one of the most common hereditary forms of colorectal cancer (CRC). This autosomal dominant disorder is caused by deleterious germline mutations in one of the mismatch repair (MMR) genes (MLH1, MSH2, MSH6 or PMS2) or by a deletion in the EPCAM gene. These mutations eventually result in microsatellite instability (MSI), which can be easily tested in tumor tissue. According to current recommendations, all patients with CRC who are suspected of having LS should be offered MSI testing. When MSI is positive, these patients should be referred for genetic counseling. Here we report a pilot study on the application of NGS to LS diagnosis in patients considered to have sporadic colorectal cancer. The inclusion criteria for NGS testing were MSI positivity and negativity for BRAF V600E and MLH1 methylation. We used 5-gene amplicon-based massively parallel sequencing on the MiSeq platform. In one patient, we identified a new pathogenic mutation in exon 4 of the MSH6 gene that had not previously been described in the ClinVar, Human Gene Mutation Database, Ensembl or InSight databases. This mutation was confirmed by the Sanger method. We have shown that implementing new criteria for screening colorectal cancer patients is important in clinical practice and that NGS gene panel testing is suitable for routine laboratory settings.
- MeSH
- colorectal neoplasms, hereditary nonpolyposis diagnosis genetics MeSH
- humans MeSH
- microsatellite instability MeSH
- DNA mismatch repair MeSH
- pilot projects MeSH
- high-throughput nucleotide sequencing * MeSH
- germ-line mutation MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- Geographic names
- Slovakia MeSH
Hereditary cancers account for a small but clinically significant proportion of oncological diseases; in the Czech Republic they affect several thousand people each year. In these patients, identifying the causal mutation in cancer predisposition genes has fundamental prognostic and, in some cases, predictive significance. It is also a prerequisite for targeted preventive care of asymptomatic mutation carriers in families at increased risk of cancer. More than 150 cancer predisposition genes have been characterized to date. Mutations in most of them are rare, show marked population specificity, and are often difficult to interpret clinically. The diagnosis of rare variants is technically simplified by next-generation sequencing approaches, which allow large sets of genes to be examined. To rationalize the diagnosis of hereditary cancer syndromes in the Czech Republic, we designed the sequencing panel “CZECANCA”, which targets 219 genes associated with hereditary cancers. The panel contains more than 50 clinically significant high- and moderate-risk genes; the remaining genes are poorly studied and candidate predisposition genes whose germline mutations have an unclear clinical interpretation. Together with the panel design, the sequencing procedure itself and the bioinformatic processing of sequencing data were optimized to build a unified database of genotypes of the analyzed samples. The aim of the project is to offer the sequencing panel, including the optimized next-generation sequencing procedure, to diagnostic laboratories in the Czech Republic and to ensure that genotypes and clinical data on examined patients are shared in a common database, in order to improve the clinical interpretation of rare mutations in high-risk individuals.
Individuals with hereditary cancer syndromes form a minor but clinically important subgroup of oncology patients, comprising several thousand cases in the Czech Republic annually. In these patients, the identification of pathogenic mutations in cancer susceptibility genes has an important predictive and, in some cases, prognostic value. It also enables rational preventive strategies in asymptomatic carriers from affected families. More than 150 cancer susceptibility genes have been described so far; however, mutations in most of them are very rare, occurring with substantial population variability, and hence their clinical interpretation is very complicated. Diagnostics of mutations in cancer susceptibility genes have benefited from the broad availability of next-generation sequencing analyses using targeted gene panels. In order to rationalize the diagnostics of hereditary cancer syndromes in the Czech Republic, we have prepared the sequence capture panel “CZECANCA”, targeting 219 cancer susceptibility genes. Besides more than 50 clinically important high- and moderate-penetrance susceptibility genes, the panel also targets less common candidate genes with uncertain clinical relevance. Alongside the panel design, we have optimized the analytical and bioinformatics pipeline, which will facilitate establishing a collective nationwide database of genotypes and clinical data from the analyzed individuals. The key objective of this project is to provide diagnostic laboratories in the Czech Republic with a reliable procedure and collective database improving the clinical utility of next-generation sequencing analyses in high-risk patients, which would help improve the interpretation of rare or population-specific variants in cancer susceptibility genes. 
Key words: genetic predisposition testing – hereditary cancer syndromes – high-throughput nucleotide sequencing – genetic information databases – panel sequencing – sequence capture – next-generation sequencing (NGS)
This work was supported by Czech Ministry of Health grants No. NT14054, NV15-28830A, NV15-27695A and The League Against Cancer Prague.
The authors declare they have no potential conflicts of interest concerning drugs, products, or services used in the study.
The Editorial Board declares that the manuscript met the ICMJE recommendation for biomedical papers.
Submitted: 2. 10. 2015 Accepted: 13. 10. 2015
- Keywords
- next-generation sequencing (NGS), targeted sequencing, panel sequencing
- MeSH
- databases, genetic * utilization MeSH
- neoplastic syndromes, hereditary * diagnosis genetics MeSH
- genetic predisposition to disease MeSH
- genetic testing methods MeSH
- humans MeSH
- sequence analysis, DNA MeSH
- information dissemination MeSH
- computational biology MeSH
- high-throughput nucleotide sequencing * MeSH
- research design MeSH
- germ-line mutation MeSH
- Check Tag
- humans MeSH
- Publication type
- Research Support, Non-U.S. Gov't MeSH