Query: sequencing error
Viroids are non-coding, single-stranded, circular RNA molecules that replicate autonomously in infected host plants, causing mild to lethal symptoms. Their genomes contain about 250-400 nucleotides, depending on the viroid species. Members of the family Pospiviroidae, such as Potato spindle tuber viroid (PSTVd), replicate via an asymmetric rolling-circle mechanism using the host DNA-dependent RNA polymerase II in the nucleus, while members of Avsunviroidae replicate via a symmetric rolling-circle mechanism, probably using the nuclear-encoded polymerase in chloroplasts. Viroids induce the production of viroid-specific small RNAs (vsRNA) that can direct (post-)transcriptional gene silencing against host transcripts or genomic sequences. Here, we used deep sequencing to analyze vsRNAs from plants infected with different PSTVd variants to elucidate the PSTVd quasispecies that evolved during infection. We recovered several novel as well as previously known PSTVd variants that were evidently replication-competent and identified common strand-specific mutations. The calculated mean error rate per nucleotide position was less than [Formula: see text], quite comparable to the value of [Formula: see text] reported for a member of Avsunviroidae. The resulting error threshold allows the synthesis of longer-than-unit-length replication intermediates as required by the asymmetric rolling-circle mechanism of members of Pospiviroidae.
- Keywords
- Error rate, Pospiviroid, error threshold, quasispecies, sequence network mapping, sequencing error, viroid-specific small RNA
- MeSH
- Genome, Viral * MeSH
- Mutation MeSH
- Reassortant Viruses genetics MeSH
- Virus Replication MeSH
- RNA, Viral genetics MeSH
- Viroids genetics MeSH
- High-Throughput Nucleotide Sequencing * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- RNA, Viral MeSH
DNA conformation may deviate from the classical B-form in ∼13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: It decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and validate experimentally non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations.
- MeSH
- DNA chemistry MeSH
- G-Quadruplexes MeSH
- Genomics * methods standards MeSH
- Kinetics MeSH
- Nucleic Acid Conformation * MeSH
- Humans MeSH
- Mutation MeSH
- Nucleotide Motifs MeSH
- DNA Replication MeSH
- Reproducibility of Results MeSH
- Sequence Analysis, DNA * methods MeSH
- High-Throughput Nucleotide Sequencing * methods standards MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- DNA MeSH
Deep profiling of antibody and T cell-receptor repertoires by means of high-throughput sequencing has become an attractive approach for adaptive immunity studies, but its power is substantially compromised by the accumulation of PCR and sequencing errors. Here we report MIGEC (molecular identifier groups-based error correction), a strategy for high-throughput sequencing data analysis. MIGEC allows for nearly absolute error correction while fully preserving the natural diversity of complex immune repertoires.
- MeSH
- DNA Fingerprinting methods standards MeSH
- Polymerase Chain Reaction standards MeSH
- Receptors, Antigen, T-Cell genetics MeSH
- High-Throughput Nucleotide Sequencing standards MeSH
- Research Design * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Receptors, Antigen, T-Cell MeSH
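The idea named in the MIGEC abstract above, correcting errors within groups of reads that share a molecular identifier, can be illustrated with a minimal sketch. This is not the MIGEC implementation, which the abstract does not describe. It assumes each read arrives already paired with its identifier, that reads within a group have equal length, and that a per-position majority vote is an acceptable consensus rule. The function names and the `min_group_size` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def consensus(reads):
    """Per-position majority vote over reads assumed to have equal length."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def correct_by_identifier(tagged_reads, min_group_size=3):
    """Group (identifier, sequence) pairs and emit one consensus per group.

    Groups smaller than min_group_size (a hypothetical cutoff) are dropped,
    because a vote over very few reads cannot outvote an error.
    """
    groups = defaultdict(list)
    for identifier, sequence in tagged_reads:
        groups[identifier].append(sequence)
    return {
        identifier: consensus(sequences)
        for identifier, sequences in groups.items()
        if len(sequences) >= min_group_size
    }


# Toy input: three reads from one molecule (one carrying an error at position 3)
# and a single read from another molecule.
example_reads = [
    ("AAC", "ACGT"),
    ("AAC", "ACGT"),
    ("AAC", "ACTT"),
    ("GGT", "TTTT"),
]
corrected = correct_by_identifier(example_reads, min_group_size=3)
print(corrected)
```

In this toy input the erroneous base is outvoted by the two matching reads, and the single-read group is discarded rather than trusted.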
BACKGROUND: Infection with feline immunodeficiency virus (FIV) causes an immunosuppressive disease whose consequences are less severe if cats are co-infected with an attenuated FIV strain (PLV). We use virus diversity measurements, which reflect replication ability and the virus response to various conditions, to test whether diversity of virulent FIV in lymphoid tissues is altered in the presence of PLV. Our data consisted of the 3' half of the FIV genome from three tissues of animals infected with FIV alone, or with FIV and PLV, sequenced by 454 technology. RESULTS: Since rare variants dominate virus populations, we had to carefully distinguish sequence variation from errors due to experimental protocols and sequencing. We considered an exponential-normal convolution model used for background correction of microarray data, and modified it to formulate an error correction approach for minor allele frequencies derived from high-throughput sequencing. Similar to accounting for over-dispersion in counts, this accounts for error-inflated variability in frequencies, and quite effectively reproduces empirically observed distributions. After obtaining error-corrected minor allele frequencies, we applied ANalysis Of VAriance (ANOVA) based on a linear mixed model and found that conserved sites and transition frequencies in FIV genes differ among tissues of dually and singly infected cats. Furthermore, analysis of minor allele frequencies at individual FIV genome sites revealed 242 sites significantly affected by infection status (dual vs. single) or infection status by tissue interaction. Altogether, our results demonstrated a decrease in FIV diversity in bone marrow in the presence of PLV. Importantly, these effects were weakened or undetectable when error correction was performed with other approaches (thresholding of minor allele frequencies; probabilistic clustering of reads). We also queried the data for cytidine deaminase activity on the viral genome, which causes an asymmetric increase in G to A substitutions, but found no evidence for this host defense strategy. CONCLUSIONS: Our error correction approach for minor allele frequencies (more sensitive and computationally efficient than other algorithms) and our statistical treatment of variation (ANOVA) were critical for effective use of high-throughput sequencing data in understanding viral diversity. We found that co-infection with PLV shifts FIV diversity from bone marrow to lymph node and spleen.
- MeSH
- Algorithms MeSH
- DNA, Viral genetics MeSH
- Data Interpretation, Statistical * MeSH
- Feline Acquired Immunodeficiency Syndrome genetics immunology virology MeSH
- Cats MeSH
- Cat Diseases genetics immunology transmission virology MeSH
- Models, Statistical * MeSH
- Immunodeficiency Virus, Feline classification genetics pathogenicity MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Animals MeSH
- Check Tag
- Cats MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, N.I.H., Extramural MeSH
- Substance names
- DNA, Viral MeSH
The insufficient standardization of diagnostic next-generation sequencing (NGS) still limits its implementation in clinical practice, with the correct detection of mutations at low variant allele frequencies (VAF) facing particular challenges. We address here the standardization of sequencing coverage depth in order to minimize the probability of false positive and false negative results, the latter being underestimated in clinical NGS. There is currently no consensus on the minimum coverage depth, and so each laboratory has to set its own parameters. To assist laboratories with the determination of the minimum coverage parameters, we provide here a user-friendly coverage calculator. Using the sequencing error only, we recommend a minimum depth of coverage of 1,650 together with a threshold of at least 30 mutated reads for a targeted NGS mutation analysis of ≥3% VAF, based on the binomial probability distribution. Moreover, our calculator also allows adding assay-specific errors occurring during DNA processing and library preparation, thus calculating with the overall error of a specific NGS assay. The estimation of correct coverage depth is recommended as a starting point when assessing the thresholds of an NGS assay. Our study also points to the need for guidance regarding the minimum technical requirements, which based on our experience should include the limit of detection (LOD), overall NGS assay error, input, source and quality of DNA, coverage depth, number of variant-supporting reads, and total number of target reads covering the variant region. Further studies are needed to define the minimum technical requirements and their reporting in diagnostic NGS.
- Keywords
- TP53 gene, coverage depth calculator, next-generation sequencing, sequencing error, small subclones, variant allele frequency (VAF)
- Publication type
- Journal Article MeSH
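The binomial reasoning behind the coverage recommendation in the abstract above (depth 1,650, at least 30 mutated reads, ≥3% VAF) can be sketched as follows. This is an illustration under stated assumptions, not the authors' calculator. The depth, read threshold, and VAF come from the abstract, whereas the per-base sequencing error rate of 1% is an assumption, since the abstract does not state the value used. All function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)


def prob_at_least(k, n, p):
    """P(X >= k), summing the upper tail directly."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def false_positive_rate(depth, min_reads, error_rate):
    """Chance that sequencing errors alone yield at least min_reads variant reads."""
    return prob_at_least(min_reads, depth, error_rate)


def false_negative_rate(depth, min_reads, vaf):
    """Chance that a true variant at frequency vaf yields fewer than min_reads reads."""
    return sum(binom_pmf(i, depth, vaf) for i in range(min_reads))


DEPTH, MIN_READS, VAF = 1650, 30, 0.03  # values quoted in the abstract
ERROR_RATE = 0.01  # ASSUMPTION: illustrative value, not given in the abstract

print("false positive probability:", false_positive_rate(DEPTH, MIN_READS, ERROR_RATE))
print("false negative probability:", false_negative_rate(DEPTH, MIN_READS, VAF))
```

Varying `DEPTH` and `MIN_READS` shows the trade-off the abstract describes: raising the read threshold lowers the false positive probability but raises the false negative probability unless the depth is increased as well.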
BACKGROUND: Clinical laboratories routinely use formalin-fixed paraffin-embedded (FFPE) tissue or cell block cytology samples in oncology panel sequencing to identify mutations that can predict patient response to targeted therapy. To understand the technical error due to FFPE processing, a robustly characterized diploid cell line was used to create FFPE samples with four different pre-tissue processing formalin fixation times. A total of 96 FFPE sections were then distributed to different laboratories for targeted sequencing analysis by four oncopanels, and variants resulting from technical error were identified. RESULTS: Tissue sections that fail more frequently show low cellularity, lower than recommended library preparation DNA input, or target sequencing depth. Importantly, sections from block surfaces are more likely to show FFPE-specific errors, akin to "edge effects" seen in histology, while the inner samples display no quality degradation related to fixation time. CONCLUSIONS: To assure reliable results, we recommend avoiding the block surface portion and restricting mutation detection to genomic regions of high confidence.
- Keywords
- Cancer genomics, FFPE, Next-generation sequencing, Oncopanel sequencing, Preanalytics, Precision medicine
- MeSH
- Tissue Fixation MeSH
- Formaldehyde * MeSH
- Humans MeSH
- Sequence Analysis, DNA MeSH
- High-Throughput Nucleotide Sequencing * MeSH
- Paraffin Embedding MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Substance names
- Formaldehyde * MeSH
Perkinsela is an enigmatic early-branching kinetoplastid protist that lives as an obligate endosymbiont inside Paramoeba (Amoebozoa). We have sequenced the highly reduced mitochondrial genome of Perkinsela, which possesses only six protein-coding genes (cox1, cox2, cox3, cob, atp6, and rps12), despite the fact that the organelle itself contains more DNA than is present in either the host or endosymbiont nuclear genomes. An in silico analysis of two Perkinsela strains showed that mitochondrial RNA editing and processing machineries typical of kinetoplastid flagellates are generally conserved, and all mitochondrial transcripts undergo U-insertion/deletion editing. Canonical kinetoplastid mitochondrial ribosomes are also present. We have developed software tools for accurate and exhaustive mapping of transcriptome sequencing (RNA-seq) reads with extensive U-insertions/deletions, which allows detailed investigation of RNA editing via deep sequencing. With these methods, we show that up to 50% of reads for a given edited region contain errors of the editing system or, less likely, correspond to alternatively edited transcripts. IMPORTANCE: Uridine insertion/deletion-type RNA editing, which occurs in the mitochondrion of kinetoplastid protists, has been well-studied in the model parasite genera Trypanosoma, Leishmania, and Crithidia. Perkinsela provides a unique opportunity to broaden our knowledge of RNA editing machinery from an evolutionary perspective, as it represents the earliest kinetoplastid branch and is an obligatory endosymbiont with extensive reductive trends. Interestingly, up to 50% of mitochondrial transcripts in Perkinsela contain errors. Our study was complemented by use of newly developed software designed for accurate mapping of extensively edited RNA-seq reads obtained by deep sequencing.
- MeSH
- Amoebozoa parasitology MeSH
- Gene Deletion * MeSH
- RNA Editing * MeSH
- Kinetoplastida genetics growth & development MeSH
- DNA, Mitochondrial chemistry genetics MeSH
- Mitochondria genetics MeSH
- Sequence Analysis, DNA MeSH
- Computational Biology MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- DNA, Mitochondrial MeSH
The development of new technologies such as next-generation sequencing (NGS) makes the diagnosis of genetically heterogeneous diseases such as Lynch syndrome (LS) more accessible. LS is one of the most common hereditary forms of colorectal cancer. This autosomal dominant disorder is caused by deleterious germline mutations in one of the mismatch repair (MMR) genes (MLH1, MSH2, MSH6 or PMS2) or by a deletion in the EPCAM gene. These mutations eventually result in microsatellite instability (MSI), which can be easily tested in tumor tissue. According to current recommendations, all patients with colorectal cancer (CRC) who are suspected of having LS should be offered MSI testing. When the MSI test is positive, these patients should be referred for genetic counseling. Here we report a pilot study on the application of NGS to LS diagnosis in patients considered to have sporadic colorectal cancer. The inclusion criteria for NGS testing were MSI positivity, BRAF V600E negativity, and MLH1 methylation negativity. We used five-gene amplicon-based massively parallel sequencing on the MiSeq platform. In one patient, we identified a new pathogenic mutation in exon 4 of the MSH6 gene that had not previously been described in the ClinVar, Human Gene Mutation Database, Ensembl, or InSight databases. This mutation was confirmed by Sanger sequencing. We have shown that implementing new screening criteria for colorectal cancer patients is important in clinical practice and that NGS gene panel testing is suitable for routine laboratory settings.
- Keywords
- Lynch syndrome, MMR genes, microsatellite instability, next generation sequencing, sporadic colorectal cancer
- MeSH
- Colorectal Neoplasms, Hereditary Nonpolyposis diagnosis genetics MeSH
- Humans MeSH
- Microsatellite Instability MeSH
- DNA Mismatch Repair MeSH
- Pilot Projects MeSH
- High-Throughput Nucleotide Sequencing * MeSH
- Germ-Line Mutation MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Geographic names
- Slovakia MeSH
Approximately 13% of the human genome, at certain motifs, has the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might possess increased errors at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types but Z-DNA in Illumina and HiFi, as well as only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.
- MeSH
- DNA genetics MeSH
- Humans MeSH
- Nanopores * MeSH
- Nucleotide Motifs MeSH
- Sequence Analysis, DNA MeSH
- High-Throughput Nucleotide Sequencing MeSH
- DNA, Z-Form * MeSH
- Base Composition MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, N.I.H., Extramural MeSH
- Substance names
- DNA MeSH
- DNA, Z-Form * MeSH
BACKGROUND: Treponema pallidum ssp. pallidum (TPA), the causative agent of syphilis, is a highly clonal bacterium showing minimal genetic variability in the genome sequence of individual strains. Nevertheless, genetically characterized syphilis strains can be clearly divided into two groups, Nichols-like strains and SS14-like strains. TPA Nichols and SS14 strains were completely sequenced in 1998 and 2008, respectively. Since publication of their complete genome sequences, a number of sequencing errors in each genome have been reported. Therefore, we have resequenced TPA Nichols and SS14 strains using next-generation sequencing techniques. METHODOLOGY/PRINCIPAL FINDINGS: The genomes of TPA strains Nichols and SS14 were resequenced using the 454 and Illumina sequencing methods, with a combined average coverage higher than 90x. In the TPA strain Nichols genome, 134 errors were identified (25 substitutions and 109 indels), and 102 of them affected protein sequences. In the TPA SS14 genome, a total of 191 errors were identified (85 substitutions and 106 indels), and 136 of them affected protein sequences. A set of new intrastrain heterogenic regions in the TPA SS14 genome was identified, including the tprD gene, where both tprD and tprD2 alleles were found. The resequenced genomes of both TPA Nichols and SS14 strains clustered more closely with related strains (i.e., strains belonging to the same syphilis treponeme subcluster). At the same time, groups of Nichols-like and SS14-like strains were found to be more distantly related. CONCLUSION/SIGNIFICANCE: We identified errors in 11.5% of all annotated genes and, after correction, we found a significant impact on the predicted proteomes of both Nichols and SS14 strains. Corrections of these errors resulted in protein elongations, truncations, fusions and indels in more than 11% of all annotated proteins. Moreover, it became more evident that syphilis is caused by treponemes belonging to two separate genetic subclusters.
- MeSH
- Phylogeny MeSH
- Genetic Variation MeSH
- Genome genetics MeSH
- Molecular Sequence Data MeSH
- Amino Acid Sequence MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA methods MeSH
- Sequence Alignment MeSH
- Syphilis genetics parasitology MeSH
- Treponema pallidum genetics MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH