parallelized genetic algorithms
BACKGROUND: Identification of coordinately regulated genes according to the level of their expression during the time course of a process allows for discovering functional relationships among genes involved in the process. RESULTS: We present a single-class classification method for the identification of genes of similar function from a gene expression time series. It is based on a parallel genetic algorithm, a supervised computer learning method that exploits prior knowledge of gene function to identify unknown genes of similar function from expression data. The algorithm was tested with a set of randomly generated patterns; the results were compared with seven other classification algorithms, including support vector machines. The algorithm avoids several problems associated with unsupervised clustering methods, and it shows better performance than the other algorithms. The algorithm was applied to the identification of secondary metabolite gene clusters of the antibiotic-producing eubacterium Streptomyces coelicolor. The algorithm also identified pathways associated with transport of the secondary metabolites out of the cell. We used the method for the prediction of the functional role of particular ORFs based on the expression data. CONCLUSION: Through analysis of a time series of gene expression, the algorithm identifies pathways which are directly or indirectly associated with genes of interest, and which are active during the time course of the experiment.
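The abstract does not give the encoding, fitness function, or parallelization scheme, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes that a candidate solution is a prototype expression profile, that fitness is the mean Pearson correlation with the profiles of genes of known function, and that fitness evaluations are farmed out to a thread pool. All function names and parameters are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def fitness(prototype, known_profiles):
    """Assumed fitness: mean correlation between a prototype and the known genes."""
    return sum(pearson(prototype, p) for p in known_profiles) / len(known_profiles)


def evolve(known_profiles, pop_size=30, generations=40, seed=0, workers=4):
    """Evolve a prototype profile; returns (best_prototype, best_fitness_per_generation)."""
    rng = random.Random(seed)
    length = len(known_profiles[0])
    population = [[rng.uniform(-1, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Fitness evaluations are independent of each other, so they can run in parallel.
            scores = list(pool.map(lambda ind: fitness(ind, known_profiles), population))
            ranked = [ind for _, ind in
                      sorted(zip(scores, population), key=lambda t: t[0], reverse=True)]
            history.append(max(scores))
            elite = ranked[: pop_size // 5]
            children = [list(ind) for ind in elite]   # elitism: the best survive unchanged
            while len(children) < pop_size:
                a, b = rng.sample(elite, 2)
                cut = rng.randrange(1, length)
                child = a[:cut] + b[cut:]             # one-point crossover
                child[rng.randrange(length)] += rng.gauss(0, 0.2)  # point mutation
                children.append(child)
            population = children
    best = max(population, key=lambda ind: fitness(ind, known_profiles))
    return best, history


# Hypothetical expression profiles of three genes of known function.
known = [[0, 1, 2, 3, 2, 1], [0, 2, 4, 6, 4, 2], [1, 2, 3, 4, 3, 2]]
best, history = evolve(known)
```

An unknown gene could then be scored by its correlation with `best`. In CPython, threads do not speed up pure-Python fitness functions; a process pool would be the usual choice for real workloads, and the thread pool here only illustrates where the parallelism sits.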
BACKGROUND: Genomic selection (GS) in forestry can substantially reduce the length of the breeding cycle and increase gain per unit time through early selection and greater selection intensity, particularly for traits of low heritability and late expression. Affordable next-generation sequencing technologies made it possible to genotype large numbers of trees at a reasonable cost. RESULTS: Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinated families planted over three sites in British Columbia, Canada. Four imputation algorithms were compared (mean value (MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearest neighbor (kNN-Fam)). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS prediction models were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and the Generalized Ridge Regression (GRR) to test different assumptions about trait architecture. Finally, using PCA, multi-trait GS prediction models were developed. The EM and kNN-Fam imputation methods were superior for 30 and 60% missing data, respectively. The RR-BLUP GS prediction model produced better accuracies than the GRR, indicating that the genetic architecture for these traits is complex. GS prediction accuracies for multi-site models were high and better than those of single-site models, while cross-site predictability produced the lowest accuracies, reflecting type-b genetic correlations, and was deemed unreliable. The incorporation of genomic information in quantitative genetics analyses produced more realistic heritability estimates, as the half-sib pedigree tended to inflate the additive genetic variance and subsequently both heritability and gain estimates. 
Principal component scores as representatives of multi-trait GS prediction models produced surprising results where negatively correlated traits could be concurrently selected for using PCA2 and PCA3. CONCLUSIONS: The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluation methods, was proven to be effective. Prediction accuracies obtained for all traits greatly support the integration of GS in tree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that the ability of single-site GS models to predict other sites is unreliable, supporting the use of a multi-site approach. Principal component scores provided an opportunity for the concurrent selection of traits with different phenotypic optima.
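Of the four imputation methods named in the abstract, mean-value imputation (MI) is the simplest to illustrate. The sketch below is not the study's code; it assumes genotypes are coded as allele counts (0, 1, 2) with `None` for a missing call, and that a marker with no observed calls falls back to 0.0. Both conventions are assumptions made for the example.

```python
def mean_impute(genotypes):
    """Replace missing calls (None) in each marker column with that column's mean.

    genotypes: list of rows (one per tree), each a list of allele counts or None.
    """
    n_markers = len(genotypes[0])
    means = []
    for j in range(n_markers):
        observed = [row[j] for row in genotypes if row[j] is not None]
        # Assumed fallback: a marker with no observed calls is imputed as 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if row[j] is None else row[j] for j in range(n_markers)]
            for row in genotypes]


# Three hypothetical trees genotyped at two markers.
imputed = mean_impute([[0, None], [2, 1], [None, 2]])
# imputed == [[0, 1.5], [2, 1], [1.0, 2]]
```

Mean imputation ignores family structure, which is the information a family-based nearest-neighbor method such as kNN-Fam is designed to use.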
- MeSH
- Algorithms MeSH
- Wood * MeSH
- Genomics methods MeSH
- Genotyping Techniques * MeSH
- Models, Genetic MeSH
- Sequence Analysis * MeSH
- Plant Breeding methods MeSH
- Picea genetics growth & development MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Allelic variability in the adaptive immune receptor loci, which harbor the gene segments that encode B cell and T cell receptors (BCR/TCR), is of critical importance for immune responses to pathogens and vaccines. Adaptive immune receptor repertoire sequencing (AIRR-seq) has become widespread in immunology research making it the most readily available source of information about allelic diversity in immunoglobulin (IG) and T cell receptor (TR) loci. Here, we present a novel algorithm for extrasensitive and specific variable (V) and joining (J) gene allele inference, allowing the reconstruction of individual high-quality gene segment libraries. The approach can be applied for inferring allelic variants from peripheral blood lymphocyte BCR and TCR repertoire sequencing data, including hypermutated isotype-switched BCR sequences, thus allowing high-throughput novel allele discovery from a wide variety of existing data sets. The developed algorithm is a part of the MiXCR software. We demonstrate the accuracy of this approach using AIRR-seq paired with long-read genomic sequencing data, comparing it to a widely used algorithm, TIgGER. We applied the algorithm to a large set of IG heavy chain (IGH) AIRR-seq data from 450 donors of ancestrally diverse population groups, and to the largest reported full-length TCR alpha and beta chain (TRA and TRB) AIRR-seq data set, representing 134 individuals. This allowed us to assess the genetic diversity within the IGH, TRA, and TRB loci in different populations and to establish a database of alleles of V and J genes inferred from AIRR-seq data and their population frequencies with free public access through VDJ.online database.
- MeSH
- Alleles * MeSH
- Algorithms * MeSH
- Genetic Variation MeSH
- Humans MeSH
- Receptors, Antigen, B-Cell genetics immunology MeSH
- Receptors, Antigen, T-Cell genetics immunology MeSH
- Sequence Analysis, DNA methods MeSH
- Software * MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
BACKGROUND: Variants in the human X-linked cyclin-dependent kinase-like 5 (CDKL5) gene have been reported as being etiologically associated with early infantile epileptic encephalopathy type 2 (EIEE2). We report on two patients, a boy and a girl, with EIEE2 who presented with early onset epilepsy, hypotonia, severe intellectual disability, and poor eye contact. METHODS: Massively parallel sequencing (MPS) of a custom-designed gene panel for epilepsy and epileptic encephalopathy containing 112 epilepsy-related genes was performed. Sanger sequencing was used to confirm the novel variants. To confirm the functional consequence of an intronic CDKL5 variant in patient 2, an RNA study was done. RESULTS: DNA sequencing revealed de novo variants in CDKL5: c.2578C>T (p.Gln860*), present in a hemizygous state in a 3-year-old boy, and a potential splice site variant, c.463+5G>A, in a heterozygous state in a 5-year-old girl. Multiple in silico splicing algorithms predicted a highly reduced splice site score for c.463+5G>A. A subsequent mRNA study confirmed an aberrant shorter transcript lacking exon 7. CONCLUSIONS: Our data confirmed that variants in CDKL5 are associated with EIEE2. There is credible evidence that the novel identified variants are pathogenic and, therefore, are likely the cause of the disease in the presented patients. In one of the patients a stop codon variant is predicted to produce a truncated protein, and in the other patient an intronic variant results in aberrant splicing.
- MeSH
- Epilepsy genetics MeSH
- Exons MeSH
- Genetic Variation genetics MeSH
- Spasms, Infantile genetics MeSH
- Humans MeSH
- Mutation MeSH
- Child, Preschool MeSH
- Protein Serine-Threonine Kinases genetics metabolism MeSH
- Rett Syndrome genetics MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Child, Preschool MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Case Reports MeSH
- MeSH
- Algorithms MeSH
- Databases, Genetic MeSH
- Humans MeSH
- Melanoma chemistry genetics mortality MeSH
- Mice MeSH
- Receptors, Antigen, T-Cell chemistry genetics MeSH
- Receptors, Antigen * analysis genetics metabolism MeSH
- RNA analysis genetics MeSH
- Sequence Analysis, RNA methods MeSH
- Gene Expression Profiling methods MeSH
- Computational Biology methods MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Mice MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Unique molecular identifiers (UMIs) show outstanding performance in targeted high-throughput resequencing, being the most promising approach for the accurate identification of rare variants in complex DNA samples. This approach has application in multiple areas, including cancer diagnostics, thus demanding dedicated software and algorithms. Here we introduce MAGERI, a computational pipeline that efficiently handles all caveats of UMI-based analysis to obtain high-fidelity mutation profiles and call ultra-rare variants. Using an extensive set of benchmark datasets including gold-standard biological samples with known variant frequencies, cell-free DNA from tumor patient blood samples and publicly available UMI-encoded datasets we demonstrate that our method is both robust and efficient in calling rare variants. The versatility of our software is supported by accurate results obtained for both tumor DNA and viral RNA samples in datasets prepared using three different UMI-based protocols.
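The core idea behind UMI-based error correction can be sketched as follows: reads carrying the same UMI derive from one original molecule, so a per-position majority vote across them suppresses errors introduced during amplification and sequencing. This is a simplified illustration, not MAGERI's implementation. It assumes pre-aligned reads of comparable length, exact UMI matching, and a hypothetical `min_reads` threshold.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_reads=2):
    """Collapse (umi, sequence) pairs into one majority-vote consensus per UMI."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to out-vote a sequencing error
        length = min(len(s) for s in seqs)  # vote only over positions all reads cover
        consensus[umi] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
        )
    return consensus


# Hypothetical reads: three copies of one molecule (one with an error) and a singleton.
reads = [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACCT"), ("GGC", "TTTT")]
result = umi_consensus(reads)
# result == {"AAT": "ACGT"}
```

A production pipeline also has to handle errors within the UMI itself, uneven group sizes, and alignment, none of which this sketch attempts.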
- MeSH
- Databases, Genetic MeSH
- Humans MeSH
- Biomarkers, Tumor blood genetics MeSH
- Neoplasms genetics MeSH
- RNA, Viral genetics MeSH
- Sequence Analysis, DNA methods MeSH
- Sequence Analysis, RNA methods MeSH
- Software * MeSH
- Computational Biology methods MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
Pathogenic sequence variants in the IQ motif- and Sec7 domain-containing protein 2 (IQSEC2) gene have been confirmed as causative in the aetiopathogenesis of neurodevelopmental disorders (intellectual disability, autism) and epilepsy. We report on a case of a family with three sons; two of them manifest delayed psychomotor development and epilepsy. Initially, proband A was examined using a multistep molecular diagnostics algorithm, including karyotype and array-comparative genomic hybridization analysis, both with negative results. Therefore, probands A and B and their unaffected parents were enrolled for an analysis using targeted "next-generation" sequencing (NGS) with the ClearSeq Inherited DiseaseXT gene panel (Agilent Technologies) and verification analysis by Sanger sequencing. A novel frameshift variant in the X-linked IQSEC2 gene, NM_001111125.2:c.1813_1814del, p.(Asp605Profs*3) at the protein level, was identified in both affected probands and in their asymptomatic mother, who has skewed X chromosome inactivation (XCI) (100:0). As IQSEC2 is known to escape XCI in humans, we expect the existence of mechanisms maintaining a normal or sufficient level of the IQSEC2 protein in the asymptomatic mother. Further analyses may help to characterize the presented novel frameshift variant in the IQSEC2 gene and to elucidate the mechanisms leading to the rare asymptomatic phenotypes in females.
- MeSH
- Algorithms MeSH
- Gene Deletion MeSH
- Child MeSH
- Epilepsy complications genetics MeSH
- Phenotype MeSH
- Genetic Variation * MeSH
- X Chromosome Inactivation MeSH
- Karyotyping MeSH
- Humans MeSH
- Neurodevelopmental Disorders complications genetics MeSH
- Frameshift Mutation MeSH
- Child, Preschool MeSH
- Chromosome Banding MeSH
- Oligonucleotide Array Sequence Analysis MeSH
- Comparative Genomic Hybridization * MeSH
- Guanine Nucleotide Exchange Factors genetics MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Male MeSH
- Child, Preschool MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Case Reports MeSH
- Research Support, Non-U.S. Gov't MeSH
Next-generation sequencing (NGS) is increasingly used in transplantation settings, but also as a method of choice for in-depth analysis of population-specific HLA genetic architecture and its linkage to various diseases. Given the complex ethnic admixture characteristic of the East Croatian population, we aimed to investigate class-I (HLA-A, -B, -C) and class-II (HLA-DRB1, -DQA1, -DQB1) HLA diversity at the highest, 4-field resolution level in 120 healthy, unrelated, blood donor volunteers. Genomic DNA was extracted and HLA genotypes of class I and DQA1 genes were defined in full-length, -DQB1 from intron 1 to 3' UTR, and -DRB1 from intron 1 to intron 4 (Illumina MiSeq platform, Omixon Twin algorithms, IMGT/HLA release 3.30.0_5). Linkage disequilibrium statistics, Hardy-Weinberg departures, and haplotype frequencies were inferred by exact tests and an iterative Expectation-Maximization algorithm using PyPop 0.7.0 and Arlequin v3.5.2.2 software. Our data provide the first description of 4-field allele and haplotype frequencies in the Croatian population, revealing 192 class-I and class-II alleles and extended haplotypic combinations not apparent from the existing 2-field HLA reports from Croatia. This established reference database complements current knowledge of HLA diversity and should prove useful in future population studies, transplantation settings, and disease-associated HLA screening.
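For readers unfamiliar with Hardy-Weinberg testing, the quantity that observed genotype counts are compared against can be sketched in a few lines. Under Hardy-Weinberg equilibrium, a homozygote for allele i is expected at frequency p_i² and a heterozygote i/j at 2·p_i·p_j. The code below only computes these expected counts for one locus. It is not the exact test used in the study and does not call PyPop or Arlequin; the allele labels are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid genotypes (pairs of allele labels)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def hwe_expected_counts(genotypes):
    """Expected genotype counts under Hardy-Weinberg equilibrium for one locus."""
    freqs = allele_frequencies(genotypes)
    n = len(genotypes)
    expected = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        p = freqs[a] * freqs[b]
        # Homozygotes: p_a^2; heterozygotes: 2 * p_a * p_b.
        expected[(a, b)] = n * (p if a == b else 2 * p)
    return expected


# Four hypothetical donors typed at one locus with two alleles.
donors = [("A1", "A1"), ("A1", "A2"), ("A1", "A2"), ("A2", "A2")]
expected = hwe_expected_counts(donors)
# expected == {("A1", "A1"): 1.0, ("A1", "A2"): 2.0, ("A2", "A2"): 1.0}
```

With highly polymorphic loci such as HLA, many genotype classes have very small expected counts, which is one reason exact tests are preferred over a chi-square comparison.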
- MeSH
- White People genetics MeSH
- Blood Donors MeSH
- Adult MeSH
- Gene Frequency MeSH
- Haplotypes MeSH
- HLA-A Antigens genetics MeSH
- HLA-B Antigens genetics MeSH
- HLA-C Antigens genetics MeSH
- HLA-DQ alpha-Chains genetics MeSH
- HLA-DQ beta-Chains genetics MeSH
- HLA-DRB1 Chains genetics MeSH
- Middle Aged MeSH
- Humans MeSH
- Young Adult MeSH
- Sequence Analysis, DNA MeSH
- Linkage Disequilibrium MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Healthy Volunteers MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Young Adult MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographic names
- Croatia MeSH
Objective: Molecular classification of endometrial carcinomas (EC) divides these neoplasms into four distinct subgroups defined by a molecular background. Given its proven clinical significance, genetic examination is becoming an integral component of the diagnostic procedure. Recommended diagnostic algorithms comprise molecular genetic testing of the POLE gene, whereas the remaining parameters are examined solely by immunohistochemistry. The aim of this study is to share our experiences with the molecular classification of EC, which has been conducted using immunohistochemistry and next-generation sequencing (NGS) at our department. Methods: This study includes all cases of EC diagnosed at Šikl's Department of Pathology and Biopticka Laboratory Ltd. from 2020 to the present. All ECs were prospectively examined by immunohistochemistry (MMR, p53), followed by NGS examination using a customized Gyncore panel (including genes POLE, POLD1, MSH2, MSH6, MLH1, PMS2, TP53, PTEN, ARID1A, PIK3CA, PIK3R1, CTNNB1, KRAS, NRAS, BRCA1, BRCA2, BCOR, ERBB2), based on which the ECs were classified into four molecularly distinct groups [POLE mutated EC (type 1), hypermutated (MMR deficient, type 2), EC with no specific molecular profile (NSMP, type 3), and TP53 mutated (“copy number high”, type 4)]. Results: The cohort comprised a total of 270 molecularly classified ECs. Eighteen cases (6.6%) were classified as POLE mutated EC, 85 cases (31.5%) as hypermutated EC (MMR deficient), 137 cases (50.7%) as EC of no specific molecular profile, and 30 cases (11.1%) as TP53 mutated EC. Twelve cases (4.4%) were classified as “multiple classifier” endometrial carcinoma. ECs of no specific molecular profile showed multiple genetic alterations, with the most common mutations being PTEN (44% within the NSMP group), followed by PIK3CA (30%), ARID1A (21%), and KRAS (9%). 
Conclusion: In comparison with recommended diagnostic algorithms, NGS provides a more reliable classification of EC into particular molecular subgroups. Furthermore, NGS reveals the complex molecular genetic background in individual ECs, which is especially significant within ECs with no specific molecular profile. These data can serve as a springboard for the research of therapeutic programs committed to targeted therapy in this type of tumor.
- MeSH
- Immunohistochemistry classification methods MeSH
- Classification methods MeSH
- Humans MeSH
- Pathology, Molecular methods MeSH
- Mutation genetics MeSH
- Endometrial Neoplasms * diagnosis genetics classification pathology MeSH
- High-Throughput Nucleotide Sequencing * classification methods MeSH
- Check Tag
- Humans MeSH
- Female MeSH
- Publication type
- Clinical Study MeSH
- Research Support, Non-U.S. Gov't MeSH
Satellite DNA (satDNA) is one of the major fractions of the eukaryotic nuclear genome. Highly variable satDNA is involved in various genome functions, and a clear link between satellites and phenotypes exists in a wide range of organisms. However, little is known about the origin and temporal dynamics of satDNA. The "library hypothesis" indicates that the rapid evolutionary changes experienced by satDNAs are mostly quantitative. Although this hypothesis has received some confirmation, a number of its aspects are still controversial. A recently developed next-generation sequencing (NGS) method allows the determination of the satDNA landscape and could shed light on unresolved issues. Here, we explore low-coverage NGS data to infer satDNA evolution in the phylogenetic context of the diploid species of the Chenopodium album aggregate. The application of the Illumina read assembly algorithm in combination with Oxford Nanopore sequencing and fluorescent in situ hybridization allowed the estimation of eight satDNA families within the studied group, six of which were newly described. The obtained set of satDNA families of different origins can be divided into several categories, namely group-specific, lineage-specific and species-specific. In the process of evolution, satDNA families can be transmitted vertically and can be eliminated over time. Moreover, transposable element-derived satDNA families may appear repeatedly in the satellitome, creating an illusion of family conservation. Thus, the obtained data refute the "library hypothesis", rather than confirming it, and in our opinion, it is more appropriate to speak about "the library of the mechanisms of origin".
- MeSH
- Chenopodium album genetics growth & development MeSH
- Diploidy * MeSH
- DNA, Plant analysis genetics MeSH
- Species Specificity MeSH
- Phylogeny MeSH
- Genome, Plant * MeSH
- Gene Library MeSH
- Evolution, Molecular * MeSH
- DNA, Satellite analysis genetics MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH