BACKGROUND: Whole exome sequencing (WES) and whole genome sequencing (WGS) have become standard methods in human clinical diagnostics as well as in population genomics (POPGEN). Blood-derived genomic DNA (gDNA) is routinely used in the clinical environment. Conversely, many POPGEN studies and commercial tests benefit from easy saliva sampling. Here, we evaluated the quality of variant call sets and the level of genotype concordance of single nucleotide variants (SNVs) and small insertions and deletions (indels) for WES and WGS using paired blood- and saliva-derived gDNA isolates employing genomic reference-based validated protocols. METHODS: The genomic reference standard Coriell NA12878 was repeatedly analyzed using optimized WES and WGS protocols, and data calls were compared with the truth dataset published by the Genome in a Bottle Consortium. gDNA was extracted from the paired blood and saliva samples of 10 participants and processed using the same protocols. A comparison of paired blood-saliva call sets was performed in the context of WGS and WES genomic reference-based technical validation results. RESULTS: The quality pattern of called variants obtained from genomic-reference-based technical replicates correlates with data calls of paired blood-saliva-derived samples in all levels of tested examinations despite a higher rate of non-human contamination found in the saliva samples. The F1 score of 10 blood-to-saliva-derived comparisons ranged between 0.8030-0.9998 for SNVs and between 0.8883-0.9991 for small-indels in the case of the WGS protocol, and between 0.8643-0.999 for SNVs and between 0.7781-1.000 for small-indels in the case of the WES protocol. CONCLUSION: Saliva may be considered an equivalent material to blood for genetic analysis for both WGS and WES under strict protocol conditions. 
Sequencing metrics and variant-detection accuracy are not affected by choosing saliva instead of blood as the gDNA source; they depend much more strongly on the genomic context, the variant type, and the sequencing technology used.
- MeSH
- DNA genetics MeSH
- Exome MeSH
- Genome, Human MeSH
- Genomics MeSH
- Humans MeSH
- Metagenomics * MeSH
- Whole Genome Sequencing MeSH
- Exome Sequencing MeSH
- Saliva * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
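The F1 scores reported for the blood-saliva comparisons in the WES/WGS abstract above combine the precision and recall of one call set measured against another. A minimal sketch of that calculation follows. It assumes that variants are compared by exact match on (chromosome, position, ref, alt) and that the blood call set is treated as the reference; the abstract does not name the comparison tool, and the variant records below are invented for illustration.

```python
def f1_score(reference, query):
    """Return (precision, recall, F1) for two collections of variant keys."""
    reference, query = set(reference), set(query)
    tp = len(reference & query)   # variants found in both call sets
    fp = len(query - reference)   # variants only in the query call set
    fn = len(reference - query)   # variants only in the reference call set
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical call sets: (chromosome, position, ref, alt)
blood = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "GA"), ("chr2", 75, "T", "C")}
saliva = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
          ("chr2", 50, "G", "GA"), ("chr3", 10, "A", "T")}

precision, recall, f1 = f1_score(blood, saliva)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

Exact matching is stricter than what dedicated benchmarking tools do, since they typically normalize variant representation before comparing; this matters most for indels.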
BACKGROUND: Almost all extant organisms use the same, so-called canonical, genetic code, with departures from it being very rare. Even more exceptional are the instances when a eukaryote with a non-canonical code can be easily cultivated and has its whole genome and transcriptome sequenced. This is the case of Blastocrithidia nonstop, a trypanosomatid flagellate that reassigned all three stop codons to encode amino acids. RESULTS: We predicted the metabolism of B. nonstop in silico and compared it with that of the well-studied human parasites Trypanosoma brucei and Leishmania major. The mapped mitochondrial, glycosomal and cytosolic metabolism contains all typical features of these diverse and important parasites. We also provided experimental validation for some of the predictions, specifically the presence of glycosomes, cellular respiration, and assembly of the respiratory complexes. CONCLUSIONS: In an unusual comparison of metabolism between a parasitic protist with a massively altered genetic code and its close relatives that rely on a canonical code, we showed that the dramatic differences at the level of nucleic acids do not seem to be reflected in the metabolism. Moreover, although the genome of B. nonstop is extremely AT-rich, we could not find any alterations of its pyrimidine synthesis pathway when compared to other trypanosomatids. Hence, we conclude that the dramatic alteration of the genetic code of B. nonstop has no significant repercussions on the metabolism of this flagellate.
- MeSH
- Eukaryota genetics MeSH
- Genetic Code MeSH
- Parasites * genetics MeSH
- Codon, Terminator MeSH
- Trypanosoma brucei brucei * genetics MeSH
- Trypanosomatina * genetics MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
The corpora allata-corpora cardiaca (CA-CC) is an endocrine gland complex that regulates mosquito development and reproduction through the synthesis of juvenile hormone (JH). Epoxidase (Epox) is a key enzyme in the production of JH. We recently utilized CRISPR/Cas9 to establish an epoxidase-deficient (epox-/-) Aedes aegypti line. The CA from epox-/- mutants do not synthesize epoxidated JH III; instead they produce methyl farnesoate (MF), a weak agonist of the JH receptor, and therefore have reduced JH signalling. Illumina sequencing was used to examine the differences in gene expression between the CA-CC from wild type (WT) and epox-/- adult female mosquitoes. From 18,034 identified genes, 317 were significantly differentially expressed. These genes are involved in many biological processes, including the regulation of cell proliferation and apoptosis, energy metabolism, and nutritional uptake. In addition, the same CA-CC samples were used to examine the microRNA (miRNA) profiles of epox-/- and WT mosquitoes. A total of 197 miRNAs were detected, 24 of which were differentially regulated in epox-/- mutants. Binding sites for these 24 miRNAs were identified using an in silico approach; they target a total of 101 differentially expressed genes. Our results suggest that a lack of epoxidase, besides affecting JH synthesis, diminishes JH signalling, which has significant effects on the Ae. aegypti CA-CC transcriptome profile as well as its miRNA repertoire.
- MeSH
- Aedes * genetics metabolism MeSH
- Corpora Allata metabolism MeSH
- Gene Expression MeSH
- Juvenile Hormones metabolism MeSH
- MicroRNAs * genetics metabolism MeSH
- Animals MeSH
- Check Tag
- Female MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
BACKGROUND: Accessory proteins have diverse roles in coronavirus pathobiology. One of them in SARS-CoV (the causative agent of the severe acute respiratory syndrome outbreak in 2002-2003) is encoded by the open reading frame 8 (ORF8). Among the most dramatic genomic changes observed in SARS-CoV isolated from patients during the peak of the pandemic in 2003 was the acquisition of a characteristic 29-nucleotide deletion in ORF8. This deletion causes ORF8 to split into two smaller ORFs, namely ORF8a and ORF8b. The functional consequences of this event are not entirely clear. RESULTS: Here, we performed evolutionary analyses of the ORF8a and ORF8b genes and documented that in both cases the frequency of synonymous mutations was greater than that of nonsynonymous ones. These results suggest that ORF8a and ORF8b are under purifying selection; thus, proteins translated from these ORFs are likely to be functionally important. Comparisons with several other SARS-CoV genes revealed that another accessory gene, ORF7a, has a similar ratio of nonsynonymous to synonymous mutations, suggesting that ORF8a, ORF8b, and ORF7a are under similar selection pressure. CONCLUSIONS: Our results for SARS-CoV echo the known excess of deletions in the ORF7a-ORF7b-ORF8 complex of accessory genes in SARS-CoV-2. A high frequency of deletions in this gene complex might reflect recurrent searches in "functional space" of various accessory protein combinations that may eventually produce more advantageous configurations of accessory proteins similar to the fixed deletion in the SARS-CoV ORF8 gene.
- MeSH
- Biological Evolution MeSH
- COVID-19 * MeSH
- Humans MeSH
- Nucleotides MeSH
- Open Reading Frames MeSH
- SARS-CoV-2 genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
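The ORF8a/ORF8b analysis above rests on classifying mutations as synonymous or nonsynonymous. The sketch below illustrates that classification for two aligned, in-frame coding sequences under the standard genetic code. It is a simplified illustration only, not the authors' method: it counts differing codons instead of individual mutations, and it does not normalize by the number of synonymous and nonsynonymous sites, which a proper dN/dS estimate requires. The example sequences are invented.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_syn_nonsyn(ref, alt):
    """Count codons that differ synonymously and nonsynonymously."""
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(ref), 3):
        a, b = ref[i:i + 3], alt[i:i + 3]
        # Skip identical codons and codons with gaps or ambiguous bases.
        if a == b or a not in CODON_TABLE or b not in CODON_TABLE:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Hypothetical sequences: GCT->GCC keeps alanine, AAA->AGA changes Lys to Arg.
syn, nonsyn = count_syn_nonsyn("ATGGCTAAAGGT", "ATGGCCAGAGGT")
print(syn, nonsyn)
```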
BACKGROUND: Advances in sequencing technologies have made a plethora of whole-genome re-sequenced (WGRS) data publicly available. However, research utilizing the WGRS data without further configuration is nearly impossible. To solve this problem, our research group has developed an interactive Allele Catalog Tool to enable researchers to explore the coding region allelic variation present in over 1,000 re-sequenced accessions each for soybean, Arabidopsis, and maize. RESULTS: The Allele Catalog Tool was designed originally with soybean genomic data and resources. The Allele Catalog datasets were generated using our variant calling pipeline (SnakyVC) and the Allele Catalog pipeline (AlleleCatalog). The variant calling pipeline was developed to process raw sequencing reads in parallel and generate Variant Call Format (VCF) files, and the Allele Catalog pipeline takes VCF files, performs imputation and functional effect prediction, and assembles alleles for each gene to generate curated Allele Catalog datasets. Both pipelines were used to generate the data panels (VCF files and Allele Catalog files), in which the accessions of the WGRS datasets were collected from various sources, currently representing over 1,000 diverse accessions each for soybean, Arabidopsis, and maize. The main features of the Allele Catalog Tool include data query, visualization of results, categorical filtering, and download functions. Queries are performed from user input, and results are presented in tabular form: summary results by categorical description and genotype results of the alleles for each gene. The categorical information is specific to each species; additionally, available detailed meta-information is provided in modal popups. The genotypic information contains the variant positions, reference or alternate genotypes, the functional effect classes, and the amino-acid changes of each accession. 
Besides that, the results can also be downloaded for other research purposes. CONCLUSIONS: The Allele Catalog Tool is a web-based tool that currently supports three species: soybean, Arabidopsis, and maize. The Soybean Allele Catalog Tool is hosted on the SoyKB website ( https://soykb.org/SoybeanAlleleCatalogTool/ ), while the Allele Catalog Tool for Arabidopsis and maize is hosted on the KBCommons website ( https://kbcommons.org/system/tools/AlleleCatalogTool/Zmays and https://kbcommons.org/system/tools/AlleleCatalogTool/Athaliana ). Researchers can use this tool to connect variant alleles of genes with meta-information of species.
- MeSH
- Alleles * MeSH
- Arabidopsis * genetics MeSH
- Data Mining * methods MeSH
- Datasets as Topic * MeSH
- Gene Frequency MeSH
- Genotype MeSH
- Glycine max * genetics MeSH
- Internet * MeSH
- Zea mays * genetics MeSH
- Metadata MeSH
- Mutation MeSH
- Pigmentation genetics MeSH
- Genes, Plant genetics MeSH
- Software * MeSH
- Amino Acid Substitution MeSH
- Plant Dormancy genetics MeSH
- Data Visualization MeSH
- Publication type
- Journal Article MeSH
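The Allele Catalog abstract above describes assembling alleles for each gene from per-accession genotypes. One way to picture that step is to group accessions by their combined genotype across a gene's variant positions. The sketch below does only that; it is not the AlleleCatalog pipeline's code, it omits imputation and functional effect prediction, and the function name, data layout, and accession names are hypothetical.

```python
from collections import defaultdict


def catalog_alleles(genotypes):
    """Group accessions by their combined allele across one gene.

    genotypes maps accession -> {variant position: called base}.
    Returns the sorted positions and a dict mapping each allele
    (a tuple of bases in position order) to its accessions.
    """
    positions = sorted({pos for calls in genotypes.values() for pos in calls})
    catalog = defaultdict(list)
    for accession in sorted(genotypes):
        calls = genotypes[accession]
        # A missing call is written as "-" here; a real pipeline would impute it.
        allele = tuple(calls.get(pos, "-") for pos in positions)
        catalog[allele].append(accession)
    return positions, dict(catalog)


# Hypothetical genotypes for one gene in three accessions.
genotypes = {
    "acc1": {101: "A", 250: "G"},
    "acc2": {101: "A", 250: "G"},
    "acc3": {101: "T", 250: "G"},
}
positions, catalog = catalog_alleles(genotypes)
for allele, accessions in catalog.items():
    print(allele, accessions)
```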
BACKGROUND: Bacterial genotyping is a crucial process in outbreak investigation and epidemiological studies. Several typing methods such as pulsed-field gel electrophoresis, multilocus sequence typing (MLST) and whole genome sequencing are currently used in routine clinical practice. However, these methods are costly, time-consuming and have high computational demands. An alternative to these methods is mini-MLST, a quick, cost-effective and robust method based on high-resolution melting analysis. Nevertheless, no standardized approach to identify markers suitable for mini-MLST exists. Here, we present a pipeline for variable fragment detection in unmapped reads based on a modified hybrid assembly approach using data from one sequencing platform. RESULTS: In routine assembly against the reference sequence, highly variable reads are not aligned and remain unmapped. If these unmapped reads are assembled de novo, variable genomic regions can be located in the resulting scaffolds. Based on the calculation of variability rates, it is possible to find a highly variable region with the same discriminatory power as the seven housekeeping gene fragments used in MLST. In the work presented here, we show the capability of identifying one variable fragment in de novo assembled scaffolds of 21 Escherichia coli genomes and three variable regions in scaffolds of 31 Klebsiella pneumoniae genomes. For each identified fragment, the melting temperatures are calculated based on the nearest neighbor method to verify the mini-MLST's discriminatory power. CONCLUSIONS: A pipeline for a modified hybrid assembly approach consisting of reference-based mapping and de novo assembly of unmapped reads is presented. This approach can be employed for the identification of highly variable genomic fragments in unmapped reads. 
The identified variable regions can then be used in efficient laboratory methods for bacterial typing such as mini-MLST with high discriminatory power, fully replacing expensive methods such as MLST. Results can be delivered in a shorter time, which allows fast infection monitoring in clinical practice.
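The mini-MLST abstract above relies on a variability rate calculation to locate highly variable regions, but does not define the rate. As a rough illustration, the sketch below assumes the rate of a window is the fraction of alignment columns in which the sequences disagree, and scans fixed-size windows over pre-aligned sequences of equal length. The function name, the window size, and the toy sequences are assumptions made for this example.

```python
def most_variable_window(seqs, window):
    """Return (start, rate) of the most variable fixed-size window.

    seqs is a list of aligned sequences of equal length. The rate is
    the fraction of columns in the window with more than one distinct
    base. Ties are resolved in favour of the leftmost window.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    if not 0 < window <= length:
        raise ValueError("window must be between 1 and the alignment length")
    # True where the column contains at least two different bases.
    variable = [len({s[i] for s in seqs}) > 1 for i in range(length)]
    best_start, best_rate = 0, -1.0
    for start in range(length - window + 1):
        rate = sum(variable[start:start + window]) / window
        if rate > best_rate:
            best_start, best_rate = start, rate
    return best_start, best_rate


# Toy alignment of three hypothetical scaffold fragments.
seqs = ["ACGTACGTAC", "ACGTTCGAAC", "ACGTAGGTAC"]
start, rate = most_variable_window(seqs, window=4)
print(start, rate)
```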
BACKGROUND: Despite a multifactorial approach being taken for the evaluation of bull semen quality in many animal breeding centres worldwide, reliable prediction of bull fertility is still a challenge. Recently, attention has turned to molecular mechanisms, which could uncover potential biomarkers of fertility. One of these mechanisms is DNA methylation, which together with other epigenetic mechanisms is essential for the fertilising sperm to drive normal embryo development and establish a viable pregnancy. In this study, we hypothesised that bull sperm DNA methylation patterns are related to bull fertility. We therefore investigated DNA methylation patterns from bulls used in artificial insemination with contrasting fertility scores. RESULTS: The DNA methylation patterns were obtained by reduced representation bisulphite sequencing from 10 high-fertility bulls and 10 low-fertility bulls, having average fertility scores of -6.6 and +6.5%, respectively (mean of the population was zero). Hierarchical clustering analysis did not distinguish bulls based on fertility but did highlight individual differences. Despite this, using stringent criteria (DNA methylation difference ≥ 35% and a q-value < 0.001), we identified 661 differently methylated cytosines (DMCs). DMCs were preferentially located in intergenic regions, introns, gene downstream regions, repetitive elements, open sea, shores and shelves of CpG islands. We also identified 10 differently methylated regions, covered by 7 unique genes (SFRP1, STXBP4, BCR, PSMG4, ARSG, ATP11A, RXRA), which are involved in spermatogenesis and early embryonic development. CONCLUSION: This study demonstrated that at specific CpG sites, sperm DNA methylation status is related to bull fertility, and identified seven differently methylated genes in sperm of subfertile bulls that may lead to altered gene expression and potentially influence embryo development.
- MeSH
- Semen Analysis * MeSH
- Embryonic Development genetics MeSH
- Fertility genetics MeSH
- DNA Methylation * MeSH
- Cattle MeSH
- Spermatozoa metabolism MeSH
- Pregnancy MeSH
- Animals MeSH
- Check Tag
- Male MeSH
- Cattle MeSH
- Pregnancy MeSH
- Female MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
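The bull fertility abstract above selects differently methylated cytosines with two thresholds: a DNA methylation difference of at least 35% and a q-value below 0.001. The sketch below applies those two cut-offs to a list of per-cytosine test results. The record layout and the example values are invented, and taking the absolute value of the difference (so that both hyper- and hypomethylation qualify) is an assumption the abstract does not spell out.

```python
from typing import NamedTuple


class CpGResult(NamedTuple):
    chrom: str
    pos: int
    meth_diff: float  # methylation difference between groups, in percent
    qvalue: float     # multiple-testing-adjusted p-value


def select_dmcs(sites, min_diff=35.0, max_q=0.001):
    """Keep cytosines passing both the effect-size and the q-value cut-off."""
    return [
        s for s in sites
        if abs(s.meth_diff) >= min_diff and s.qvalue < max_q
    ]


# Hypothetical per-cytosine results.
sites = [
    CpGResult("chr1", 1200, 41.2, 0.0002),   # passes both thresholds
    CpGResult("chr1", 5300, -38.0, 0.0005),  # passes (hypomethylated)
    CpGResult("chr2", 770, 50.0, 0.02),      # fails the q-value threshold
    CpGResult("chr3", 90, 12.5, 0.00001),    # fails the difference threshold
]
dmcs = select_dmcs(sites)
print([(s.chrom, s.pos) for s in dmcs])
```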
BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning (DL) as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field. RESULTS: Here we present ENNGene, an Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of the attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein. CONCLUSIONS: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. 
We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.
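The ENNGene abstract above mentions Integrated Gradients as the method used to attribute a prediction to individual input positions. The idea is to average the model's gradient along a straight path from a baseline input to the actual input and scale it by the input difference. The sketch below shows this on a toy function with a hand-written gradient; it is not ENNGene's implementation, which operates on trained neural networks, and the toy model, baseline, and step count are assumptions chosen for illustration.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn(point) must return the gradient of the model output with
    respect to each input feature at the given point.
    """
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th path segment
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / steps
    # Scale the path-averaged gradient by the distance from the baseline.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


# Toy model standing in for a trained network: F(x) = x0 * x1 + 2 * x2.
def model(x):
    return x[0] * x[1] + 2.0 * x[2]


def model_grad(x):
    return [x[1], x[0], 2.0]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(model_grad, x, baseline)
print(attributions, sum(attributions), model(x) - model(baseline))
```

The attributions sum (up to numerical error) to the difference between the model output at the input and at the baseline, which is the completeness property that makes the per-position scores interpretable.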
BACKGROUND: Despite increasing interest in γδ T cells and their non-classical behaviour, most studies focus on animals with low numbers of circulating γδ T cells, such as mice and humans. Arguably, γδ T cell functions might be more prominent in chickens where these cells form a higher proportion of the circulatory T cell compartment. The TCR repertoire defines different subsets of γδ T cells, and such analysis is facilitated by well-annotated TCR loci. γδ T cells are considered at the cusp of innate and adaptive immunity but most functions have been identified in γδ low species. A deeper understanding of TCR repertoire biology in γδ high and γδ low animals is critical for defining the evolution of the function of γδ T cells. Repertoire dynamics will reveal populations that can be classified as innate-like or adaptive-like as well as those that straddle this definition. RESULTS: Here, a recent discrepancy in the structure of the chicken TCR gamma locus is resolved, demonstrating that tandem duplication events have shaped the evolution of this locus. Importantly, repertoire sequencing revealed large differences in the usage of individual TRGV genes, a pattern conserved across multiple tissues, including thymus, spleen and the gut. A single TRGV gene, TRGV3.3, with a highly diverse private CDR3 repertoire dominated every tissue in all birds. TRGV usage patterns were partly explained by the TRGV-associated recombination signal sequences. Public CDR3 clonotypes represented varying proportions of the repertoire of TCRs utilising different TRGVs, with one TRGV dominated by super-public clones present in all birds. CONCLUSIONS: The application of repertoire analysis enabled functional annotation of the TCRG locus in a species with a high circulating γδ phenotype. This revealed variable usage of TCRGV genes across multiple tissues, a pattern quite different to that found in γδ low species (human and mouse). 
Defining the repertoire biology of avian γδ T cells will be key to understanding the evolution and functional diversity of these enigmatic lymphocytes in an animal that is numerically more reliant on them. Practically, this will reveal novel ways in which these cells can be exploited to improve health in medical and veterinary contexts.
- MeSH
- Genome * MeSH
- Genomics MeSH
- Chickens * genetics MeSH
- Receptors, Antigen, T-Cell, gamma-delta * genetics MeSH
- T-Lymphocytes MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
BACKGROUND: The ATP-binding cassette (ABC) transporter superfamily consists predominantly of proteins that directly use energy from ATP to move molecules across the plasma membrane. Although ABC transporters have been investigated frequently across many taxa, arthropod ABCs have been less well studied. While the manual annotation of ABC transporters has been performed in many arthropods, there has so far been no systematic comparison of the superfamily within this phylum using the increasing number of sequenced genomes. Furthermore, functional work on these genes is limited. RESULTS: Here, we developed a standardized pipeline to annotate ABCs from predicted proteomes and used it to perform comparative genomics on ABC families across arthropod lineages. Using Kruskal-Wallis tests and the Computational Analysis of gene Family Evolution (CAFE), we observed significant expansions of the ABC-B full transporters (P-glycoproteins) in Lepidoptera and the ABC-H transporters in Hemiptera. RNA sequencing of epithelial tissues in the lepidopteran Helicoverpa armigera showed that the 7 P-glycoprotein paralogues differ substantially in their tissue distribution, suggesting a spatial division of labor. Functional redundancy also appears to be a feature of these transporters: RNAi knockdown showed that most transporters are dispensable, with the exception of the highly conserved gene Snu, probably because of its role in cuticle formation. CONCLUSIONS: We have performed an annotation of the ABC superfamily across > 150 arthropod species for which good quality protein annotations exist. Our findings highlight specific expansions of ABC transporter families which suggest evolutionary adaptation. Future work will be able to use this analysis as a resource to provide a better understanding of the ABC superfamily in arthropods.
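The Kruskal-Wallis test mentioned above compares gene-family sizes between lineages by ranking all observations together and asking whether the rank sums of the groups differ more than chance would allow. The sketch below computes the H statistic only. The per-species gene counts are invented, the tie correction is omitted for brevity, and converting H to a p-value requires a chi-square distribution with (number of groups - 1) degrees of freedom, for which a statistics library such as SciPy (`scipy.stats.kruskal`) would normally be used.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction)."""
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    return 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)


# Hypothetical ABC-B gene counts per species in three insect orders.
lepidoptera = [7, 8, 9]
hemiptera = [4, 5, 6]
diptera = [1, 2, 3]
h_stat = kruskal_wallis_h([diptera, hemiptera, lepidoptera])
print(round(h_stat, 3))
```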