Tunnels in enzymes with buried active sites are key structural features that allow substrates to enter and products to leave, thereby contributing to catalytic efficiency. Targeting the bottlenecks of protein tunnels is also a powerful protein engineering strategy. However, identifying functional tunnels across multiple protein structures is a non-trivial task that can only be addressed computationally. We present a pipeline that integrates automated structural analysis with an in-house machine-learning predictor for annotating protein pockets, followed by calculation of the energetics of ligand transport through biochemically relevant tunnels. A thorough validation on eight distinct molecular systems showed that CaverDock analysis of ligand un/binding is on par with time-consuming molecular dynamics simulations while being much faster. The optimized and validated pipeline was applied to annotate more than 17,000 cognate enzyme-ligand complexes. Analysis of ligand un/binding energetics indicates that the top-priority tunnel has the most favourable energies in 75% of cases. Moreover, the energy profiles of cognate ligands revealed that a simple geometric analysis correctly identifies tunnel bottlenecks in only 50% of cases. Our study provides essential information for interpreting the results of tunnel calculations and energy profiling in mechanistic enzymology and protein engineering. We formulated several simple rules for identifying biochemically relevant tunnels based on binding pockets, tunnel geometry, and ligand transport energy profiles.

Scientific contributions: The pipeline introduced in this work allows detailed analysis of a large set of protein-ligand complexes, focusing on transport pathways. We introduce a novel predictor for determining the relevance of binding pockets for tunnel calculation.
For the first time in the field, we present a high-throughput energetic analysis of ligand binding and unbinding, showing that approximate methods for these simulations can identify additional mutagenesis hotspots in enzymes compared with purely geometric methods. The predictor is included in the supplementary material and is also available at https://github.com/Faranehhad/Large-Scale-Pocket-Tunnel-Annotation.git. The tunnel data calculated in this study have been made publicly available in the ChannelsDB 2.0 database at https://channelsdb2.biodata.ceitec.cz/.
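The contrast between a geometric bottleneck (the narrowest point of a tunnel) and an energetic bottleneck (the highest point of a ligand's energy profile) can be illustrated with a minimal sketch. It assumes a hypothetical per-tunnel profile with radius and ligand energy sampled along the tunnel; this layout is not CaverDock's output format, and the 1 Å agreement tolerance and the "lowest barrier wins" ranking are illustrative choices, not the authors' rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TunnelProfile:
    """Hypothetical tunnel description sampled along the tunnel axis.

    This layout is an illustrative assumption, not CaverDock's output format.
    """
    distance: List[float]  # distance from the active site (Å)
    radius: List[float]    # tunnel radius at each sample (Å)
    energy: List[float]    # ligand energy at each sample (kcal/mol)

    def __post_init__(self) -> None:
        n = len(self.distance)
        if n == 0 or len(self.radius) != n or len(self.energy) != n:
            raise ValueError("distance, radius and energy must be non-empty and of equal length")


def geometric_bottleneck(t: TunnelProfile) -> float:
    """Position of the narrowest point of the tunnel (purely geometric view)."""
    i = min(range(len(t.radius)), key=lambda j: t.radius[j])
    return t.distance[i]


def energetic_bottleneck(t: TunnelProfile) -> float:
    """Position of the highest ligand energy along the transport path."""
    i = max(range(len(t.energy)), key=lambda j: t.energy[j])
    return t.distance[i]


def barrier(t: TunnelProfile) -> float:
    """Highest energy relative to the bound state (first sample, in the active site)."""
    return max(t.energy) - t.energy[0]


def bottlenecks_agree(t: TunnelProfile, tolerance: float = 1.0) -> bool:
    """True if the two bottleneck definitions lie within `tolerance` Å of each other."""
    return abs(geometric_bottleneck(t) - energetic_bottleneck(t)) <= tolerance


def best_tunnel(tunnels: List[TunnelProfile]) -> int:
    """Index of the tunnel whose energy profile has the lowest barrier."""
    return min(range(len(tunnels)), key=lambda k: barrier(tunnels[k]))


if __name__ == "__main__":
    # Two made-up tunnels: in t2 the narrowest point is not where the energy peaks.
    t1 = TunnelProfile([0.0, 2.0, 4.0, 6.0, 8.0], [2.0, 1.4, 1.1, 1.6, 2.5], [-8.0, -6.0, -3.5, -5.0, -4.0])
    t2 = TunnelProfile([0.0, 2.0, 4.0, 6.0, 8.0], [2.0, 1.0, 1.5, 1.3, 2.4], [-8.0, -7.0, -2.0, -5.0, -4.5])
    print(bottlenecks_agree(t1), bottlenecks_agree(t2))  # True False
    print(best_tunnel([t1, t2]))  # 0
```

In the toy data, the second tunnel's geometric bottleneck (2 Å) and energetic bottleneck (4 Å) disagree, which is the kind of mismatch that an energy-profile analysis can expose and a radius-only analysis cannot.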
- Publication type
- Journal Article
Recent advances in AI-based methods have revolutionized structural biology. Concomitantly, high-throughput sequencing and functional genomics have generated genetic variants at an unprecedented scale. However, efficient tools and resources are needed to link these disparate data types (that is, to 'map' variants onto protein structures), to better understand how variation causes disease, and thereby to design therapeutics. Here we present the Genomics 2 Proteins portal (https://g2p.broadinstitute.org/): a human proteome-wide resource that maps 20,076,998 genetic variants onto 42,413 protein sequences and 77,923 structures, with a comprehensive set of structural and functional features. Additionally, the Genomics 2 Proteins portal allows users to interactively upload protein residue-wise annotations (for example, variants and scores) as well as protein structures not available in databases, in order to connect genomics to proteins. The portal serves as an easy-to-use discovery tool for researchers to form hypotheses about the structure-function relationship between natural or synthetic variations and their molecular phenotypes.
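The core step of mapping variants onto a protein, turning variant calls into residue-wise annotations, can be sketched as follows. This is a minimal illustration under stated assumptions: variants are given in a simplified one-letter form such as "R175H" with a hypothetical per-variant score, and the helper names are invented for this example; it is not the Genomics 2 Proteins portal's API or upload format.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class Missense(NamedTuple):
    ref: str       # reference amino acid (one-letter code)
    position: int  # 1-based residue number in the protein sequence
    alt: str       # alternative amino acid (one-letter code)


# Simplified one-letter protein change such as "R175H" (assumed input format).
_MISSENSE = re.compile(r"^([A-Z])([0-9]+)([A-Z])$")


def parse_missense(code: str) -> Missense:
    """Parse a one-letter missense code into its reference, position and alternative."""
    m = _MISSENSE.match(code)
    if m is None:
        raise ValueError(f"not a one-letter missense code: {code!r}")
    return Missense(m.group(1), int(m.group(2)), m.group(3))


def residue_annotations(sequence: str, variant_scores: Dict[str, float]) -> Dict[int, List[Tuple[str, float]]]:
    """Group per-variant scores by residue position.

    Variants whose position lies outside the sequence, or whose reference
    residue does not match the sequence (e.g. a different isoform), are skipped.
    """
    table: Dict[int, List[Tuple[str, float]]] = {}
    for code, score in variant_scores.items():
        v = parse_missense(code)
        if not 1 <= v.position <= len(sequence):
            continue
        if sequence[v.position - 1] != v.ref:
            continue
        table.setdefault(v.position, []).append((v.alt, score))
    return table
```

For example, `residue_annotations("MKRV", {"K2E": 0.9, "R3W": 0.4, "A2T": 0.5})` returns `{2: [('E', 0.9)], 3: [('W', 0.4)]}`: the "A2T" entry is dropped because residue 2 of the sequence is K, not A. Checking the reference residue before mapping guards against pairing a variant with the wrong sequence or isoform.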
- MeSH terms
- Databases, Protein *
- Genetic Variation
- Genetic Testing methods
- Genomics * methods
- Protein Conformation
- Humans
- Proteins genetics chemistry
- Proteome genetics
- Amino Acid Sequence
- Software
- Check Tag
- Humans
- Publication type
- Journal Article
In this study, we used short- and long-read sequencing technologies to delineate the transcriptional architecture of the human monkeypox virus and to identify key regulatory elements governing its gene expression. Specifically, we performed a transcriptomic analysis to annotate the transcription start sites (TSSs) and transcription end sites (TESs) of the virus, using Cap Analysis of Gene Expression (CAGE) sequencing on the Illumina platform and direct RNA sequencing on an Oxford Nanopore Technologies device. Our investigations uncovered considerable complexity in the use of alternative TSSs and TESs in viral genes. We also detected the promoter elements and poly(A) signals associated with the viral genes. Additionally, we identified novel genes in both the left and right variable regions of the viral genome.

IMPORTANCE: Understanding how the transcription of a virus is regulated offers insight into the key mechanisms that control its life cycle. The recent outbreak of human monkeypox has underscored the need to understand the basic biology of its causative agent. Our results are pivotal for constructing a comprehensive transcriptomic atlas of the human monkeypox virus, providing valuable resources for future studies.
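Annotating TSSs from cap-selected or full-length reads comes down to grouping read 5' ends into clusters and reporting well-supported peaks. The sketch below is a simplified illustration, not the authors' procedure: it assumes reads have already been aligned to the viral genome and reduced to 5'-end coordinates on a single strand, and the `min_reads` and `window` thresholds are arbitrary example values.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def call_tss(five_prime_ends: Iterable[int], min_reads: int = 5, window: int = 10) -> List[Tuple[int, int]]:
    """Cluster read 5'-end positions from one strand into candidate TSSs.

    A position within `window` nt of the previous position joins the same
    cluster. Clusters supported by at least `min_reads` reads are reported as
    (peak position, total read count), where the peak is the most frequent 5' end.
    """
    counts = Counter(five_prime_ends)
    clusters: List[List[int]] = []
    for pos in sorted(counts):
        if clusters and pos - clusters[-1][-1] <= window:
            clusters[-1].append(pos)  # extend the current cluster
        else:
            clusters.append([pos])    # start a new cluster
    calls: List[Tuple[int, int]] = []
    for cluster in clusters:
        total = sum(counts[p] for p in cluster)
        if total >= min_reads:
            # Most-used position; ties go to the smaller coordinate.
            peak = max(cluster, key=lambda p: (counts[p], -p))
            calls.append((peak, total))
    return calls
```

With hypothetical input `[100]*4 + [103]*2 + [250] + [500]*6 + [505]`, the function returns `[(100, 6), (500, 7)]`; the lone read at position 250 falls below the support threshold. Calling TESs could follow the same pattern using the 3' ends of direct RNA reads.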
- MeSH terms
- Genome, Viral
- Humans
- Transcription Initiation Site *
- Promoter Regions, Genetic
- RNA, Viral genetics
- Sequence Analysis, RNA * methods
- Gene Expression Profiling
- Transcriptome *
- Monkeypox virus genetics
- High-Throughput Nucleotide Sequencing
- Check Tag
- Humans
- Publication type
- Journal Article
Annotating multiple regions of interest across the whole mouse brain is indispensable for the quantitative evaluation of many study endpoints in neuroscience digital pathology. Prior experience and domain expertise are key to image annotation quality and consistency. At present, image annotation is often performed manually by certified pathologists or trained technicians, which limits the total throughput of studies at neuroscience digital pathology labs. It may also mean that non-pathologists use simpler and quicker methods of examining tissue samples, especially in the early stages of research and in preclinical studies. To address these limitations and meet the growing demand for image analysis in a pharmaceutical setting, we developed AnNoBrainer, an open-source software tool that leverages deep learning, image registration, and standard cortical brain templates to automatically annotate individual brain regions on 2D pathology slides. Applying AnNoBrainer to a published set of pathology slides from transgenic mouse models of synucleinopathy showed comparable accuracy, increased reproducibility, and a significant reduction (~50%) in the time spent on brain annotation, quality control and labelling compared with trained scientists in pathology. Taken together, AnNoBrainer offers rapid, accurate, and reproducible automated annotation of mouse brain images that meets the experts' histopathological assessment standards in more than 85% of cases and enables high-throughput image analysis workflows in digital pathology labs.
Molecular identification of micro- and macroorganisms based on nuclear markers has revolutionized our understanding of their taxonomy, phylogeny and ecology. Today, research on eukaryotic diversity in global ecosystems relies heavily on nuclear ribosomal RNA (rRNA) markers. Here, we present EUKARYOME, a research community-curated reference database for the nuclear ribosomal 18S rRNA, internal transcribed spacer (ITS) and 28S rRNA markers of all eukaryotes, including metazoans (animals), protists, fungi and plants. It is particularly useful for identifying arbuscular mycorrhizal fungi, as it bridges the four commonly used molecular markers: the ITS1, ITS2, 18S V4-V5 and 28S D1-D2 subregions. The key advantages of this database over other annotated reference sequence databases are that it is not restricted to particular taxonomic groups and that it includes all rRNA markers. EUKARYOME also offers a number of long-read reference sequences derived from (meta)genomics and (meta)barcoding, a unique feature that can be used for taxonomic identification and chimera control of third-generation, long-read, high-throughput sequencing data. Taxonomic assignments of rRNA genes in the database are verified using phylogenetic approaches. The reference datasets are available in multiple formats from the project homepage, http://www.eukaryome.org.
- MeSH terms
- Databases, Genetic
- Databases, Nucleic Acid
- Eukaryota * genetics
- Phylogeny
- Genes, rRNA genetics
- RNA, Ribosomal, 18S genetics
- Animals
- Check Tag
- Animals
- Publication type
- Journal Article
Next-generation sequencing methods provide comprehensive data for the structural and functional analysis of genomes. Draft genomes with a low contig number and a high N50 value can give insight into genome structure and support genome annotation. In this study, we designed a pipeline for assembling prokaryotic draft genomes with a low number of contigs and a high N50 value. We aimed to combine two de novo assembly tools (SPAdes and IDBA-Hybrid) and to evaluate the impact of this approach on assembly quality metrics. The pipeline was tested on raw short-read sequence data (reads <300 bp) from a total of 10 species belonging to four genera. To obtain the final draft genomes, we first assembled the reads with SPAdes and extracted the 16S rRNA sequence from this assembly to identify a closely related organism. The IDBA-Hybrid assembler was then used to produce a second assembly, guided by the genome of the closely related organism. Finally, SPAdes was run again, using the IDBA-Hybrid assembly as a hint. The results were evaluated with QUAST and BUSCO. The pipeline reduced the number of contigs and increased the N50 values of the draft genome assemblies while preserving their coverage.
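The two quality metrics the pipeline targets, contig number and N50, can both be computed from contig lengths alone. A minimal sketch, assuming the standard N50 definition (the length of the shortest contig among the longest contigs that together cover at least half of the assembly); the contig lengths in the usage note are hypothetical, and in practice QUAST reports these values.

```python
from typing import Iterable, Tuple


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the shortest contig among the longest contigs covering >= 50% of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    if total <= 0:
        raise ValueError("assembly must contain at least one non-empty contig")
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches the total")


def assembly_summary(contig_lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (number of contigs, N50) for an assembly."""
    lengths = list(contig_lengths)
    return len(lengths), n50(lengths)
```

For a fragmented assembly with contigs of 500, 400, 300, 200 and 100 bp, `assembly_summary` returns `(5, 400)`; if the same 1,500 bp were assembled into contigs of 900 and 600 bp, it would return `(2, 900)`. Fewer contigs covering the same total length push N50 up, which is why the two metrics usually improve together.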
- MeSH terms
- Sequence Analysis, DNA methods
- High-Throughput Nucleotide Sequencing * methods
- Publication type
- Journal Article
Germline SAMD9 and SAMD9L mutations (SAMD9/9Lmut) predispose to myelodysplastic syndromes (MDS) with a propensity for somatic rescue. In this study, we investigated a clinically annotated pediatric MDS cohort (n = 669) to define the prevalence, genetic landscape, phenotype, therapy outcome and clonal architecture of SAMD9/9L syndromes. In consecutively diagnosed MDS, germline SAMD9/9Lmut accounted for 8% of cases and were mutually exclusive with GATA2 mutations, which were present in 7% of the cohort. Among SAMD9/9Lmut cases, refractory cytopenia was the most prevalent MDS subtype (90%); acquired monosomy 7 was present in 38%; constitutional abnormalities were noted in 57%; and immune dysfunction was present in 28%. Clinical outcome was independent of germline mutation status. In total, 67 patients carried 58 distinct germline SAMD9/9Lmut, which clustered in the middle regions of the proteins. Despite inconclusive in silico predictions, 94% of SAMD9/9Lmut suppressed HEK293 cell growth, and mutations expressed in CD34+ cells induced overt cell death. Furthermore, we found that 61% of SAMD9/9Lmut patients underwent somatic genetic rescue (SGR) resulting in clonal hematopoiesis, which was maladaptive in 95% (monosomy 7 ± cancer mutations) and adaptive in 51% (revertant UPD7q, somatic SAMD9/9Lmut). Finally, single-cell DNA sequencing of bone marrow revealed multiple competing SGR events in individual patients. Our findings demonstrate that SGR is common in SAMD9/9Lmut MDS and exemplify the exceptional plasticity of hematopoiesis in children.
- MeSH terms
- Single-Cell Analysis
- Bone Marrow Cells metabolism
- Child
- HEK293 Cells
- Intracellular Signaling Peptides and Proteins genetics
- Kaplan-Meier Estimate
- Clonal Evolution genetics
- Clonal Hematopoiesis genetics
- Infant
- Humans
- Adolescent
- Myelodysplastic Syndromes genetics pathology
- Tumor Suppressor Proteins genetics
- Child, Preschool
- GATA2 Transcription Factor genetics
- High-Throughput Nucleotide Sequencing
- Germ-Line Mutation genetics
- Check Tag
- Child
- Infant
- Humans
- Adolescent
- Male
- Child, Preschool
- Female
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
- Research Support, N.I.H., Extramural
BACKGROUND: Prostate cancer is caused by genomic aberrations in normal epithelial cells; however, clinical translation of findings from analyses of cancer cells alone has been very limited. A deeper understanding of the tumour microenvironment is needed to identify the key drivers of disease progression and reveal novel therapeutic opportunities. RESULTS: In this study, experimental enrichment of selected cell types, development of a Bayesian inference model for continuous differential transcript abundance, and multiplex immunohistochemistry allowed us to define the transcriptional landscape of the prostate cancer microenvironment along the disease progression axis. We uncovered an important role for monocytes and macrophages in prostate cancer progression and disease recurrence, supported both by the transcriptional landscape findings and by differential tissue composition analyses. These findings were corroborated by spatial analyses at the single-cell level using multiplex immunohistochemistry. CONCLUSIONS: This study advances our knowledge of the role of monocyte-derived cell recruitment in primary prostate cancer and supports a key role for these cells in disease progression, patient survival and immune modulation of the prostate microenvironment.
- MeSH terms
- Molecular Sequence Annotation
- Immunophenotyping
- Immunohistochemistry
- Kaplan-Meier Estimate
- Humans
- Monocytes metabolism pathology
- Tumor Microenvironment genetics
- Prostatic Neoplasms diagnosis genetics metabolism mortality
- Prognosis
- Disease Progression
- Gene Expression Profiling * methods
- Transcriptome *
- Computational Biology methods
- High-Throughput Nucleotide Sequencing
- Check Tag
- Humans
- Male
- Publication type
- Journal Article
BACKGROUND: Ectoparasites of the family Diplozoidae (Platyhelminthes, Monogenea) are obligate haematophagous helminths of cyprinid fish. Current knowledge of these worms is largely limited to their morphological, phylogenetic, and population features. Information is lacking on the biochemical and molecular nature of the physiological processes involved in host-parasite interaction, such as evasion and regulation of the host immune system, digestion of macromolecules, suppression of blood coagulation and inflammation, and effects on host tissue and physiology. In this study, we report for the first time a comprehensive transcriptomic and secretomic description of the genes expressed and proteins secreted by the adult stage of Eudiplozoon nipponicum (Goto, 1891) Khotenovsky, 1985, an obligate sanguivorous monogenean that parasitises the gills of the common carp (Cyprinus carpio). RESULTS: RNA-seq raw reads (324,941 Roche 454 and 149,697,864 Illumina) were generated, assembled de novo, and filtered into 37,062 protein-coding transcripts. For 19,644 (53.0%) of them, we identified sequence homologues. In silico functional analysis of the E. nipponicum RNA-seq data revealed numerous transcripts, pathways, and GO terms associated with immunomodulation (inhibitors of proteolytic enzymes, CD59-like proteins, fatty acid binding proteins), feeding (the proteolytic enzymes cathepsins B, D, L1, and L3), and development (fructose 1,6-bisphosphatase, ferritin, and annexin). LC-MS/MS analysis identified 721 proteins secreted by E. nipponicum with predominantly immunomodulatory and anti-inflammatory functions (peptidyl-prolyl cis-trans isomerase, a homologue of SmKK7, tetraspanin) and the ability to digest host macromolecules (cathepsins B, D, L1).
CONCLUSIONS: In this study, we integrated two high-throughput sequencing techniques, mass spectrometry analysis, and a comprehensive bioinformatics approach to produce the first comprehensive description of a monogenean transcriptome and secretome. Exploring the E. nipponicum transcriptome-related nucleotide sequences and the translated and secreted proteins offers a better understanding of the molecular biology and biochemistry of these often neglected organisms. It also enabled us to report the essential physiological pathways and protein molecules involved in their interactions with their fish hosts.
RepeatExplorer2 is a new version of a computational pipeline that uses graph-based clustering of next-generation sequencing reads to characterize repetitive DNA in eukaryotes. The clustering algorithm enables repeat identification in any genome from relatively small quantities of short sequence reads, and additional tools within the pipeline perform automatic annotation and quantification of the identified repeats. The pipeline is integrated into the Galaxy platform, which provides a user-friendly web interface for running the analyses and documenting the results. Compared with the original version of the pipeline, RepeatExplorer2 provides automated annotation of transposable elements, identification of tandem repeats and enhanced visualization of analysis results. Here, we present an overview of the RepeatExplorer2 workflow and provide procedures for its application to (i) de novo repeat identification in a single species, (ii) comparative repeat analysis in a set of species, (iii) development of satellite DNA probes for cytogenetic experiments and (iv) identification of centromeric repeats based on ChIP-seq data. Each procedure takes approximately 2 days to complete. RepeatExplorer2 is available at https://repeatexplorer-elixir.cerit-sc.cz.
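The idea behind graph-based read clustering is that reads from the same repeat family are similar to many other reads, so they form densely connected groups in a read-similarity graph. The sketch below is a simplified illustration, not RepeatExplorer2's algorithm: the published pipeline uses its own similarity search and clustering method, whereas this sketch substitutes "shares at least `min_shared` k-mers" as the edge criterion and connected components (via union-find) as clusters. The parameter values and `genome_proportion` helper are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads: Dict[str, str], k: int = 8, min_shared: int = 3) -> List[List[str]]:
    """Cluster reads by building a read-similarity graph and taking its connected components.

    An edge joins two reads that share at least `min_shared` distinct k-mers
    (a simplified stand-in for a proper sequence-similarity search).
    """
    # Index: k-mer -> ids of reads containing it.
    index: Dict[str, Set[str]] = {}
    for rid, seq in reads.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(rid)

    # Count shared k-mers for every read pair that co-occurs in the index.
    shared: Dict[Tuple[str, str], int] = {}
    for members in index.values():
        ids = sorted(members)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pair = (ids[i], ids[j])
                shared[pair] = shared.get(pair, 0) + 1

    # Union-find over reads; each sufficiently similar pair becomes a graph edge.
    parent = {rid: rid for rid in reads}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in shared.items():
        if n >= min_shared:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    groups: Dict[str, List[str]] = {}
    for rid in reads:
        groups.setdefault(find(rid), []).append(rid)
    # Largest clusters first: in a repeat analysis these are the most abundant elements.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


def genome_proportion(cluster: List[str], total_reads: int) -> float:
    """Fraction of all reads in a cluster, a rough proxy for the repeat's share of the genome."""
    return len(cluster) / total_reads
```

With the toy reads `{"r1": "ATTGCCGATAGGCT", "r2": "GCCGATAGGCTTTA", "r3": "CCCCCCCCCCCC", "r4": "ATAGGCTTTAGC"}`, reads r1 and r4 share no k-mers directly but are both linked to r2, so `cluster_reads` groups them as `[["r1", "r2", "r4"], ["r3"]]`. Because low-coverage sequencing samples each repeat in proportion to its copy number, the fraction of reads falling into a cluster gives a rough estimate of how much of the genome that repeat occupies.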
- MeSH terms
- DNA Probes chemistry genetics
- DNA chemistry genetics
- Genomics methods
- Humans
- Repetitive Sequences, Nucleic Acid
- Sequence Analysis, DNA methods
- Cluster Analysis
- Software
- DNA Transposable Elements
- High-Throughput Nucleotide Sequencing methods
- Animals
- Check Tag
- Humans
- Animals
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't