Structural bioinformatics provides the scientific methods and tools to analyse, archive, validate, and present the biomolecular structure data generated by the structural biology community. It also provides an important link with the genomics community, as structural bioinformaticians also use the extensive sequence data to predict protein structures and their functional sites. A very broad and active community of structural bioinformaticians exists across Europe, and 3D-Bioinfo will establish formal platforms to address their needs and better integrate their activities and initiatives. Our mission will be to strengthen the ties with the structural biology research communities in Europe covering life sciences, as well as chemistry and physics and to bridge the gap between these researchers in order to fully realize the potential of structural bioinformatics. Our Community will also undertake dedicated educational, training and outreach efforts to facilitate this, bringing new insights and thus facilitating the development of much needed innovative applications e.g. for human health, drug and protein design. Our combined efforts will be of critical importance to keep the European research efforts competitive in this respect. Here we highlight the major European contributions to the field of structural bioinformatics, the most pressing challenges remaining and how Europe-wide interactions, enabled by ELIXIR and its platforms, will help in addressing these challenges and in coordinating structural bioinformatics resources across Europe. In particular, we present recent activities and future plans to consolidate an ELIXIR 3D-Bioinfo Community in structural bioinformatics and propose means to develop better links across the community. These include building new consortia, organising workshops to establish data standards and seeking community agreement on benchmark data sets and strategies. 
We also highlight existing and planned collaborations with other ELIXIR Communities and other European infrastructures, such as the structural biology community supported by Instruct-ERIC, with whom we have synergies and overlapping common interests.
- Keywords
- ELIXIR, Instruct-ERIC, biomolecular structure, nucleic acids structure, protein structure, structural bioinformatics
- MeSH
- Biological Science Disciplines * MeSH
- Genomics MeSH
- Humans MeSH
- Proteins MeSH
- Computational Biology organization & administration MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographicals
- Europe MeSH
- Names of Substances
- Proteins MeSH
In this feature article, we provide a side-by-side introduction for two research fields: quantum chemical calculations of molecular interaction in nucleic acids and RNA structural bioinformatics. Our main aim is to demonstrate that these research areas, while largely separated in contemporary literature, have substantial potential to complement each other that could significantly contribute to our understanding of the exciting world of nucleic acids. We identify research questions amenable to the combined application of modern ab initio methods and bioinformatics analysis of experimental structures while also assessing the limitations of these approaches. The ultimate aim is to attain valuable physicochemical insights regarding the nature of the fundamental molecular interactions and how they shape RNA structures, dynamics, function, and evolution.
- MeSH
- Nucleic Acid Conformation MeSH
- Quantum Theory * MeSH
- Nucleic Acids chemistry MeSH
- RNA chemistry MeSH
- Computational Biology * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
- Names of Substances
- Nucleic Acids MeSH
- RNA MeSH
Nucleic acids are not only static carriers of genetic information but also play vital roles in controlling cellular lifecycles through their fascinating structural diversity [...].
- MeSH
- DNA * chemistry metabolism MeSH
- Nucleic Acid Conformation * MeSH
- Humans MeSH
- RNA * chemistry metabolism MeSH
- Computational Biology * methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Introductory Journal Article MeSH
- Editorial MeSH
- Names of Substances
- DNA * MeSH
- RNA * MeSH
Quadruplexes are noncanonical DNA structures that arise in guanine-rich loci and have important biological functions. Classically, quadruplexes contain four stacked intramolecular G-tetrads. Surprisingly, although some algorithms allow searching for G tracts longer than four guanines, such quadruplexes have not yet been systematically studied. Therefore, we analyzed the human genome for sequences that are predicted to adopt stacked intramolecular G-tetrads with more than four stacks. The data provide evidence for numerous G-quadruplexes that contain five or six stacked intramolecular G-tetrads. These sequences are predominantly found in known gene regulatory regions. Electrophoretic mobility assays and circular dichroism spectroscopy indicate that these sequences form quadruplex structures in vitro under physiological conditions. The localization and in vitro stability of these G-quadruplexes indicate their potentially important roles in gene regulation and their potential for therapeutic applications.
- Keywords
- Bioinformatics, Circular dichroism, Electrophoresis, G-quadruplex
- MeSH
- Circular Dichroism MeSH
- G-Quadruplexes * MeSH
- Nucleic Acid Conformation MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
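The genome scan described in the quadruplex abstract above can be illustrated with a pattern search. The following is a minimal sketch, not the authors' algorithm: it assumes the common "four G tracts separated by short loops" motif definition, and the function name `find_long_g4` and the parameters `min_tract` and `max_loop` (with a default loop length of 1 to 7 nucleotides) are illustrative assumptions.

```python
import re


def find_long_g4(seq, min_tract=5, max_loop=7):
    """Return (start, end, motif) for putative quadruplex motifs.

    A motif is four runs of at least `min_tract` guanines separated by
    loops of 1..`max_loop` nucleotides. Both thresholds are assumptions
    chosen for illustration, not values taken from the study.
    Only the given strand is scanned; matches are non-overlapping.
    """
    tract = "G{%d,}" % min_tract
    loop = "[ACGT]{1,%d}" % max_loop
    pattern = re.compile("%s(?:%s%s){3}" % (tract, loop, tract))
    return [(m.start(), m.end(), m.group(0))
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    example = "TT" + "GGGGG" + "A" + "GGGGG" + "TA" + "GGGGG" + "C" + "GGGGG" + "TT"
    for start, end, motif in find_long_g4(example):
        print(start, end, motif)
```

Setting `min_tract=5` restricts hits to candidates with at least five guanines per tract, the precondition for more than four stacked tetrads. A complete scan would also search the reverse complement (equivalently, C tracts on the given strand).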
The molecular recognition of carbohydrates by proteins plays a key role in many biological processes including immune response, pathogen entry into a cell, and cell-cell adhesion (e.g., in cancer metastasis). Carbohydrates interact with proteins mainly through hydrogen bonding, metal-ion-mediated interaction, and non-polar dispersion interactions. The role of dispersion-driven CH-π interactions (stacking) in protein-carbohydrate recognition was long underestimated, because polar interactions were considered to be the main forces in saccharide binding. However, it has become clear over the last few years that non-polar interactions are equally important. In this study, we analyzed the CH-π interactions employing bioinformatics (data mining, structural analysis), experimental (isothermal titration calorimetry (ITC), X-ray crystallography), and computational techniques. The Protein Data Bank (PDB) has been used as a source of structural data. The PDB contains over 12,000 protein complexes with carbohydrates. Stacking interactions are very frequently present in such complexes (about 39% of identified structures). The calculations and the ITC measurement results suggest that the CH-π stacking contribution to the overall binding energy ranges from 4 to 8 kcal mol⁻¹. All the results show that the stacking CH-π interactions in protein-carbohydrate complexes can be considered to be a driving force of the binding in such complexes.
- Keywords
- carbohydrates, density functional calculations, glycosylation, ligand binding, stacking interaction
- MeSH
- Proteins chemistry MeSH
- Carbohydrates chemistry MeSH
- In Vitro Techniques MeSH
- Thermodynamics MeSH
- Carbon chemistry MeSH
- Protein Binding MeSH
- Hydrogen chemistry MeSH
- Hydrogen Bonding MeSH
- Computational Biology * MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Proteins MeSH
- Carbohydrates MeSH
- Carbon MeSH
- Hydrogen MeSH
By combining bioinformatics with quantum-chemical calculations, we attempt to address quantitatively some of the physical principles underlying protein folding. The former allowed us to identify tripeptide sequences in existing protein three-dimensional structures with a strong preference for either helical or extended structure. The selected representatives of pro-helical and pro-extended sequences were converted into "isolated" tripeptides (capped at the N- and C-termini), and these were subjected to extensive conformational sampling and geometry optimization (typically thousands to tens of thousands of conformers for each tripeptide). For each conformer, the QM(DFT-D3)/COSMO-RS free-energy value, Gconf(solv), was then calculated. The ΔGconf(solv) is expected to provide an objective, unbiased, and quantitatively accurate measure of the conformational preference of the particular tripeptide sequence. It has been shown that irrespective of the helical vs extended preferences of the selected tripeptide sequences in the context of the protein, most of the low-energy conformers of isolated tripeptides prefer the R-helical structure. Nevertheless, pro-helical tripeptides show slightly stronger helix preference than their pro-extended counterparts. Furthermore, when the sampling is repeated in the presence of a partner tripeptide to mimic the situation in a β-sheet, pro-extended tripeptides (exemplified by VIV) show a larger free-energy benefit than pro-helical tripeptides (exemplified by EAM). This effect is even more pronounced in a hydrophobic solvent, which mimics the less polar parts of a protein. This is in line with our bioinformatic results showing that the majority of pro-extended tripeptides are hydrophobic. The preference of the studied tripeptides for a specific secondary structure is thus governed by their plasticity to adapt to the environment.
In addition, we show that most of the "naturally occurring" conformations of tripeptide sequences, i.e., those found in existing three-dimensional protein structures, are within ∼10 kcal·mol⁻¹ of their global minima. In summary, our "ab initio" data suggest that complex protein structures may start to emerge already at the level of their small oligopeptidic units, which is in line with a hierarchical nature of protein folding.
- MeSH
- Models, Chemical MeSH
- Protein Conformation, alpha-Helical MeSH
- Protein Conformation, beta-Strand MeSH
- Peptides chemistry MeSH
- Protein Folding * MeSH
- Density Functional Theory MeSH
- Thermodynamics MeSH
- Hydrogen Bonding MeSH
- Computational Biology MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Peptides MeSH
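To make the notion of a free-energy-based conformational preference in the tripeptide abstract above more concrete, the sketch below converts per-conformer free energies into Boltzmann populations and then into a helical-vs-extended free-energy difference. This is a generic statistical-thermodynamics illustration and an assumption on our part, not the authors' protocol; the function names, the class labels, and the example energies are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1


def boltzmann_populations(free_energies, temperature=298.15):
    """Normalized Boltzmann populations for conformer free energies (kcal/mol)."""
    g_min = min(free_energies)  # shift by the minimum for numerical stability
    rt = R_KCAL * temperature
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies]
    total = sum(weights)
    return [w / total for w in weights]


def class_free_energy_difference(free_energies, labels, class_a, class_b,
                                 temperature=298.15):
    """Return G(class_a) - G(class_b) = -RT ln(P_a / P_b) in kcal/mol.

    `labels` assigns each conformer to a class (e.g. "helical", "extended").
    Both classes must contain at least one conformer.
    """
    pops = boltzmann_populations(free_energies, temperature)
    p_a = sum(p for p, lab in zip(pops, labels) if lab == class_a)
    p_b = sum(p for p, lab in zip(pops, labels) if lab == class_b)
    return -R_KCAL * temperature * math.log(p_a / p_b)


if __name__ == "__main__":
    # Hypothetical conformer free energies (kcal/mol), not data from the study.
    energies = [0.0, 0.4, 1.2, 2.5]
    labels = ["helical", "helical", "extended", "extended"]
    print(boltzmann_populations(energies))
    print(class_free_energy_difference(energies, labels, "helical", "extended"))
```

A negative value indicates that the first class is favored. With thousands of conformers per tripeptide, the low-energy conformers dominate the sums, which is consistent with the abstract's focus on low-energy conformers.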
The sarcin-ricin RNA motif (SR motif) is one of the most prominent recurrent RNA building blocks that occurs in many different RNA contexts and folds autonomously, that is, in a context-independent manner. In this study, we combined bioinformatics analysis with explicit-solvent molecular dynamics (MD) simulations to better understand the relation between the RNA sequence and the evolutionary patterns of the SR motif. A SHAPE probing experiment was also performed to confirm the fidelity of the MD simulations. We identified 57 instances of the SR motif in a nonredundant subset of the RNA X-ray structure database and analyzed their base pairing, base-phosphate, and backbone-backbone interactions. We extracted sequences aligned to these instances from large rRNA alignments to determine the frequency of occurrence for different sequence variants. We then used a simple scoring scheme based on isostericity to suggest 10 sequence variants with a highly variable expected degree of compatibility with the SR motif 3D structure. We carried out MD simulations of SR motifs with these base substitutions. Nonisosteric base substitutions led to unstable structures, but so did isosteric substitutions which were unable to make key base-phosphate interactions. The MD technique explains why some potentially isosteric SR motifs are not realized during evolution. We also found that the inability to form stable cWW geometry is an important factor in the case of the first base pair of the flexible region of the SR motif. A comparison of structural, bioinformatics, SHAPE probing, and MD simulation data reveals that explicit solvent MD simulations neatly reflect the viability of different sequence variants of the SR motif. Thus, MD simulations can efficiently complement bioinformatics tools in studies of conservation patterns of RNA motifs and provide atomistic insight into the role of their different signature interactions.
- MeSH
- Nucleic Acid Conformation MeSH
- Nucleotide Motifs MeSH
- Base Pairing MeSH
- RNA, Ribosomal chemistry metabolism MeSH
- RNA chemistry metabolism MeSH
- Solvents chemistry MeSH
- Molecular Dynamics Simulation MeSH
- Hydrogen Bonding MeSH
- Computational Biology MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Names of Substances
- RNA, Ribosomal MeSH
- RNA MeSH
- Solvents MeSH
The L1 stalk is a prominent mobile element of the large ribosomal subunit. We explore the structure and dynamics of its non-canonical rRNA elements, which include two kink-turns, an internal loop, and a tetraloop. We use bioinformatics to identify the L1 stalk RNA conservation patterns and carry out over 11.5 μs of MD simulations for a set of systems ranging from isolated RNA building blocks up to complexes of L1 stalk rRNA with the L1 protein and a tRNA fragment. We show that the L1 stalk tetraloop has an unusual GNNA or UNNG conservation pattern deviating from the major GNRA and YNMG RNA tetraloop families. We suggest that this deviation is related to a highly conserved tertiary contact within the L1 stalk. The available X-ray structures contain only UCCG tetraloops, which in addition differ in the orientation (anti vs syn) of the guanine. Our analysis suggests that the anti orientation might be a mis-refinement, although even the anti interaction would be compatible with the sequence pattern and the observed tertiary interaction. Alternatively, the anti conformation may be a real substate whose population could be pH-dependent, since the guanine syn orientation requires protonation of cytosine in the tertiary contact. In the absence of structural data, we use molecular modeling to explore the GCCA tetraloop that is dominant in bacteria and suggest that the GCCA tetraloop is structurally similar to the YNMG tetraloop. Kink-turn Kt-77 is unusual due to its 11-nucleotide bulge. The simulations indicate that the long bulge is a stalk-specific eight-nucleotide insertion into the consensus kink-turn that only subtly modifies its structural dynamics. We discuss a possible evolutionary role of helix H78 and a mechanism of L1 stalk interaction with tRNA. We also assess the simulation methodology. The simulations provide a good description of the studied systems, with the latest bsc0χOL3 force field showing improved performance.
Still, even bsc0χOL3 is unable to fully stabilize an essential sugar-edge H-bond between the bulge and the non-canonical stem of the kink-turn. Inclusion of Mg²⁺ ions may deteriorate the simulations. On the other hand, in simulations, monovalent ions can readily occupy experimental Mg²⁺ binding sites.
- MeSH
- Models, Molecular MeSH
- Ribosomal Proteins chemistry MeSH
- RNA, Ribosomal chemistry MeSH
- Molecular Dynamics Simulation * MeSH
- Sulfolobus acidocaldarius chemistry MeSH
- Computational Biology * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- ribosomal protein L1 MeSH Browser
- Ribosomal Proteins MeSH
- RNA, Ribosomal MeSH
BACKGROUND: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty associated with the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged. RESULTS: We compared the two most used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy-Weinberg equilibrium. We analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV as a new measure of dataset consistency. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters. CONCLUSIONS: We found that the dissimilarity measure based on the distance between SV breakpoints produces slightly better results than the measure based on SV overlap. This difference is evident in the trivial and corrected clustering strategies, but not in the constrained clustering strategy. However, the constrained clustering strategy provided the best results in all aspects, regardless of the dissimilarity measure used.
- Keywords
- Breakpoints uncertainty problem, Constrained clustering, Mendelian inheritance error, Structural variants, Whole genome sequencing
- MeSH
- Genome, Human * MeSH
- Genomics MeSH
- Humans MeSH
- Uncertainty MeSH
- Cluster Analysis MeSH
- Genomic Structural Variation * MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
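The two kinds of dissimilarity measure compared in the structural-variant abstract above (breakpoint distance vs. overlap) can be sketched as follows. The exact definitions used in the study are not given in the abstract, so the formulas here (maximum breakpoint offset, and one minus reciprocal overlap) are assumptions, as are the `SV` record and all function names.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record: half-open interval [start, end)."""
    chrom: str
    start: int
    end: int
    svtype: str


def comparable(a: SV, b: SV) -> bool:
    """Only SVs of the same type on the same chromosome are candidates for merging."""
    return a.chrom == b.chrom and a.svtype == b.svtype


def breakpoint_distance(a: SV, b: SV) -> int:
    """Assumed definition: the larger of the two breakpoint offsets, in bp."""
    return max(abs(a.start - b.start), abs(a.end - b.end))


def overlap_dissimilarity(a: SV, b: SV) -> float:
    """Assumed definition: 1 - reciprocal overlap, ranging from 0 to 1."""
    overlap = max(0, min(a.end, b.end) - max(a.start, b.start))
    longest = max(a.end - a.start, b.end - b.start)
    if longest == 0:  # two zero-length events, e.g. insertions
        return 0.0 if a.start == b.start else 1.0
    return 1.0 - overlap / longest


if __name__ == "__main__":
    a = SV("chr1", 1000, 2000, "DEL")
    b = SV("chr1", 1050, 2100, "DEL")
    if comparable(a, b):
        print(breakpoint_distance(a, b), overlap_dissimilarity(a, b))
```

Either function can serve as the pairwise dissimilarity in a standard clustering routine. The two measures behave differently for large SVs: a fixed breakpoint offset is a small fraction of a long deletion under the overlap measure, while the breakpoint distance ignores SV length.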
Recently, the gene coding for a new beta-glucuronidase enzyme has been identified and cloned from Streptococcus equi subsp. zooepidemicus. This is another report of a beta-glucuronidase gene cloned from a bacterial species. ORF Finder analysis of the sequenced DNA (EMBL, AJ890474) revealed the presence of a 1,785 bp ORF potentially coding for a 594 aa protein. Three protein family (Pfam) domains were identified using Conserved Domain Database (CDD) analysis: Pfam 02836, glycosyl hydrolases family 2, triose phosphate isomerase (TIM) barrel domain; Pfam 02837, glycosyl hydrolases family 2, sugar binding domain; and Pfam 00703, glycosyl hydrolases family 2, immunoglobulin-like beta-sandwich domain. To gain more insight into the enzymatic activity, the domains were used to generate a bootstrapped unrooted distance tree using ClustalX. The calculated distances for two domains, the TIM barrel domain and the sugar-binding domain, were comparable and exhibited a similarity pattern based on function, in accordance with recently published works confirming the beta-glucuronidase activity of the enzyme. In the case of the centrally positioned immunoglobulin-like beta-sandwich domain, the calculated distances were somewhat higher than for the other two domains, but clustering with other beta-glucuronidases in the tree was rather clear. Nine proteins, including beta-glucuronidases, beta-galactosidase, and mannosidase, were selected for multiple alignment and subsequent distance tree creation.
- MeSH
- Glucuronidase genetics MeSH
- Horses MeSH
- Models, Molecular MeSH
- Molecular Sequence Data MeSH
- Amino Acid Sequence MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Sequence Homology, Amino Acid MeSH
- Sequence Homology, Nucleic Acid MeSH
- Cluster Analysis MeSH
- Streptococcus equi genetics MeSH
- Protein Structure, Tertiary genetics MeSH
- Computational Biology * MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Glucuronidase MeSH
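The relation between the ORF length and the protein length reported in the beta-glucuronidase abstract above (1,785 bp, 594 aa) follows from codon arithmetic: 1,785 / 3 = 595 codons, one of which is the stop codon. The sketch below shows this together with a deliberately simplified forward-strand ORF search; it is an illustration only, not the ORF Finder tool, and the function names and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def protein_length(orf_bp):
    """Amino acids encoded by an ORF of `orf_bp` bases that includes its stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


def longest_orf(dna):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Scans the three forward reading frames only; end is exclusive and
    includes the stop codon. Returns (0, 0) when no complete ORF is found.
    """
    dna = dna.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    print(protein_length(1785))
    toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
    print(longest_orf(toy))
```

A fuller search would also scan the reverse complement so that all six reading frames are covered.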