Most cited article - PubMed ID 18477633
DNA conformations and their sequence preferences
The transition from B-DNA to A-DNA occurs in many protein-DNA interactions and in DNA/RNA hybrid duplexes, and thus plays a role in many important biomolecular processes that convey the biological function of DNA. However, the stability of A-DNA is severely underestimated in current AMBER force fields such as OL15, OL21 or bsc1, potentially leading to unstable or deformed protein-DNA complexes. In this study, we refine the deoxyribose dihedral potential to increase the stability of the north (N) puckering present in A-DNA. The new parameters, termed OL24, model the A/B equilibrium in B-DNA duplexes in water in good agreement with nuclear magnetic resonance (NMR) experiments. They also improve the description of DNA/RNA hybrids and of the transition of the DNA duplex to the A-form in concentrated ethanol solutions. These refinements significantly improve the modeling of protein-DNA complexes, increasing their structural stability and A-form population, while maintaining an accurate representation of canonical B-DNA duplexes. Overall, the new parameters should allow more reliable modeling of the thermodynamic equilibrium between the A- and B-DNA forms and of the interactions of DNA with proteins.
- MeSH
- DNA, A-Form * chemistry MeSH
- DNA, B-Form * chemistry MeSH
- Deoxyribose * chemistry MeSH
- DNA * chemistry MeSH
- Nucleic Acid Conformation MeSH
- Molecular Dynamics Simulation MeSH
- Thermodynamics MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- DNA, A-Form * MeSH
- DNA, B-Form * MeSH
- Deoxyribose * MeSH
- DNA * MeSH
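The OL24 work targets the balance between north (N, A-DNA-like) and south (S, B-DNA-like) deoxyribose puckers. As a minimal sketch of how pucker is commonly quantified, the Python below computes the Altona-Sundaralingam pseudorotation phase angle P from the five endocyclic torsions ν0 to ν4 and applies a coarse North/South label. The function names and the 90°/270° boundaries are illustrative assumptions; the OL24 dihedral refinement itself is not reproduced here.

```python
import math

def pseudorotation_phase(nu0, nu1, nu2, nu3, nu4):
    """Altona-Sundaralingam phase angle P in degrees, in [0, 360),
    from the five endocyclic sugar torsions nu0..nu4 (degrees)."""
    num = (nu4 + nu1) - (nu3 + nu0)
    den = 2.0 * nu2 * (math.sin(math.radians(36.0)) + math.sin(math.radians(72.0)))
    # atan2 keeps the correct quadrant, including the nu2 < 0 case
    return math.degrees(math.atan2(num, den)) % 360.0

def ideal_torsions(p_deg, amplitude_deg):
    """Torsions of an idealized ring: nu_j = amplitude * cos(P + 144 * (j - 2))."""
    return [amplitude_deg * math.cos(math.radians(p_deg + 144.0 * (j - 2)))
            for j in range(5)]

def pucker_family(p_deg):
    """Coarse North/South label; the 90/270 degree boundaries are an assumed convention."""
    p = p_deg % 360.0
    return "North" if (p < 90.0 or p >= 270.0) else "South"
```

For an idealized ring with P = 18° (C3'-endo, typical of A-DNA) the sketch returns "North"; with P = 162° (C2'-endo, typical of B-DNA) it returns "South".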
The protein structure prediction problem has been solved for many types of proteins by AlphaFold. Recently, there has been considerable excitement about building on the success of AlphaFold to predict the 3D structures of RNAs. RNA prediction methods use a variety of techniques, from physics-based to machine learning approaches. We believe that there are challenges preventing the successful development of deep learning-based methods like AlphaFold for RNA in the short term. Broadly speaking, the challenges stem from the limited number of structures and alignments, which makes data-hungry deep learning methods unlikely to succeed. Additionally, the existing structure and sequence data have several issues: they are often of insufficient quality, highly biased and missing key information. Here, we discuss these challenges in detail and suggest some steps to remedy the situation. We believe that it is possible to create an accurate RNA structure prediction method, but it will require solving several data quality and volume issues, using data beyond simple sequence alignments, or developing new, less data-hungry machine learning methods.
- MeSH
- Deep Learning MeSH
- Nucleic Acid Conformation MeSH
- Models, Molecular MeSH
- RNA * chemistry metabolism genetics MeSH
- RNA Folding MeSH
- Sequence Alignment MeSH
- Software MeSH
- Machine Learning MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- RNA * MeSH
A detailed description of the dnatco.datmos.org web server implementing the universal structural alphabet of nucleic acids is presented. It is capable of processing any mmCIF- or PDB-formatted files containing DNA or RNA molecules; these can either be uploaded by the user or supplied as the wwPDB or PDB-REDO structural database access code. The web server performs an assignment of the nucleic acid conformations and presents the results for the intuitive annotation, validation, modeling and refinement of nucleic acids.
- Keywords
- annotation, nucleic acids, refinement, structural alphabets, validation
- MeSH
- Databases, Nucleic Acid MeSH
- DNA chemistry MeSH
- Internet MeSH
- Nucleic Acid Conformation MeSH
- Models, Molecular MeSH
- RNA chemistry MeSH
- Software * MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- DNA MeSH
- RNA MeSH
By analyzing almost 120 000 dinucleotides in over 2000 nonredundant nucleic acid crystal structures, we define 96+1 diNucleotide Conformers, NtCs, which describe the geometry of RNA and DNA dinucleotides. NtC classes are grouped into 15 codes of the structural alphabet CANA (Conformational Alphabet of Nucleic Acids) to simplify the symbolic annotation of the prominent structural features of nucleic acids (NAs) and their intuitive graphical display. The search for nontrivial patterns of NtCs resulted in the identification of several types of RNA loops, some of them observed for the first time. Over 30% of the nearly six million dinucleotides in the PDB cannot be assigned to any NtC class, but we demonstrate that up to half of them can be re-refined with the help of proper refinement targets. A statistical analysis of the preferences of NtCs and CANA codes for the 16 dinucleotide sequences showed that neither the NtC class AA00, which forms the scaffold of RNA structures, nor BB00, the most populated DNA class, is sequence neutral; instead, their distributions are significantly biased. The reported automated assignment of NtC classes and CANA codes, available at dnatco.org, provides a powerful tool for the unbiased analysis of nucleic acid structures by structural and molecular biologists.
- MeSH
- Biocatalysis MeSH
- DNA chemistry classification MeSH
- Nucleic Acid Conformation * MeSH
- Nucleotide Motifs * MeSH
- Nucleotides chemistry classification MeSH
- Reproducibility of Results MeSH
- Riboswitch MeSH
- Ribosomes chemistry metabolism MeSH
- RNA, Catalytic chemistry metabolism MeSH
- RNA chemistry classification MeSH
- Binding Sites MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA MeSH
- Nucleotides MeSH
- Riboswitch MeSH
- RNA, Catalytic MeSH
- RNA MeSH
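The NtC abstract above describes assigning each dinucleotide to one of a fixed set of conformer classes, with a sizeable fraction left unassigned. A minimal sketch of that idea, assuming a nearest-centroid rule on backbone torsions with a distance cutoff: the class names, the three-torsion description, the centroid values and the 25° cutoff are all hypothetical. The published assignment describes each dinucleotide with more parameters than this sketch and applies its own criteria.

```python
import math
from typing import Dict, Optional, Sequence

def angular_diff(a: float, b: float) -> float:
    """Signed difference a - b wrapped into [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

def torsion_rmsd(x: Sequence[float], c: Sequence[float]) -> float:
    """Root-mean-square angular deviation between two torsion vectors."""
    return math.sqrt(sum(angular_diff(a, b) ** 2 for a, b in zip(x, c)) / len(x))

def assign_class(torsions: Sequence[float],
                 centroids: Dict[str, Sequence[float]],
                 cutoff_deg: float = 25.0) -> Optional[str]:
    """Return the nearest class name, or None if even the nearest is beyond the cutoff."""
    best_name, best_dist = None, math.inf
    for name, centroid in centroids.items():
        d = torsion_rmsd(torsions, centroid)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= cutoff_deg else None

# Hypothetical two-class table with illustrative (not published) centroids
# for three torsions (delta1, epsilon1, zeta1), in degrees.
CENTROIDS: Dict[str, Sequence[float]] = {
    "north_like": (80.0, -150.0, -70.0),
    "south_like": (140.0, -180.0, -90.0),
}
```

A return value of None plays the role of an unassigned dinucleotide; the circular difference matters because torsions such as 178° and -180° are only 2° apart.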
In this article, we present a method for the enhanced molecular dynamics simulation of protein and DNA systems called potential of mean force (PMF)-enriched sampling. The method uses partitions derived from the potentials of mean force, which we determined from DNA and protein structures in the Protein Data Bank (PDB). We define a partition function from a set of PDB-derived PMFs, which efficiently compensates for the error introduced by the assumption of a homogeneous partition function from the PDB datasets. The bias based on the PDB-derived partitions is added in the form of a hybrid Hamiltonian using a renormalization method, which adds the PMF-enriched gradient to the system depending on a linear weighting factor and the underlying force field. We validated the method using simulations of dialanine, the folding of TrpCage, and the conformational sampling of the Dickerson-Drew DNA dodecamer. Our results show the potential for the PMF-enriched simulation technique to enrich the conformational space of biomolecules along their order parameters, while we also observe a considerable speed increase in the sampling by factors ranging from 13.1 to 82. The novel method can effectively be combined with enhanced sampling or coarse-graining methods to enrich conformational sampling with a partition derived from the PDB.
- Keywords
- DNA simulation, enhanced molecular dynamics simulations, protein folding
- MeSH
- Databases, Protein * MeSH
- DNA * chemistry genetics MeSH
- Computer Simulation * MeSH
- Protein Folding * MeSH
- Molecular Dynamics Simulation * MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- DNA * MeSH
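The PMF-enriched sampling abstract above combines PDB-derived potentials of mean force with a force field through a linear weighting factor. The sketch below illustrates two generic ingredients under stated assumptions: Boltzmann inversion of a 1D histogram, W(x) = -kT ln p(x), and a hypothetical linear mixing F = F_ff + λ F_PMF. It does not implement the paper's partition-function correction or renormalization; the function names, the pseudocount and the mixing form are assumptions.

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant, kcal/(mol K)

def pmf_from_samples(samples, bins, temperature=300.0):
    """Boltzmann-inverted 1D PMF W(x) = -kT ln p(x), shifted so min(W) = 0.
    A pseudocount of 1 per bin keeps empty bins finite."""
    counts, edges = np.histogram(samples, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = (counts + 1.0) / (counts.sum() + len(counts))
    w = -KB_KCAL_PER_MOL_K * temperature * np.log(p)
    return centers, w - w.min()

def pmf_force(x, centers, w):
    """Mean force -dW/dx from finite differences, linearly interpolated at x."""
    return np.interp(x, centers, -np.gradient(w, centers))

def hybrid_force(f_force_field, f_pmf, lam):
    """Assumed linear mixing: force-field force plus lam times the PMF force."""
    return f_force_field + lam * f_pmf
```

For samples drawn from a Gaussian centred at zero, the recovered PMF has its minimum near zero and the mean force points back toward it, which is the restoring behaviour a PDB-derived bias would add along an order parameter.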
DNA is a structurally plastic molecule, and its biological function is enabled by adaptation to its binding partners. To identify the DNA structural polymorphisms that are possible in such adaptations, the dinucleotide structures of 60 000 DNA steps from sequentially nonredundant crystal structures were classified, and an automated protocol assigning 44 distinct structural (conformational) classes called NtC (for Nucleotide Conformers) was developed. To further facilitate understanding of DNA structure, the NtCs were assembled into the DNA structural alphabet CANA (Conformational Alphabet of Nucleic Acids), and a projection of CANA onto the graphical representation of the molecular structure was proposed. The NtC classification was used to define a validation score called confal, which quantifies the conformity between an analyzed structure and the geometries of the NtC classes. NtC and CANA assignments were applied to analyze the structural properties of typical DNA structures such as Dickerson-Drew dodecamers, guanine quadruplexes and structural models based on fibre diffraction. NtC, CANA and confal assignment, which is accessible at the website https://dnatco.org, allows the quantitative assessment and validation of DNA structures and their subsequent analysis by means of pseudo-sequence alignment. An animated Interactive 3D Complement (I3DC) is available in Proteopedia at http://proteopedia.org/w/Journal:Acta_Cryst_D:2.
- Keywords
- DNA modelling, DNA structure, NMR structure, X-ray structure, bioinformatics
- MeSH
- DNA chemistry MeSH
- Nucleic Acid Conformation * MeSH
- Models, Molecular * MeSH
- Computer Graphics MeSH
- Molecular Dynamics Simulation MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- DNA MeSH
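The confal score in the abstract above quantifies how closely a structure matches NtC geometries. Its published formula is not given here, so the following is only a hypothetical score in the same spirit: each torsion's deviation from a class centroid is converted to a Gaussian similarity, and the results are averaged onto a 0-100 scale. The names, widths (sigmas) and averaging scheme are assumptions, not the confal definition.

```python
import math
from typing import Sequence

def angular_diff(a: float, b: float) -> float:
    """Signed difference a - b wrapped into [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

def conformity_score(torsions: Sequence[float], centroid: Sequence[float],
                     sigmas: Sequence[float]) -> float:
    """Hypothetical 0-100 score: mean Gaussian similarity exp(-d^2 / (2 sigma^2))
    of each torsion to the class centroid. Not the published confal formula."""
    sims = [math.exp(-angular_diff(x, c) ** 2 / (2.0 * s ** 2))
            for x, c, s in zip(torsions, centroid, sigmas)]
    return 100.0 * sum(sims) / len(sims)

def structure_score(step_scores: Sequence[float]) -> float:
    """Hypothetical whole-structure summary: arithmetic mean over dinucleotide steps."""
    return sum(step_scores) / len(step_scores)
```

A step that matches the centroid exactly scores 100, and a step that deviates by one sigma in every torsion scores about 61 (100·e^-0.5).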
The web service DNATCO (dnatco.org) classifies local conformations of DNA molecules beyond their traditional sorting into A, B and Z DNA forms. DNATCO provides an interface to robust algorithms that assign conformation classes called NtC to dinucleotides extracted from DNA-containing structures uploaded in PDB format version 3.1 or above. The assigned dinucleotide NtC classes are further grouped into the DNA structural alphabet NtA, to the best of our knowledge the first DNA structural alphabet. The results are presented at two levels: as a user-friendly visualization and analysis of the assignment, and as a downloadable, more detailed table for further offline analysis. The website is free and open to all users, and there is no login requirement.
- MeSH
- Algorithms * MeSH
- DNA chemistry MeSH
- Internet MeSH
- Nucleic Acid Conformation MeSH
- Computer Graphics MeSH
- Software * MeSH
- Information Storage and Retrieval MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA MeSH
Protein structures are valuable tools for understanding protein function. Nonetheless, proteins are often considered rigid macromolecules, whereas their structures exhibit specific flexibility that is essential to their functions. Analyses of protein structures and dynamics are often performed with a simplified three-state description, i.e., the classical secondary structures. A more precise and complete description of protein backbone conformation can be obtained using libraries of small protein fragments that are able to approximate every part of protein structures. These libraries, called structural alphabets (SAs), have been widely used in the structure analysis field, from the definition of ligand binding sites to the superimposition of protein structures. SAs are also well suited to analyzing the dynamics of protein structures. Here, we review innovative approaches that investigate protein flexibility based on SA descriptions. Coupled with various sources of experimental data (e.g., B-factors) and computational methodology (e.g., molecular dynamics simulations), SAs turn out to be powerful tools for analyzing protein dynamics, e.g., to examine allosteric mechanisms in large sets of structures in complexes or to identify order/disorder transitions. SAs have also been shown to be quite efficient at predicting protein flexibility from amino-acid sequence. Finally, we illustrate the value of SAs for studying flexibility with different cases of proteins implicated in pathologies and diseases.
- Keywords
- allostery, disorder, protein complexes, protein folding, protein structures, protein-DNA interactions, secondary structure, structural alphabet
- Publication type
- Journal Article MeSH
- Review MeSH
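The review above describes structural alphabets as a way to quantify backbone flexibility. One measure used in the protein-block literature is Neq, the exponential of the Shannon entropy of the protein-block frequencies observed at a position (for example, across MD snapshots): Neq = 1 means a single block is always observed, and larger values mean more variability. The sketch below assumes each snapshot has already been encoded as a string of protein-block letters (a to p); the encoding step itself and the function names are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence

def neq(blocks: Sequence[str]) -> float:
    """Equivalent number of protein blocks at one position:
    exp of the Shannon entropy (natural log) of the observed block frequencies."""
    counts = Counter(blocks)
    n = len(blocks)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return math.exp(entropy)

def neq_profile(snapshots: Sequence[str]) -> List[float]:
    """Neq for every position, given equal-length protein-block strings (one per snapshot)."""
    return [neq(column) for column in zip(*snapshots)]
```

For example, neq_profile(["mma", "mmb", "mmc", "mmd"]) gives approximately [1.0, 1.0, 4.0]: the first two positions never change block, while the third visits four blocks equally often.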
To investigate the principles driving recognition between proteins and DNA, we analyzed more than a thousand crystal structures of protein/DNA complexes. We classified protein and DNA conformations by structural alphabets, protein blocks [de Brevern, Etchebest and Hazout (2000) (Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins Struct. Funct. Genet., 41:271-287)] and dinucleotide conformers [Svozil, Kalina, Omelka and Schneider (2008) (DNA conformations and their sequence preferences. Nucleic Acids Res., 36:3690-3706)], respectively. Assembling the mutually interacting protein blocks and dinucleotide conformers into 'interaction matrices' revealed their correlations and conformer preferences at the interface relative to their occurrence outside the interface. The analyzed data demonstrated important differences between complexes of various types of proteins, such as transcription factors and nucleases, distinct interaction patterns for the DNA minor groove relative to the major groove and phosphate, and the importance of water-mediated contacts. Water molecules mediate proportionally the largest number of contacts in the minor groove and form the largest proportion of contacts in complexes of transcription factors. The generally known induction of A-DNA forms by complexation was more accurately attributed to A-like and intermediate A/B conformers that are rare in naked DNA molecules.
- MeSH
- DNA-Binding Proteins chemistry MeSH
- DNA chemistry MeSH
- Phosphates MeSH
- Data Interpretation, Statistical MeSH
- Nucleic Acid Conformation MeSH
- Protein Conformation MeSH
- Models, Molecular MeSH
- Protein Binding MeSH
- Water chemistry MeSH
- Computational Biology MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA-Binding Proteins MeSH
- DNA MeSH
- Phosphates MeSH
- Water MeSH
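The interaction-matrix analysis above compares how often conformers appear at the protein-DNA interface with how often they appear elsewhere. A minimal sketch of such bookkeeping, assuming contacts are already available as (protein block, dinucleotide class) pairs: it counts pairs into a matrix and computes a simple frequency ratio, where values above 1 mean enrichment at the interface. The labels, function names and the plain ratio (rather than any statistical test the authors may have applied) are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

def interaction_matrix(contacts: Iterable[Tuple[str, str]]) -> Counter:
    """Count (protein_block, dinucleotide_class) pairs observed in contact."""
    return Counter(contacts)

def interface_preference(interface: Sequence[str],
                         outside: Sequence[str]) -> Dict[str, float]:
    """Relative frequency at the interface divided by relative frequency outside it.
    Labels never seen outside the interface are skipped to avoid division by zero."""
    f_in, f_out = Counter(interface), Counter(outside)
    n_in, n_out = len(interface), len(outside)
    return {label: (f_in[label] / n_in) / (f_out[label] / n_out)
            for label in f_in if f_out[label] > 0}
```

With hypothetical labels, a class making up 50% of interface steps but only 20% of non-interface steps gets a preference of 2.5, which is the kind of A-like enrichment the abstract reports.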
The Dickerson-Drew dodecamer (DD) d-[CGCGAATTCGCG]2 is a prototypic B-DNA molecule whose sequence-specific structure and dynamics have been investigated by many experimental and computational studies. Here, we present an analysis of DD properties based on extensive atomistic molecular dynamics (MD) simulations using different ionic conditions and water models. The 0.6-2.4-µs-long MD trajectories are compared with modern crystallographic and NMR data. In the simulations, the duplex ends can adopt an alternative base-pairing, which influences the oligomer structure. A clear relationship between the BI/BII backbone substates and the basepair step conformation has been identified, extending previous findings and exposing an interesting structural polymorphism in the helix. For a given end pairing, distributions of the basepair step coordinates can be decomposed into Gaussian-like components associated with the BI/BII backbone states. The nonlocal stiffness matrices for a rigid-base mechanical model of DD are reported for the first time, suggesting salient stiffness features of the central A-tract. The Riemann distance and Kullback-Leibler divergence are used for stiffness matrix comparison. The basic structural parameters converge very well within 300 ns; convergence of the BI/BII populations and stiffness matrices is less sharp. Our work presents new findings about the DD structural dynamics, mechanical properties, and the coupling between basepair and backbone configurations, including their statistical reliability. The results may also be useful for optimizing future force fields for DNA.
- Publication type
- Journal Article MeSH
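The Dickerson-Drew abstract above compares stiffness matrices using the Riemann distance and the Kullback-Leibler divergence. A minimal sketch of both, assuming symmetric positive-definite stiffness matrices: the Riemann distance is taken as the affine-invariant metric sqrt(Σ ln² λi), with λi the eigenvalues of A⁻¹B, and the divergence treats each stiffness matrix as the inverse covariance of a zero-mean Gaussian with kT set to 1. Whether the study used exactly these conventions (for example, a symmetrized divergence or nonzero means) is an assumption here.

```python
import numpy as np

def riemann_distance(a, b):
    """Affine-invariant Riemannian distance between SPD matrices a and b."""
    # Eigenvalues of a^-1 b are real and positive for SPD a, b;
    # .real drops round-off imaginary parts from the general eigensolver.
    lam = np.linalg.eigvals(np.linalg.solve(a, b)).real
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def kl_gaussian_from_stiffness(ka, kb):
    """KL(N(0, ka^-1) || N(0, kb^-1)) for zero-mean Gaussians whose
    covariances are the inverse stiffness matrices (kT = 1)."""
    k = ka.shape[0]
    trace_term = np.trace(kb @ np.linalg.inv(ka))
    _, logdet_a = np.linalg.slogdet(ka)
    _, logdet_b = np.linalg.slogdet(kb)
    return float(0.5 * (trace_term - k + logdet_a - logdet_b))
```

For A = I and B = 2I (2×2), the distance is √2 ln 2 and the divergence is 1 - ln 2. The Riemann distance is symmetric in its arguments, whereas the divergence is not, which is one reason to report both.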