structural alphabets
Protein structures are valuable tools for understanding protein function. Nonetheless, proteins are often treated as rigid macromolecules, although their structures exhibit specific flexibility that is essential to their function. Analyses of protein structures and dynamics are often performed with a simplified three-state description, i.e., the classical secondary structures. A more precise and complete description of protein backbone conformation can be obtained using libraries of small protein fragments that are able to approximate every part of a protein structure. These libraries, called structural alphabets (SAs), have been widely used in the field of structure analysis, from the definition of ligand binding sites to the superimposition of protein structures. SAs are also well suited to analyzing the dynamics of protein structures. Here, we review innovative approaches that investigate protein flexibility based on SA descriptions. Coupled with various sources of experimental data (e.g., B-factors) and computational methodology (e.g., molecular dynamics simulation), SAs turn out to be powerful tools for analyzing protein dynamics, e.g., to examine allosteric mechanisms in large sets of structures in complexes or to identify order/disorder transitions. SAs were also shown to be quite efficient at predicting protein flexibility from amino-acid sequence. Finally, in this review, we illustrate the value of SAs for studying flexibility with different cases of proteins implicated in pathologies and diseases.
- Publication type
- Journal Article MeSH
- Review MeSH
DNA is a structurally plastic molecule, and its biological function is enabled by adaptation to its binding partners. To identify the DNA structural polymorphisms that are possible in such adaptations, the dinucleotide structures of 60 000 DNA steps from sequentially nonredundant crystal structures were classified and an automated protocol assigning 44 distinct structural (conformational) classes called NtC (for Nucleotide Conformers) was developed. To further facilitate understanding of the DNA structure, the NtC were assembled into the DNA structural alphabet CANA (Conformational Alphabet of Nucleic Acids) and the projection of CANA onto the graphical representation of the molecular structure was proposed. The NtC classification was used to define a validation score called confal, which quantifies the conformity between an analyzed structure and the geometries of NtC. NtC and CANA assignment were applied to analyze the structural properties of typical DNA structures such as Dickerson-Drew dodecamers, guanine quadruplexes and structural models based on fibre diffraction. NtC, CANA and confal assignment, which is accessible at the website https://dnatco.org, allows the quantitative assessment and validation of DNA structures and their subsequent analysis by means of pseudo-sequence alignment. An animated Interactive 3D Complement (I3DC) is available in Proteopedia at http://proteopedia.org/w/Journal:Acta_Cryst_D:2.
We analyzed the structural behavior of DNA complexed with regulatory proteins and the nucleosome core particle (NCP). The three-dimensional structures of almost 25 thousand dinucleotide steps from more than 500 sequentially non-redundant crystal structures were classified using the DNA structural alphabet CANA (Conformational Alphabet of Nucleic Acids), and associations between ten CANA letters and sixteen dinucleotide sequences were investigated. The associations showed features discriminating between specific and non-specific binding of DNA to proteins. Of particular importance is the specific role of two DNA structural forms, A-DNA and BII-DNA, represented by the CANA letters AAA and BB2: AAA structures are avoided in non-specific NCP complexes, where the wrapping of the DNA duplex is explained by the periodic occurrence of BB2 every 10.3 steps. In both regulatory and NCP complexes, the extent of bending of the DNA local helical axis does not influence the proportional representation of the CANA alphabet letters; namely, the relative incidences of AAA and BB2 remain constant in bent and straight duplexes.
- Publication type
- Journal Article MeSH
The earliest proteins had to rely on amino acids available on early Earth before the biosynthetic pathways for more complex amino acids evolved. In extant proteins, a significant fraction of the 'late' amino acids (such as Arg, Lys, His, Cys, Trp and Tyr) belong to essential catalytic and structure-stabilizing residues. How (or if) early proteins could sustain an early biosphere has been a major puzzle. Here, we analysed two combinatorial protein libraries representing proxies of the available sequence space at two different evolutionary stages. The first is composed of the entire alphabet of 20 amino acids while the second one consists of only 10 residues (ASDGLIPTEV) representing a consensus view of plausibly available amino acids through prebiotic chemistry. We show that compact conformations resistant to proteolysis are surprisingly similarly abundant in both libraries. In addition, the early alphabet proteins are inherently more soluble and refoldable, independent of the general Hsp70 chaperone activity. By contrast, chaperones significantly increase the otherwise poor solubility of the modern alphabet proteins suggesting their coevolution with the amino acid repertoire. Our work indicates that while both early and modern amino acids are predisposed to supporting protein structure, they do so with different biophysical properties and via different mechanisms.
- MeSH
- Amino Acids * chemistry MeSH
- Prebiotics * MeSH
- Proteins chemistry MeSH
- Protein Folding MeSH
- Publication type
- Journal Article MeSH
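The reduced alphabet named in the abstract above (ASDGLIPTEV) lends itself to a simple membership check. The following is a minimal sketch, not the authors' method: the function names `early_fraction` and `uses_only_early` are hypothetical and introduced only for illustration, and sequences are assumed to be given as one-letter amino-acid codes.

```python
# The 10-residue "early" alphabet as listed in the abstract.
EARLY_ALPHABET = frozenset("ASDGLIPTEV")


def early_fraction(sequence: str) -> float:
    """Return the fraction of residues that belong to the early alphabet.

    Hypothetical helper: assumes a non-empty one-letter amino-acid sequence.
    """
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    early = sum(1 for residue in residues if residue in EARLY_ALPHABET)
    return early / len(residues)


def uses_only_early(sequence: str) -> bool:
    """Return True if every residue of the sequence is in the early alphabet."""
    return set(sequence.upper()) <= EARLY_ALPHABET
```

A sequence containing any of the 'late' residues mentioned in the abstract (such as Arg, Lys, His, Cys, Trp or Tyr) makes `uses_only_early` return False.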
By analyzing almost 120 000 dinucleotides in over 2000 nonredundant nucleic acid crystal structures, we define 96+1 diNucleotide Conformers, NtCs, which describe the geometry of RNA and DNA dinucleotides. NtC classes are grouped into 15 codes of the structural alphabet CANA (Conformational Alphabet of Nucleic Acids) to simplify symbolic annotation of the prominent structural features of NAs and their intuitive graphical display. The search for nontrivial patterns of NtCs resulted in the identification of several types of RNA loops, some of them observed for the first time. Over 30% of the nearly six million dinucleotides in the PDB cannot be assigned to any NtC class, but we demonstrate that up to half of them can be re-refined with the help of proper refinement targets. A statistical analysis of the preferences of NtCs and CANA codes for the 16 dinucleotide sequences showed that neither the NtC class AA00, which forms the scaffold of RNA structures, nor BB00, the most populated DNA class, is sequence neutral; their distributions are significantly biased. The reported automated assignment of the NtC classes and CANA codes, available at dnatco.org, provides a powerful tool for unbiased analysis of nucleic acid structures by structural and molecular biologists.
- MeSH
- Biocatalysis MeSH
- DNA chemistry classification MeSH
- Nucleic Acid Conformation * MeSH
- Nucleotide Motifs * MeSH
- Nucleotides chemistry classification MeSH
- Reproducibility of Results MeSH
- Riboswitch MeSH
- Ribosomes chemistry metabolism MeSH
- RNA, Catalytic chemistry metabolism MeSH
- RNA chemistry classification MeSH
- Binding Sites MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
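The sequence preferences of structural codes described in the abstract above can be illustrated by tallying how often each code occurs for each dinucleotide sequence. This is a rough sketch under stated assumptions, not the published analysis: the input format (pairs of dinucleotide sequence and assigned code) and the function name `code_frequencies` are hypothetical, and no significance test is performed.

```python
from collections import Counter, defaultdict


def code_frequencies(steps):
    """Compute relative frequencies of structural codes per dinucleotide.

    `steps` is an iterable of (dinucleotide_sequence, code) pairs, e.g.
    ("GC", "BB2"). This input format is an assumption made for illustration.
    Returns {dinucleotide: {code: relative frequency}}.
    """
    counts = defaultdict(Counter)
    for dinucleotide, code in steps:
        counts[dinucleotide][code] += 1

    frequencies = {}
    for dinucleotide, counter in counts.items():
        total = sum(counter.values())
        frequencies[dinucleotide] = {
            code: n / total for code, n in counter.items()
        }
    return frequencies
```

A code would be sequence neutral if its relative frequency were about the same for all 16 dinucleotide sequences; a large spread across sequences hints at a bias that a proper statistical test would then have to confirm.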
The protein sequences found in nature represent a tiny fraction of the potential sequences that could be constructed from the 20-amino-acid alphabet. To help define the properties that shaped proteins to stand out from the space of possible alternatives, we conducted a systematic computational and experimental exploration of random (unevolved) sequences in comparison with biological proteins. In our study, combinations of secondary structure, disorder, and aggregation predictions are accompanied by experimental characterization of selected proteins. We found that the overall secondary structure and physicochemical properties of random and biological sequences are very similar. Moreover, random sequences can be well-tolerated by living cells. Contrary to early hypotheses about the toxicity of random and disordered proteins, we found that random sequences with high disorder have low aggregation propensity (unlike random sequences with high structural content) and were particularly well-tolerated. This direct structure content/aggregation propensity dependence differentiates random and biological proteins. Our study indicates that while random sequences can be both structured and disordered, the properties of the latter make them better suited as progenitors (in both in vivo and in vitro settings) for further evolution of complex, soluble, three-dimensional scaffolds that can perform specific biochemical tasks.
- MeSH
- Circular Dichroism MeSH
- Databases, Protein MeSH
- Datasets as Topic MeSH
- Models, Molecular * MeSH
- Nuclear Magnetic Resonance, Biomolecular MeSH
- Peptide Library * MeSH
- Protein Aggregates MeSH
- Recombinant Proteins chemistry isolation & purification toxicity MeSH
- Solubility MeSH
- Protein Folding MeSH
- Protein Structure, Secondary * MeSH
- Amino Acid Sequence MeSH
- Computational Biology MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Two studies investigated the importance of phoneme awareness relative to other predictors in the development of reading and spelling among children learning a consistent orthography (Czech) and an inconsistent orthography (English). In Study 1, structural equation models revealed that Czech (n=107) and English (n=71) data were fitted well by the same predictors of reading and spelling. Phoneme awareness was a unique predictor in all models. In Study 2, Czech (n=40) and English (n=27) children with dyslexia showed similar deficits on phoneme awareness relative to their age- and spelling-matched control peers. Phoneme awareness appears to be a core component skill of alphabetic literacy, which is equally important for learners of consistent and inconsistent orthographies.
- MeSH
- Reading * MeSH
- Child MeSH
- Dyslexia diagnosis psychology MeSH
- Phonetics * MeSH
- Individuality MeSH
- Language * MeSH
- Humans MeSH
- Neuropsychological Tests MeSH
- Mass Screening MeSH
- Comprehension MeSH
- Writing * MeSH
- Cross-Cultural Comparison * MeSH
- Awareness * MeSH
- Age Factors MeSH
- Educational Measurement MeSH
- Wechsler Scales MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Publication type
- Journal Article MeSH
- Comparative Study MeSH
- Geographic names
- England MeSH
- Czech Republic MeSH
We have developed a family of unnatural base pairs (UBPs), which rely on hydrophobic and packing interactions for pairing and which are well replicated and transcribed. While the pair formed between d5SICS and dNaM (d5SICS-dNaM) has received the most attention, and has been used to expand the genetic alphabet of a living organism, recent efforts have identified dTPT3-dNaM, which is replicated with even higher fidelity. These efforts also resulted in more UBPs than could be independently analyzed, and thus we now report a PCR-based screen to identify the most promising. While we found that dTPT3-dNaM is generally the most promising UBP, we identified several others that are replicated nearly as well and significantly better than d5SICS-dNaM, and are thus viable candidates for the expansion of the genetic alphabet of a living organism. Moreover, the results suggest that continued optimization should be possible, and that the putatively essential hydrogen-bond acceptor at the position ortho to the glycosidic linkage may not be required. These results clearly demonstrate the generality of hydrophobic forces for the control of base pairing within DNA, provide a wealth of new structure-activity relationship data and importantly identify multiple new candidates for in vivo evaluation and further optimization.
The motif DGYW/WRCH (Mh) and its frequently discussed simplified derivative GYW/WRC (Mhs) are involved in immunoglobulin (Ig) hypermutation. Both motifs appear to be markedly shorter than the corresponding conventionally predicted minima of valid sequence lengths (MVSL). The same conclusion for both Mh and Mhs is also obtained in the combined case, which includes a less strict, semi-empirically defined w-value and a tolerance of one nucleotide in length relative to the MVSL. Such disagreement indicates that Mh and Mhs carry considerably low information content when evaluated as alphabetical structures (words). This raises the question of which structures (presumably longer than Mh and Mhs) are actually recognized. Interestingly, both Mh and Mhs dimers, or pairs of closely located Mh or Mhs, achieve confirmation of length validity for w=0.05, thus suggesting double-motif recognition as one of the statistically consistent explanations. This possibility also agrees with the results of our model sequence study of mRNA derived from variable Ig gene sequences (rIgV) with respect to the most frequently occurring structures formed by motif overlaps in all model sequence sets. On the other hand, an additional superior occurrence of motif pairs at a structurally important distance of a single DNA thread was found in the conserved domain (cd00099) related sequences of Elasmobranchii origin and, less markedly, in the corresponding human rIgV, but not in a randomly selected human subset of rIgV. The data are discussed with respect to statistical evaluation and structural properties of the hypermutation motifs or the competent enzyme, i.e. activation-induced cytidine deaminase.
- MeSH
- Enzyme Activation MeSH
- Amino Acid Motifs MeSH
- Cytidine Deaminase genetics MeSH
- Genes, Immunoglobulin MeSH
- Humans MeSH
- Models, Genetic MeSH
- DNA Mutational Analysis MeSH
- Somatic Hypermutation, Immunoglobulin MeSH
- Models, Statistical MeSH
- Protein Structure, Tertiary MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
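The motifs DGYW and WRCH in the abstract above are written in IUPAC nucleotide codes (D = A/G/T, Y = C/T, W = A/T, R = A/G, H = A/C/T), and WRCH is the reverse complement of DGYW. The sketch below shows one way to locate such degenerate motifs in a DNA sequence, including overlapping occurrences. It is an illustration only, not the statistical procedure of the study; the function names `motif_to_regex` and `find_motif` are hypothetical, and only the IUPAC codes needed for these motifs are included.

```python
import re

# Subset of the IUPAC nucleotide codes, enough for DGYW/WRCH and GYW/WRC.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]",
    "D": "[AGT]", "H": "[ACT]",
}


def motif_to_regex(motif: str) -> str:
    """Translate a degenerate IUPAC motif into a regular expression."""
    return "".join(IUPAC[symbol] for symbol in motif.upper())


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of all (overlapping) motif matches."""
    # A lookahead matches zero characters, so overlapping hits are reported.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]
```

Scanning one strand for both DGYW and WRCH covers occurrences of the motif on either strand, because a WRCH hit on the given strand corresponds to DGYW on the complementary one.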
The wide variety of protein structures and functions results from the diverse properties of the 20 canonical amino acids. The generally accepted hypothesis is that early protein evolution was associated with enrichment of a primordial alphabet, thereby enabling increased protein catalytic efficiencies and functional diversification. Aromatic amino acids were likely among the last additions to the genetic code. The main objective of this study was to test whether enzyme catalysis can occur without the aromatic residues (aromatics) by studying the structure and function of dephospho-CoA kinase (DPCK) following aromatic residue depletion. We designed two variants of a putative DPCK from Aquifex aeolicus by substituting (a) Tyr, Phe and Trp or (b) all aromatics (including His). Their structural characterization indicates that substituting the aromatics does not markedly alter their secondary structures but does significantly loosen their side chain packing and increase their sizes. Both variants still possess ATPase activity, although with 150-300 times lower efficiency in comparison with the wild-type phosphotransferase activity. The transfer of the phosphate group to the dephospho-CoA substrate becomes heavily uncoupled, and only the His-containing variant is still able to perform the phosphotransferase reaction. These data support the hypothesis that proteins in the early stages of life could support catalytic activities, albeit with low efficiencies. An observed significant contraction upon ligand binding is likely important for appropriate organization of the active site. Formation of firm hydrophobic cores, which enable the assembly of stably structured active sites, is suggested to provide a selective advantage for adding the aromatic residues.
- MeSH
- Aquifex enzymology genetics MeSH
- Bacterial Proteins chemistry genetics MeSH
- Phosphotransferases (Alcohol Group Acceptor) chemistry genetics MeSH
- Catalytic Domain MeSH
- Catalysis MeSH
- Mutagenesis, Site-Directed MeSH
- Protein Structure, Secondary MeSH
- Amino Acid Substitution MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH