Nejvíce citovaný článek - PubMed ID 29133927
Random protein sequences can form defined secondary structures and are well-tolerated in vivo
In the quest to understand prebiotic catalysis, different molecular entities, mainly minerals, metal ions, organic cofactors, and ribozymes, have been implied as key players. Of these, inorganic and organic cofactors have gained attention for their ability to catalyze a wide array of reactions central to modern metabolism and frequently participate in these reactions within modern enzymes. Nevertheless, bridging the gap between prebiotic and modern metabolism remains a fundamental question in the origins of life. In this Account, peptides are investigated as a potential bridge linking prebiotic catalysis by minerals/cofactors to enzymes that dominate modern life's chemical reactions. Before ribosomal synthesis emerged, peptides of random sequences were plausible on early Earth. This was made possible by different sources of amino acid delivery and synthesis, as well as their condensation under a variety of conditions. Early peptides and proteins probably exhibited distinct compositions, enriched in small aliphatic and acidic residues. An increase in abundance of amino acids with larger side chains and canonical basic groups was most likely dependent on the emergence of their more challenging (bio)synthesis. Pressing questions thus arise: how did this composition influence the early peptide properties, and to what extent could they contribute to early metabolism? Recent research from our group and colleagues shows that highly acidic peptides/proteins comprising only the presumably "early" amino acids are in fact competent at secondary structure formation and even possess adaptive folding characteristics such as spontaneous refoldability and chaperone independence to achieve soluble structures. Moreover, we showed that highly acidic proteins of presumably "early" composition can still bind RNA by utilizing metal ions as cofactors to bridge carboxylate and phosphoester functional groups. And finally, ancient organic cofactors were shown to be capable of binding to sequences from amino acids considered prebiotically plausible, supporting their folding properties and providing functional groups, which would nominate them as catalytic hubs of great prebiotic relevance. These findings underscore the biochemical plausibility of an early peptide/protein world devoid of more complex amino acids yet collaborating with other catalytic species. Drawing from the mechanistic properties of protein-cofactor catalysis, it is speculated here that the early peptide/protein-cofactor ensemble could facilitate a similar range of chemical reactions, albeit with lower catalytic rates. This hypothesis invites a systematic experimental test. Nonetheless, this Account does not exclude other scenarios of prebiotic-to-biotic catalysis or prioritize any specific pathways of prebiotic syntheses. The objective is to examine peptide availability, composition, and functional potential among the various factors involved in the emergence of early life.
- MeSH
- aminokyseliny chemie metabolismus MeSH
- katalýza MeSH
- peptidy * chemie metabolismus MeSH
- původ života MeSH
- Publikační typ
- časopisecké články MeSH
- práce podpořená grantem MeSH
- Názvy látek
- aminokyseliny MeSH
- peptidy * MeSH
De novo genes emerge from previously noncoding stretches of the genome. Their encoded de novo proteins are generally expected to be similar to random sequences and, accordingly, with no stable tertiary fold and high predicted disorder. However, structural properties of de novo proteins and whether they differ during the stages of emergence and fixation have not been studied in depth and rely heavily on predictions. Here we generated a library of short human putative de novo proteins of varying lengths and ages and sorted the candidates according to their structural compactness and disorder propensity. Using Förster resonance energy transfer combined with Fluorescence-activated cell sorting, we were able to screen the library for most compact protein structures, as well as most elongated and flexible structures. We find that compact de novo proteins are on average slightly shorter and contain lower predicted disorder than less compact ones. The predicted structures for most and least compact de novo proteins correspond to expectations in that they contain more secondary structure content or higher disorder content, respectively. Our experiments indicate that older de novo proteins have higher compactness and structural propensity compared with young ones. We discuss possible evolutionary scenarios and their implications underlying the age-dependencies of compactness and structural content of putative de novo proteins.
- MeSH
- genová knihovna MeSH
- lidé MeSH
- proteiny * genetika MeSH
- sbalování proteinů * MeSH
- sekundární struktura proteinů MeSH
- Check Tag
- lidé MeSH
- Publikační typ
- časopisecké články MeSH
- práce podpořená grantem MeSH
- Názvy látek
- proteiny * MeSH
De novo gene emergence provides a route for new proteins to be formed from previously non-coding DNA. Proteins born in this way are considered random sequences and typically assumed to lack defined structure. While it remains unclear how likely a de novo protein is to assume a soluble and stable tertiary structure, intersecting evidence from random sequence and de novo-designed proteins suggests that native-like biophysical properties are abundant in sequence space. Taking putative de novo proteins identified in human and fly, we experimentally characterize a library of these sequences to assess their solubility and structure propensity. We compare this library to a set of synthetic random proteins with no evolutionary history. Bioinformatic prediction suggests that de novo proteins may have remarkably similar distributions of biophysical properties to unevolved random sequences of a given length and amino acid composition. However, upon expression in vitro, de novo proteins exhibit moderately higher solubility which is further induced by the DnaK chaperone system. We suggest that while synthetic random sequences are a useful proxy for de novo proteins in terms of structure propensity, de novo proteins may be better integrated in the cellular system than random expectation, given their higher solubility.
- MeSH
- lidé MeSH
- proteiny * chemie MeSH
- proteomika * MeSH
- výpočetní biologie MeSH
- Check Tag
- lidé MeSH
- Publikační typ
- časopisecké články MeSH
- práce podpořená grantem MeSH
- Názvy látek
- proteiny * MeSH
Whereas modern proteins rely on a quasi-universal repertoire of 20 canonical amino acids (AAs), numerous lines of evidence suggest that ancient proteins relied on a limited alphabet of 10 "early" AAs and that the 10 "late" AAs were products of biosynthetic pathways. However, many nonproteinogenic AAs were also prebiotically available, which begs two fundamental questions: Why do we have the current modern amino acid alphabet and would proteins be able to fold into globular structures as well if different amino acids comprised the genetic code? Here, we experimentally evaluate the solubility and secondary structure propensities of several prebiotically relevant amino acids in the context of synthetic combinatorial 25-mer peptide libraries. The most prebiotically abundant linear aliphatic and basic residues were incorporated along with or in place of other early amino acids to explore these alternative sequence spaces. The results show that foldability was likely a critical factor in the selection of the canonical alphabet. Unbranched aliphatic amino acids were purged from the proteinogenic alphabet despite their high prebiotic abundance because they generate polypeptides that are oversolubilized and have low packing efficiency. Surprisingly, we find that the inclusion of a short-chain basic amino acid also decreases polypeptides' secondary structure potential, for which we suggest a biophysical model. Our results support the view that, despite lacking basic residues, the early canonical alphabet was remarkably adaptive at supporting protein folding and explain why basic residues were only incorporated at a later stage of protein evolution.
- MeSH
- aminokyseliny * chemie MeSH
- peptidová knihovna MeSH
- peptidy genetika MeSH
- proteiny * chemie MeSH
- sbalování proteinů MeSH
- Publikační typ
- časopisecké články MeSH
- práce podpořená grantem MeSH
- Research Support, N.I.H., Extramural MeSH
- Názvy látek
- aminokyseliny * MeSH
- peptidová knihovna MeSH
- peptidy MeSH
- proteiny * MeSH
The earliest proteins had to rely on amino acids available on early Earth before the biosynthetic pathways for more complex amino acids evolved. In extant proteins, a significant fraction of the 'late' amino acids (such as Arg, Lys, His, Cys, Trp and Tyr) belong to essential catalytic and structure-stabilizing residues. How (or if) early proteins could sustain an early biosphere has been a major puzzle. Here, we analysed two combinatorial protein libraries representing proxies of the available sequence space at two different evolutionary stages. The first is composed of the entire alphabet of 20 amino acids while the second one consists of only 10 residues (ASDGLIPTEV) representing a consensus view of plausibly available amino acids through prebiotic chemistry. We show that compact conformations resistant to proteolysis are surprisingly similarly abundant in both libraries. In addition, the early alphabet proteins are inherently more soluble and refoldable, independent of the general Hsp70 chaperone activity. By contrast, chaperones significantly increase the otherwise poor solubility of the modern alphabet proteins suggesting their coevolution with the amino acid repertoire. Our work indicates that while both early and modern amino acids are predisposed to supporting protein structure, they do so with different biophysical properties and via different mechanisms.
- Klíčová slova
- amino acid alphabet, genetic code evolution, protein sequence space, protein structure, random proteins,
- MeSH
- aminokyseliny * chemie MeSH
- prebiotika * MeSH
- proteiny chemie MeSH
- sbalování proteinů MeSH
- Publikační typ
- časopisecké články MeSH
- práce podpořená grantem MeSH
- Názvy látek
- aminokyseliny * MeSH
- prebiotika * MeSH
- proteiny MeSH
Recent developments in Origins of Life research have focused on substantiating the narrative of an abiotic emergence of nucleic acids from organic molecules of low molecular weight, a paradigm that typically sidelines the roles of peptides. Nevertheless, the simple synthesis of amino acids, the facile nature of their activation and condensation, their ability to recognize metals and cofactors and their remarkable capacity to self-assemble make peptides (and their analogues) favourable candidates for one of the earliest functional polymers. In this mini-review, we explore the ramifications of this hypothesis. Diverse lines of research in molecular biology, bioinformatics, geochemistry, biophysics and astrobiology provide clues about the progression and early evolution of proteins, and lend credence to the idea that early peptides served many central prebiotic roles before they were encodable by a polynucleotide template, in a putative 'peptide-polynucleotide stage'. For example, early peptides and mini-proteins could have served as catalysts, compartments and structural hubs. In sum, we shed light on the role of early peptides and small proteins before and during the nucleotide world, in which nascent life fully grasped the potential of primordial proteins, and which has left an imprint on the idiosyncratic properties of extant proteins.
- Klíčová slova
- early peptides, origins of life, prebiotic polymers, protein evolution,
- MeSH
- nukleotidy MeSH
- nukleové kyseliny * MeSH
- peptidy chemie MeSH
- proteiny MeSH
- původ života * MeSH
- Publikační typ
- časopisecké články MeSH
- práce podpořená grantem MeSH
- přehledy MeSH
- Názvy látek
- nukleotidy MeSH
- nukleové kyseliny * MeSH
- peptidy MeSH
- proteiny MeSH
MOTIVATION: Current techniques of protein engineering focus mostly on re-designing small targeted regions or defined structural scaffolds rather than constructing combinatorial libraries of versatile compositions and lengths. This is a missed opportunity because combinatorial libraries are emerging as a vital source of novel functional proteins and are of interest in diverse research areas. RESULTS: Here, we present a computational tool for Combinatorial Library Design (CoLiDe) offering precise control over protein sequence composition, length and diversity. The algorithm uses evolutionary approach to provide solutions to combinatorial libraries of degenerate DNA templates. We demonstrate its performance and precision using four different input alphabet distribution on different sequence lengths. In addition, a model design and experimental pipeline for protein library expression and purification is presented, providing a proof-of-concept that our protocol can be used to prepare purified protein library samples of up to 1011-1012 unique sequences. CoLiDe presents a composition-centric approach to protein design towards different functional phenomena. AVAILABILITYAND IMPLEMENTATION: CoLiDe is implemented in Python and freely available at https://github.com/voracva1/CoLiDe. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
- MeSH
- algoritmy * MeSH
- genová knihovna MeSH
- proteinové inženýrství MeSH
- proteiny * genetika MeSH
- sekvence aminokyselin MeSH
- software MeSH
- Publikační typ
- časopisecké články MeSH
- práce podpořená grantem MeSH
- Názvy látek
- proteiny * MeSH
The wide variety of protein structures and functions results from the diverse properties of the 20 canonical amino acids. The generally accepted hypothesis is that early protein evolution was associated with enrichment of a primordial alphabet, thereby enabling increased protein catalytic efficiencies and functional diversification. Aromatic amino acids were likely among the last additions to genetic code. The main objective of this study was to test whether enzyme catalysis can occur without the aromatic residues (aromatics) by studying the structure and function of dephospho-CoA kinase (DPCK) following aromatic residue depletion. We designed two variants of a putative DPCK from Aquifex aeolicus by substituting (a) Tyr, Phe and Trp or (b) all aromatics (including His). Their structural characterization indicates that substituting the aromatics does not markedly alter their secondary structures but does significantly loosen their side chain packing and increase their sizes. Both variants still possess ATPase activity, although with 150-300 times lower efficiency in comparison with the wild-type phosphotransferase activity. The transfer of the phosphate group to the dephospho-CoA substrate becomes heavily uncoupled and only the His-containing variant is still able to perform the phosphotransferase reaction. These data support the hypothesis that proteins in the early stages of life could support catalytic activities, albeit with low efficiencies. An observed significant contraction upon ligand binding is likely important for appropriate organization of the active site. Formation of firm hydrophobic cores, which enable the assembly of stably structured active sites, is suggested to provide a selective advantage for adding the aromatic residues.
- Klíčová slova
- aromatic amino acids, catalysis evolution, genetic code evolution, protein disorder, protein structure evolution,
- MeSH
- Aquifex enzymologie genetika MeSH
- bakteriální proteiny chemie genetika MeSH
- fosfotransferasy s alkoholovou skupinou jako akceptorem chemie genetika MeSH
- katalytická doména MeSH
- katalýza MeSH
- mutageneze cílená MeSH
- sekundární struktura proteinů MeSH
- substituce aminokyselin MeSH
- Publikační typ
- časopisecké články MeSH
- práce podpořená grantem MeSH
- Názvy látek
- bakteriální proteiny MeSH
- dephospho-CoA kinase MeSH Prohlížeč
- fosfotransferasy s alkoholovou skupinou jako akceptorem MeSH
Intrinsically disordered proteins (IDPs) represent a distinct class of proteins and are distinguished from globular proteins by conformational plasticity, high evolvability and a broad functional repertoire. Some of their properties are reminiscent of early proteins, but their abundance in eukaryotes, functional properties and compositional bias suggest that IDPs appeared at later evolutionary stages. The spectrum of IDP properties and their determinants are still not well defined. This study compares rudimentary physicochemical properties of IDPs and globular proteins using bioinformatic analysis on the level of their native sequences and random sequence permutations, addressing the contributions of composition versus sequence as determinants of the properties. IDPs have, on average, lower predicted secondary structure contents and aggregation propensities and biased amino acid compositions. However, our study shows that IDPs exhibit a broad range of these properties. Induced fold IDPs exhibit very similar compositions and secondary structure/aggregation propensities to globular proteins, and can be distinguished from unfoldable IDPs based on analysis of these sequence properties. While amino acid composition seems to be a major determinant of aggregation and secondary structure propensities, sequence randomization does not result in dramatic changes to these properties, but for both IDPs and globular proteins seems to fine-tune the tradeoff between folding and aggregation.
- Klíčová slova
- IDP, IDR, aggregation propensity, secondary structure prediction, sequence randomization,
- Publikační typ
- časopisecké články MeSH