Experimental characterization of de novo proteins and their unevolved random-sequence counterparts
Language: English; Country: England, Great Britain; Medium: print-electronic
Document type: journal articles, research supported by grant
PubMed: 37024625
PubMed Central: PMC10089919
DOI: 10.1038/s41559-023-02010-2
PII: 10.1038/s41559-023-02010-2
- MeSH
- Humans MeSH
- Proteins * chemistry MeSH
- Proteomics * MeSH
- Computational Biology MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Supported by Grant MeSH
- Substances
- Proteins * MeSH
De novo gene emergence provides a route for new proteins to be formed from previously non-coding DNA. Proteins born in this way are considered random sequences and are typically assumed to lack a defined structure. While it remains unclear how likely a de novo protein is to adopt a soluble and stable tertiary structure, converging evidence from random-sequence and de novo-designed proteins suggests that native-like biophysical properties are abundant in sequence space. Using putative de novo proteins identified in human and fly, we experimentally characterize a library of these sequences to assess their solubility and structure propensity. We compare this library to a set of synthetic random proteins with no evolutionary history. Bioinformatic prediction suggests that de novo proteins may have distributions of biophysical properties remarkably similar to those of unevolved random sequences of a given length and amino acid composition. However, upon expression in vitro, de novo proteins exhibit moderately higher solubility, which is further enhanced by the DnaK chaperone system. We suggest that while synthetic random sequences are a useful proxy for de novo proteins in terms of structure propensity, de novo proteins may be better integrated into the cellular system than random expectation would predict, given their higher solubility.
Department of Biochemistry, Charles University, Prague, Czech Republic
Department of Cell Biology, Charles University, BIOCEV, Prague, Czech Republic
Department of Protein Evolution, MPI for Developmental Biology, Tübingen, Germany
Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic
Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic
High-throughput Selection of Human de novo-emerged sORFs with High Folding Potential
Toxin rescue by a random sequence