Experimental characterization of de novo proteins and their unevolved random-sequence counterparts

2023 Apr;7(4):570-580. [Epub 2023-04-06]

Language: English. Country: England, Great Britain. Medium: print-electronic

Document type: journal article, grant-supported research

Persistent link: https://www.medvik.cz/link/pmid37024625
Links

PubMed 37024625
PubMed Central PMC10089919
DOI 10.1038/s41559-023-02010-2
PII: 10.1038/s41559-023-02010-2

De novo gene emergence provides a route for new proteins to be formed from previously non-coding DNA. Proteins born in this way are considered random sequences and typically assumed to lack defined structure. While it remains unclear how likely a de novo protein is to assume a soluble and stable tertiary structure, intersecting evidence from random sequence and de novo-designed proteins suggests that native-like biophysical properties are abundant in sequence space. Taking putative de novo proteins identified in human and fly, we experimentally characterize a library of these sequences to assess their solubility and structure propensity. We compare this library to a set of synthetic random proteins with no evolutionary history. Bioinformatic prediction suggests that de novo proteins may have remarkably similar distributions of biophysical properties to unevolved random sequences of a given length and amino acid composition. However, upon expression in vitro, de novo proteins exhibit moderately higher solubility which is further induced by the DnaK chaperone system. We suggest that while synthetic random sequences are a useful proxy for de novo proteins in terms of structure propensity, de novo proteins may be better integrated in the cellular system than random expectation, given their higher solubility.
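The abstract compares de novo proteins with unevolved random sequences "of a given length and amino acid composition" but does not say how such sequences were generated. The following is only a minimal sketch of two common ways to build a composition-matched random counterpart, not the authors' procedure. The function names and the example sequence are hypothetical.

```python
import random
from collections import Counter


def shuffle_matched(seq: str, rng: random.Random) -> str:
    """Return a shuffled copy of seq: same length and exact residue counts."""
    residues = list(seq)
    rng.shuffle(residues)  # in-place permutation of the residues
    return "".join(residues)


def sample_matched(seq: str, rng: random.Random) -> str:
    """Draw each position independently from the residue frequencies of seq.

    Length is preserved exactly; composition is matched only in expectation.
    """
    counts = Counter(seq)
    letters = sorted(counts)
    weights = [counts[a] for a in letters]
    return "".join(rng.choices(letters, weights=weights, k=len(seq)))


# Hypothetical input sequence, used purely for illustration.
query = "MKTLLVAGSDEQRWPFYHNIC"
rng = random.Random(0)  # fixed seed so the sketch is reproducible

shuffled = shuffle_matched(query, rng)
sampled = sample_matched(query, rng)
```

Shuffling keeps the composition exact for each individual sequence, whereas independent sampling matches it only on average across a library. Which of the two is closer to the random library used in the study is not stated in the abstract.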
