High-throughput Selection of Human de novo-emerged sORFs with High Folding Potential
Language English Country England, Great Britain Media print
Document type Journal Article, Research Support (grant-funded work)
Grant support
98183
Volkswagen Foundation
HFSP - RGP004/2023
HFSP
Charles University
722610
Horizon 2020 Research and Innovation Framework Programme
Erasmus+
PubMed
38597156
PubMed Central
PMC11024478
DOI
10.1093/gbe/evae069
PII: 7643132
- MeSH
- Gene Library MeSH
- Humans MeSH
- Proteins * genetics MeSH
- Protein Folding * MeSH
- Protein Structure, Secondary MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Proteins * MeSH
De novo genes emerge from previously noncoding stretches of the genome. Their encoded de novo proteins are generally expected to resemble random sequences and, accordingly, to lack a stable tertiary fold and to have high predicted disorder. However, the structural properties of de novo proteins, and whether they differ between the stages of emergence and fixation, have not been studied in depth, and current knowledge relies heavily on predictions. Here we generated a library of short human putative de novo proteins of varying lengths and ages and sorted the candidates according to their structural compactness and disorder propensity. Using Förster resonance energy transfer combined with fluorescence-activated cell sorting, we were able to screen the library for the most compact protein structures, as well as the most elongated and flexible structures. We find that compact de novo proteins are on average slightly shorter and contain lower predicted disorder than less compact ones. The predicted structures for the most and least compact de novo proteins correspond to expectations in that they contain more secondary structure content or higher disorder content, respectively. Our experiments indicate that older de novo proteins have higher compactness and structural propensity compared with young ones. We discuss possible evolutionary scenarios underlying the age dependence of compactness and structural content of putative de novo proteins, and their implications.
Department of Biochemistry, Faculty of Science, Charles University, Prague, Czech Republic
Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
Department of Protein Evolution, Max Planck Institute for Biology Tuebingen, Tuebingen, Germany
Imaging Methods Core Facility, BIOCEV, Prague, Czech Republic
Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic