High-throughput Selection of Human de novo-emerged sORFs with High Folding Potential

2024 Apr 02; 16(4).

Language: English; Country: England, Great Britain; Medium: print

Document type: journal article, grant-supported research

Persistent link: https://www.medvik.cz/link/pmid38597156

Grant support
98183 Volkswagen Foundation
RGP004/2023 HFSP
Charles University
722610 Horizon 2020 Research and Innovation Framework Programme
Erasmus+

De novo genes emerge from previously noncoding stretches of the genome. The proteins they encode are generally expected to resemble random sequences and, accordingly, to lack a stable tertiary fold and to show high predicted disorder. However, the structural properties of de novo proteins, and whether they differ between the stages of emergence and fixation, have not been studied in depth, and what is known relies heavily on predictions. Here we generated a library of short human putative de novo proteins of varying lengths and ages and sorted the candidates according to their structural compactness and disorder propensity. Using Förster resonance energy transfer (FRET) combined with fluorescence-activated cell sorting (FACS), we screened the library for the most compact protein structures as well as for the most elongated and flexible ones. We find that compact de novo proteins are on average slightly shorter and have lower predicted disorder than less compact ones. The predicted structures of the most and least compact de novo proteins match expectations: the former contain more secondary structure and the latter more disorder. Our experiments indicate that older de novo proteins have higher compactness and structural propensity than young ones. We discuss possible evolutionary scenarios underlying the age dependence of compactness and structural content of putative de novo proteins, and their implications.
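As a rough illustration of why FRET can report compactness: the efficiency of energy transfer between a donor and an acceptor fluorophore falls off with the sixth power of their separation r, E = 1/(1 + (r/R0)^6), where R0 is the Förster radius at which E = 0.5. The sketch below assumes, as in common FRET folding sensors, that each candidate sequence sits between a donor and an acceptor fluorescent protein, so a more compact insert brings them closer and raises E. The Förster radius of 5 nm and the 5% gate fraction are illustrative assumptions, not values from the study, and `gate_extremes` is a hypothetical stand-in for setting FACS gates on the highest- and lowest-FRET cells.

```python
from typing import List, Sequence, Tuple

# Illustrative Förster radius in nm (assumption, not taken from the study).
ASSUMED_R0_NM = 5.0


def fret_efficiency(r_nm: float, r0_nm: float = ASSUMED_R0_NM) -> float:
    """Förster equation: transfer efficiency for a donor-acceptor distance r_nm."""
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be non-negative and R0 positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def gate_extremes(
    signals: Sequence[float], fraction: float = 0.05
) -> Tuple[List[int], List[int]]:
    """Hypothetical FACS-style gating (assumption): return the indices of the
    cells with the highest FRET signal (candidate compact inserts) and the
    lowest FRET signal (candidate extended or flexible inserts)."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    n = max(1, int(len(signals) * fraction))
    # Sort cell indices by ascending FRET signal.
    order = sorted(range(len(signals)), key=lambda i: signals[i])
    return order[-n:], order[:n]


if __name__ == "__main__":
    for r in (2.5, 5.0, 10.0):
        print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r):.3f}")
```

With R0 = 5 nm, halving the distance to 2.5 nm raises E from 0.5 to about 0.985, while doubling it to 10 nm drops E to about 0.015. This steep sixth-power dependence is what makes the ratio of acceptor to donor emission a sensitive, if indirect, readout of how compact the inserted sequence is.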
