De novo emergence, existence, and demise of a protein-coding gene in murids
Language English
Country England, Great Britain
Media electronic
Document type Journal Article
Grant support
647403
H2020 European Research Council
RVO 68378050-KAV-NPUI
Akademie Věd České Republiky
RVO 68378050
Akademie Věd České Republiky
LM2018129
Ministerstvo Školství, Mládeže a Tělovýchovy
CZ.02.1.01/0.0/0.0/18_046/0016045
Ministerstvo Školství, Mládeže a Tělovýchovy
LM2018126
Ministerstvo Školství, Mládeže a Tělovýchovy
e-INFRA CZ LM2018140
Ministerstvo Školství, Mládeže a Tělovýchovy
#KK.01.1.1.01.0010
European Structural and Investment Funds
#KK.01.1.1.01.0009
European Structural and Investment Funds
PubMed
36482406
PubMed Central
PMC9733328
DOI
10.1186/s12915-022-01470-5
PII: 10.1186/s12915-022-01470-5
- Keywords
- CAG, D6Ertd527e, De novo, Evolution, Gene, LTR, Oocyte, Polyserine, Retrotransposon
- MeSH
- Muridae * MeSH
- RNA, Long Noncoding * genetics MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- RNA, Long Noncoding * MeSH
BACKGROUND: Genes, the principal units of genetic information, vary in complexity and evolutionary history. Less-complex genes (e.g., genes expressing long non-coding RNA (lncRNA)) readily emerge de novo from non-genic sequences and have high evolutionary turnover. Genesis of a gene may be facilitated by adoption of functional genic sequences from retrotransposon insertions. However, protein-coding sequences in extant genomes rarely lack any connection to an ancestral protein-coding sequence.
RESULTS: We describe the remarkable evolution of the murine gene D6Ertd527e and its orthologs in the rodent superfamily Muroidea. D6Ertd527e emerged in a common ancestor of mice and hamsters, most likely as a lncRNA-expressing gene. A major contributing factor was a long terminal repeat (LTR) retrotransposon insertion carrying an oocyte-specific promoter and a 5' terminal exon of the gene. The gene survived as an oocyte-specific lncRNA in several extant rodents, while in some others the gene or its expression was lost. In the ancestral lineage of Mus musculus, the gene acquired protein-coding capacity, with the bulk of the coding sequence formed through CAG (AGC) trinucleotide repeat expansion and duplications. These events generated a cytoplasmic serine-rich maternal protein. Knock-out of D6Ertd527e in mice has a small but detectable effect on fertility and the maternal transcriptome.
CONCLUSIONS: While this evolving gene does not show a clear function in laboratory mice, its documented evolutionary history in Muroidea during the last ~40 million years provides a textbook example of how several common mutation events can support de novo gene formation, evolution of protein-coding capacity, and a gene's demise.
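The connection between a CAG (AGC) trinucleotide repeat and a serine-rich protein can be illustrated with a minimal sketch. It translates a pure CAG repeat in each of its three reading frames using the standard genetic code, where the AGC frame yields polyserine. The function name `translate_frames` and the repeat length of six are hypothetical choices for illustration only; nothing here reproduces the authors' analysis or the actual D6Ertd527e sequence.

```python
# Standard genetic code entries for the three codons that occur in a pure
# CAG repeat: CAG = glutamine (Q), AGC = serine (S), GCA = alanine (A).
CODON_TO_AA = {"CAG": "Q", "AGC": "S", "GCA": "A"}


def translate_frames(unit="CAG", copies=6):
    """Translate a pure trinucleotide repeat in each of its three frames.

    Illustrative helper (hypothetical name); returns {frame_offset: peptide}.
    """
    dna = unit * copies
    frames = {}
    for offset in range(3):
        # Collect only complete codons starting at this offset.
        codons = [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]
        frames[offset] = "".join(CODON_TO_AA[c] for c in codons)
    return frames


if __name__ == "__main__":
    for offset, peptide in translate_frames().items():
        print(offset, peptide)
```

Offset 0 reads the repeat as CAG codons (polyglutamine), offset 1 as AGC codons (polyserine), and offset 2 as GCA codons (polyalanine), which is why the same repeat expansion can produce very different homorepeat proteins depending on the reading frame.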