• This record comes from PubMed

Largest genome assembly in Brassicaceae: retrotransposon-driven genome expansion and karyotype evolution in Matthiola incana

2025 Sep; 23(9): 4109-4125. [epub] 20250626

Language: English; Country: Great Britain, England; Media: print-electronic

Document type: Journal Article

Grant support
2021YFD1600500 National Key Research and Development Program of China
32160454;32260469 National Natural Science Foundation of China
20212BAB215002 Natural Science Foundation of Jiangxi Province
2022021302024851 Wuhan Science and Technology Major Project on Key techniques of biological breeding and Breeding of new varieties
25-16142S Czech Science Foundation
The project TowArds Next GENeration Crops (CZ.02.01.01/00/22_008/0004581) of the ERDF Programme Johannes Amos Comenius

Matthiola incana, commonly known as stock or gillyflower, is a widely grown ornamental plant whose genome is substantially larger than those of other species in the mustard family. However, the evolutionary history behind such a large genome (~2 Gb) remains unknown. Here, we generated a high-quality chromosome-scale genome assembly of M. incana by integrating PacBio HiFi reads, Illumina short reads and Hi-C data. The resulting assembly consists of seven pseudochromosomes with a total length of 1965 Mb and contains 38 245 gene models. Phylogenetic analysis indicates that M. incana and other taxa of the supertribe Hesperodae represent an early-diverging lineage in the evolutionary history of the Brassicaceae. Through comparative analysis, we revisited the ancestral Hesperodae karyotype (AHK, n = 7) and found several differences from the well-established ancestral crucifer karyotype (ACK, n = 8) model, including extensive inter- and intra-chromosomal rearrangements. Our results suggest that the primary cause of genome obesity in M. incana is the massive expansion of long terminal repeat retrotransposons (LTR-RTs), particularly from the Angela, Athila and Retand families. CHG methylation is markedly reduced in regions where the highest density of Copia-type LTR-RTs and the lowest density of Gypsy-type LTR-RTs overlap, corresponding to the putative centromeres. Based on insertion times and methylation profiling, recently inserted LTR-RTs were found to have a significantly different methylation pattern from older ones.
