Largest genome assembly in Brassicaceae: retrotransposon-driven genome expansion and karyotype evolution in Matthiola incana
Language English Country Great Britain, England Media print-electronic
Document type Journal Article
Grant support
2021YFD1600500
National Key Research and Development Program of China
32160454;32260469
National Natural Science Foundation of China
20212BAB215002
Natural Science Foundation of Jiangxi Province
2022021302024851
Wuhan Science and Technology Major Project on Key techniques of biological breeding and Breeding of new varieties
25-16142S
Czech Science Foundation
The project TowArds Next GENeration Crops (CZ.02.01.01/00/22_008/0004581) of the ERDF Programme Johannes Amos Comenius
PubMed
40569825
PubMed Central
PMC12392961
DOI
10.1111/pbi.70193
- Keywords
- Cruciferae, Hesperodae, Lineage III, genome assembly, genome obesity, retrotransposons
- MeSH
- Brassicaceae * genetics
- Chromosomes, Plant genetics
- Phylogeny
- Genome, Plant * genetics
- Karyotype
- Evolution, Molecular *
- Retroelements * genetics
- Publication type
- Journal Article
- Names of Substances
- Retroelements *
Matthiola incana, commonly known as stock or gillyflower, is a widely grown ornamental plant whose genome is considerably larger than those of other species in the mustard family. However, the evolutionary history behind such a large genome (~2 Gb) remains unknown. Here, we generated a high-quality chromosome-scale genome assembly of M. incana by integrating PacBio HiFi reads, Illumina short reads and Hi-C data. The resulting assembly comprises seven pseudochromosomes with a total length of 1965 Mb and 38 245 gene models. Phylogenetic analysis indicates that M. incana and other taxa of the supertribe Hesperodae represent an early-diverging lineage in the evolutionary history of the Brassicaceae. Through comparative analysis, we revisited the ancestral Hesperodae karyotype (AHK, n = 7) and found several differences from the well-established ancestral crucifer karyotype (ACK, n = 8) model, including extensive inter- and intra-chromosomal rearrangements. Our results suggest that the primary cause of genome obesity in M. incana is the massive expansion of long terminal repeat retrotransposons (LTR-RTs), particularly from the Angela, Athila and Retand families. CHG methylation is markedly reduced in regions where the highest density of Copia-type LTR-RTs and the lowest density of Gypsy-type LTR-RTs overlap, corresponding to the putative centromeres. Insertion-time estimates combined with methylation profiling show that recently inserted LTR-RTs have a significantly different methylation pattern from older ones.
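The abstract refers to LTR-RT insertion times without saying how they were computed. As an illustration only, the sketch below shows one widely used approach: the two LTRs of an element are identical at insertion, so their divergence K (here with a Jukes-Cantor correction) gives an age T = K / (2r). The function names and the substitution rate in the usage note are assumptions for illustration, not values or methods taken from the article.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Correct an observed proportion of differing sites (p) for multiple hits.

    Uses the Jukes-Cantor model: K = -3/4 * ln(1 - 4p/3).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - (4.0 * p / 3.0))


def ltr_insertion_time(mismatches: int, aligned_length: int, rate: float) -> float:
    """Estimate element age from the divergence between its 5' and 3' LTRs.

    mismatches      -- differing sites in the LTR-vs-LTR alignment
    aligned_length  -- number of aligned sites compared
    rate            -- ASSUMED substitutions per site per year (hypothetical input;
                       the appropriate value is lineage-specific)
    Returns T = K / (2 * rate), in years.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if rate <= 0:
        raise ValueError("rate must be positive")
    p = mismatches / aligned_length
    k = jukes_cantor_distance(p)
    # Both LTR copies accumulate substitutions independently, hence the factor 2.
    return k / (2.0 * rate)
```

For example, `ltr_insertion_time(10, 1000, 7e-9)` estimates the age of an element whose LTRs differ at 1% of sites under a placeholder rate of 7e-9; a younger element has fewer LTR differences and therefore a smaller estimate.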