Diploid A genome wheat species harbor immense genetic variability, which has been targeted and proven useful in wheat improvement. The development and deployment of sequence-based markers have opened avenues for comparative analysis, gene transfer and marker-assisted selection (MAS) using high-throughput, cost-effective genotyping techniques. Chromosome 2A of wheat is known to harbor several economically important genes. The present study aimed to identify genic sequences corresponding to full-length cDNAs and to mine SSRs and ISBPs from the 2A draft sequence assembly of hexaploid wheat cv. Chinese Spring for marker development. In total, 1029 primer pairs, including 478 gene-derived, 501 SSR and 50 ISBP markers, were amplified in the diploid A genome species Triticum monococcum and T. boeoticum, identifying 221 polymorphic loci. Of these, 119 markers were mapped onto a pre-existing chromosome 2A genetic map consisting of 42 mapped markers. The enriched genetic map comprised 161 mapped markers with a final map length of 549.6 cM. Further, the 2A genetic map of T. monococcum was anchored to the physical map of 2A of cv. Chinese Spring, which revealed several rearrangements between the two species. The present study generated a highly saturated genetic map of 2A, and physical anchoring of the genetically mapped markers revealed a complex genetic architecture of chromosome 2A that needs to be investigated further.
- MeSH
- Chromosomes, Plant genetics MeSH
- Diploidy MeSH
- Polymorphism, Single Nucleotide MeSH
- Quantitative Trait Loci * MeSH
- Chromosome Mapping methods MeSH
- Microsatellite Repeats MeSH
- Polyploidy MeSH
- Triticum genetics MeSH
- Sequence Analysis, DNA MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Comparative Study MeSH
Reference genomes of important cereals, including barley, emmer wheat and bread wheat, were released recently. Their comparison with genome size estimates obtained by flow cytometry indicated that the assemblies represent not more than 88-98% of the complete genome. This work aims to identify the missing parts in two cereal genomes and to propose techniques for making the assemblies more complete. We focused on tandemly organised repetitive sequences, which are known to be underrepresented in genome assemblies generated from short-read sequence data. Our study found arrays of three tandem repeats with unit sizes of 1242 to 2726 bp present in the bread wheat reference genome generated from short reads. However, both this assembly and another wheat genome assembly employing long PacBio reads failed to integrate the 2726-bp repeat correctly into the pseudomolecule context. This suggests that tandem repeats of this size, frequently incorporated in unassigned scaffolds, may contribute to shrinking of the pseudomolecules without reducing the size of the entire assembly. We demonstrate how this missing information may be added to the pseudomolecules with the aid of nanopore sequencing of individual BAC clones and optical mapping. Using the latter technique, we identified and localised a 470-kb array of 45S ribosomal DNA absent from the reference genome of barley.
Any project seeking to deliver a plant or animal reference genome sequence must address the question of how complete the assembly is. Given the complexity introduced by sequence redundancy, a problem which is especially acute in polyploid genomes, this question is not an easy one to answer. One approach is to use the sequence data, along with the appropriate computational tools; the other is to compare the sequence-based estimate of genome size with an experimentally measured mass of nuclear DNA. The latter requires a reference standard in order to provide a robust relationship between the two independent measurements of genome size. Here, the proposal is to choose the human male leucocyte genome as this standard: its 1C DNA amount (the amount of DNA contained within an unreplicated haploid chromosome set) of 3.50 pg is equivalent to a genome length of 3.423 Gbp, a size which is just 5% longer than predicted by the most current human genome assembly. Adopting this standard, this paper assesses the completeness of the reference genome assemblies of the leading cereal crop species wheat, barley and rye.
- MeSH
- Genome Size * MeSH
- Genome, Human MeSH
- Genome, Plant * MeSH
- Humans MeSH
- Triticum genetics MeSH
- Reference Standards MeSH
- Sequence Analysis, DNA * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
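The mass-to-length conversion behind the human reference standard described above can be checked with a few lines of Python. This is a minimal sketch: the factor of 0.978 Gbp per pg is the one implied by the abstract's own figures (3.50 pg corresponding to 3.423 Gbp), and the function names `pg_to_gbp` and `assembly_completeness` are illustrative assumptions, not taken from the paper.

```python
# Conversion factor implied by the abstract: 3.423 Gbp / 3.50 pg = 0.978 Gbp per pg.
# Treat it as an assumption of this sketch rather than a value quoted from the paper.
GBP_PER_PG = 0.978


def pg_to_gbp(mass_pg: float) -> float:
    """Convert a 1C nuclear DNA amount (picograms) to genome length (Gbp)."""
    return mass_pg * GBP_PER_PG


def assembly_completeness(assembly_gbp: float, genome_1c_pg: float) -> float:
    """Fraction of the measured genome size that an assembly represents."""
    return assembly_gbp / pg_to_gbp(genome_1c_pg)


# Human male leucocyte standard from the abstract: 1C = 3.50 pg.
human_gbp = round(pg_to_gbp(3.50), 3)
print(f"Human 1C genome length: {human_gbp} Gbp")
```

Passing an assembly length and a flow-cytometric 1C value for a cereal to `assembly_completeness` would give the kind of completeness fraction the paper discusses; no cereal values are hard-coded here because the abstract does not report them.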
BACKGROUND: Numerous scaffold-level sequences for wheat are now being released and, in this context, we report on a strategy for improving the overall assembly to a level comparable to that of the human genome. RESULTS: Using chromosome 7A of wheat as a model, sequence-finished megabase-scale sections of this chromosome were established by combining a new independent assembly using a bacterial artificial chromosome (BAC)-based physical map, BAC pool paired-end sequencing, chromosome-arm-specific mate-pair sequencing and Bionano optical mapping with the International Wheat Genome Sequencing Consortium RefSeq v1.0 sequence and its underlying raw data. The combined assembly results in 18 super-scaffolds across the chromosome. The value of finished genome regions is demonstrated for two approximately 2.5 Mb regions associated with yield and the grain quality phenotype of fructan carbohydrate grain levels. In addition, the 50 Mb centromere region analysis incorporates cytological data highlighting the importance of non-sequence data in the assembly of this complex genome region. CONCLUSIONS: Sufficient genome sequence information is shown to now be available for the wheat community to produce sequence-finished releases of each chromosome of the reference genome. The high-level completion identified that an array of seven fructosyl transferase genes underpins grain quality and that yield attributes are affected by five F-box-only-protein-ubiquitin ligase domain and four root-specific lipid transfer domain genes. The completed sequence also includes the centromere.
- MeSH
- Centromere metabolism MeSH
- Chromosomes, Plant genetics MeSH
- Fructans analysis MeSH
- Physical Chromosome Mapping methods MeSH
- Genome, Plant * MeSH
- Optical Phenomena * MeSH
- Triticum genetics MeSH
- Seeds genetics MeSH
- Chromosomes, Artificial, Bacterial genetics MeSH
- Agriculture * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
The hexaploid wheat genotype Chinese Spring (CS) has been used worldwide as the reference base for wheat genetics and genomics, and significant resources have been used by the international community to generate a reference wheat genome based on this genotype. By sequencing the flow-sorted 3B chromosome from the hexaploid wheat genotype CRNIL1A and comparing the obtained sequences with those available for CS, we found that a large number of sequences in the former were missing in the latter. If the distribution of such sequences in the hexaploid wheat genome is random, CRNIL1A sequences missing in CS could amount to as much as 159.3 Mb, even if only fragments of 50 bp or longer were considered. Analysing RNA sequences available in the public domain also revealed that dispensable genes are common in hexaploid wheat. Together with the extensive intra- and interchromosomal rearrangements in CS, the existence of such dispensable genes is another factor highlighting potential issues with the use of reference genomes in various studies. Strong deviation in the distributions of these dispensable sequences among genotypes with different geographical origins provided the first evidence indicating that they could be associated with adaptation in wheat.
Goat grasses (Aegilops spp.) contributed to the evolution of bread wheat and are important sources of genes and alleles for modern wheat improvement. However, their use in alien introgression breeding is hindered by poor knowledge of their genome structure and a lack of molecular tools. The analysis of large and complex genomes may be simplified by dissecting them into single chromosomes via flow cytometric sorting. In some species this is not possible due to similarities in relative DNA content among chromosomes within a karyotype. This work describes the distribution of GAA and ACG microsatellite repeats on chromosomes of the U, M, S and C genomes of Aegilops, and the use of microsatellite probes to label the chromosomes in suspension by fluorescence in situ hybridization (FISHIS). Bivariate flow cytometric analysis of chromosome DAPI fluorescence and fluorescence of FITC-labelled microsatellites made it possible to discriminate all chromosomes and sort them with negligible contamination by other chromosomes. DNA of purified chromosomes was used as a template for polymerase chain reaction (PCR) using Conserved Orthologous Set (COS) markers with known positions on the wheat A, B and D genomes. Wheat-Aegilops macrosyntenic comparisons using COS markers revealed significant rearrangements in the U and C genomes, while the M and S genomes exhibited a structure similar to that of wheat. Purified chromosome fractions provided an attractive resource to investigate the structure and evolution of the Aegilops genomes, and the COS markers assigned to Aegilops chromosomes will facilitate alien gene introgression into wheat.
- MeSH
- Chromosomes, Plant genetics MeSH
- In Situ Hybridization MeSH
- Flow Cytometry MeSH
- Triticum genetics MeSH
- Publication type
- Journal Article MeSH
This study aims to understand the genetic diversity of traditional Oceanian starchy bananas in order to propose an efficient conservation strategy for these endangered varieties. SSR and DArT molecular markers are used to characterize a large sample of Pacific accessions, from New Guinea to Tahiti and Hawaii. All Pacific starchy bananas are shown to be of New Guinea origin, arising by interspecific hybridization between Musa acuminata (AA genome), more precisely its local subspecies M. acuminata ssp. banksii, and M. balbisiana (BB genome), which generated the triploid AAB Pacific starchy bananas. These AAB genotypes do not form a subgroup sensu stricto, and genetic markers differentiate two subgroups across the three morphotypes usually identified: Iholena versus Popoulu and Maoli. The Popoulu/Maoli accessions, even if morphologically diverse throughout the Pacific, cluster in the same genetic subgroup. However, the subgroup is not strictly monophyletic, and several close but different genotypes are linked to the dominant genotype. One of the related genotypes is specific to New Caledonia (NC), with morphotypes close to Maoli but with some primitive characters. It is concluded that the diffusion of Pacific starchy AAB bananas results from a series of introductions of triploids originating in the New Guinea area from several sexual recombination events involving different genotypes of M. acuminata ssp. banksii. This scheme of multiple waves from the New Guinea zone is consistent with the archaeological data for the peopling of the Pacific. The present geographic distribution suggests that a greater diversity must have existed in the past. Its erosion parallels the erosion of cultural traditions, which are inexorably declining in most of the Polynesian and Melanesian islands. Conversely, diversity hot spots appear linked to the local persistence of traditions: Maoli in New Caledonian Kanak traditions or Iholena in a few Polynesian islands. 
These results will contribute to optimizing the conservation strategy for the ex-situ Pacific Banana Collection supported collectively by the Pacific countries.
- MeSH
- Musa genetics MeSH
- Genetic Variation * MeSH
- Genetic Markers MeSH
- Genotype MeSH
- Hybridization, Genetic MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographic names
- Oceania MeSH
BACKGROUND: Recent advances in genomics indicate functional significance of a majority of genome sequences and their long-range interactions. As a detailed examination of genome organization and function requires a very high-quality genome sequence, the objective of this study was to improve the reference genome assembly of banana (Musa acuminata). RESULTS: We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. Unlike classical automated tools that are based on global parameters, the semi-automated tools offer an expert mode in which the user decides on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long-insert-size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super-scaffolds. Finally, a consensus gene annotation was projected onto the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). In total, 89.5% of the assembly was anchored to the 11 Musa chromosomes, compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. CONCLUSION: The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. 
Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
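The assembly statistics quoted in the abstract above (scaffold count reduction and N50) follow standard definitions, which can be illustrated as below. This is a minimal sketch on a toy list of scaffold lengths; the function name `n50` and the toy data are assumptions for illustration and are not part of the published pipeline. The scaffold counts given in parentheses next to the N50 values in the abstract correspond to what is often called L50, the number of scaffolds needed to reach half of the assembly length.

```python
def n50(lengths):
    """Return (N50, L50) for an iterable of scaffold lengths.

    N50 is the length of the scaffold at which the running total of
    lengths, taken from longest to shortest, first reaches half of the
    assembly size; L50 is how many scaffolds were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Toy example (hypothetical lengths, arbitrary units).
toy_n50, toy_l50 = n50([10, 8, 6, 4, 2])
print(toy_n50, toy_l50)

# Relative reduction in scaffold number reported in the abstract.
reduction_pct = round((1 - 1532 / 7513) * 100, 1)
print(f"Scaffold number reduced by {reduction_pct}%")
```

The reduction computed from the reported counts rounds to the 80% stated in the abstract.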
Wild emmer wheat, Triticum turgidum ssp. dicoccoides, is the wild relative of Triticum turgidum, the progenitor of durum and bread wheat, and maintains a rich allelic diversity among its wild populations. The lack of adequate genetic and genomic resources, however, restricts its exploitation in wheat improvement. Here, we report next-generation sequencing of the flow-sorted chromosome 5B of T. dicoccoides to shed light on its genome structure, function and organization by exploring the repetitive elements, protein-encoding genes and putative microRNA and tRNA coding sequences. Comparative analyses with its counterparts in modern and wild wheats offer clues to the evolution of the B genome. Syntenic relationships of chromosome 5B with the model grasses can facilitate further efforts for fine-mapping of traits of interest. Mapping of 5B sequences onto the root transcriptomes of two additional T. dicoccoides genotypes with contrasting drought tolerances revealed several thousand single nucleotide polymorphisms, of which 584 shared polymorphisms on 228 transcripts were specific to the drought-tolerant genotype. To our knowledge, this study presents the largest genomics resource currently available for T. dicoccoides, which, we believe, will encourage the exploitation of its genetic and genomic potential for wheat improvement to meet the increasing demand to feed the world.
- MeSH
- Chromosomes, Plant genetics MeSH
- MicroRNAs genetics MeSH
- Evolution, Molecular * MeSH
- Triticum genetics MeSH
- RNA, Plant genetics MeSH
- RNA, Transfer genetics MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
BACKGROUND: The number and complexity of repetitive elements vary between species, and such elements are generally most abundant in species with larger genomes. Combining the flow-sorted chromosome arms approach to genome analysis with second-generation DNA sequencing technologies provides a unique opportunity to study the repetitive portion of each chromosome, enabling comparisons among them. Additionally, different sequencing approaches may provide different depths of insight into repeatome content and structure. In this work we analyze and characterize the repetitive sequences of Triticum aestivum cv. Chinese Spring homeologous group 4 chromosome arms, obtained through Roche 454 and Illumina sequencing technologies, hereinafter marked by subscripts 454 and I, respectively. Repetitive sequences were identified with the RepeatMasker software using the interspersed repeat database mips-REdat_v9.0p. The input sequences consisted of our 4DS454 and 4DL454 scaffolds and the 4ASI, 4ALI, 4BSI, 4BLI, 4DSI and 4DLI contigs, downloaded from the International Wheat Genome Sequencing Consortium (IWGSC). RESULTS: Repetitive sequence content varied from 55% to 63% for all chromosome arm assemblies except 4DLI, in which the repeat content was 38%. Transposable elements, small RNA, satellites, simple repeats and low-complexity sequences were analyzed. SSR frequency was found to be one per 24 to 27 kb for all chromosome assemblies except 4DLI, where it was three times higher. Dinucleotides and trinucleotides were the most abundant SSR repeat units. (GA)n/(TC)n was the most abundant SSR except for 4DLI, where the most frequently identified SSR was (CCG/CGG)n. Retrotransposons followed by DNA transposons were the most highly represented sequence repeats, mainly composed of the Gypsy and CACTA/En-Spm superfamilies, respectively. 
This whole-chromosome sequence analysis allowed identification of three new LTR retrotransposon families belonging to the Copia superfamily, one belonging to the Gypsy superfamily and two TRIM retrotransposon families. Their physical distribution in the wheat genome was analyzed by fluorescent in situ hybridization (FISH), and one of them, the Carmen retrotransposon, was found to be specific to the centromeric regions of all wheat chromosomes. CONCLUSION: The presented work is the first in-depth report of wheat repetitive sequences analyzed at the chromosome arm level, providing the first insight into the repeatome of T. aestivum chromosomes of homeologous group 4.
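The di- and trinucleotide SSR counts reported in the abstract above were obtained with RepeatMasker. As a rough illustration of what detecting such repeats involves, the following minimal sketch scans a sequence with a regular expression. It is not the authors' method: the function name `find_ssrs`, the minimum of four repeat units and the toy sequence are all assumptions made for this example.

```python
import re


def find_ssrs(seq, min_repeats=4):
    """Find perfect di- and trinucleotide SSRs in a DNA sequence.

    Returns a list of (start, motif, repeat_count) tuples. Homopolymer
    runs such as AAAAAAAA are skipped, since they would otherwise be
    reported as (AA)n or (AAA)n.
    """
    # A 2-3 bp motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # mononucleotide run, not a di-/trinucleotide SSR
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Toy sequence (hypothetical) containing (GA)5 and (CGG)4.
toy = "TTGAGAGAGAGACCCGGCGGCGGCGGTT"
ssrs = find_ssrs(toy)
print(ssrs)
```

Dividing the length of a scanned assembly by the number of hits gives an average spacing comparable in kind to the "one per 24 to 27 kb" figure, although thresholds and motif definitions differ between tools, so numbers from this sketch would not be directly comparable with the published ones.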