Subterranean clover is an important annual forage legume, whose diploidy and inbreeding nature make it an ideal model for genomic analysis in Trifolium. We reported a draft genome assembly of the subterranean clover TSUd_r1.1. Here we evaluate genome mapping on nanochannel arrays and generation of a transcriptome atlas across tissues to advance the assembly and gene annotation. Using a BioNano-based assembly spanning 512 Mb (93% genome coverage), we validated the draft assembly, anchored unplaced contigs and resolved misassemblies. Multiple contigs (264) from the draft assembly coalesced into 97 super-scaffolds (43% of genome). Sequences longer than 1 Mb increased from 40 to 189 Mb, giving a 1.4-fold increase in N50, and the proportion of the genome in pseudomolecules improved from 73% to 80%. The advanced assembly was re-annotated using transcriptome atlas data to contain 31 272 protein-coding genes capturing >96% of the gene content. Functional characterization and GO enrichment confirmed gene expression for response to water deprivation, flavonoid biosynthesis and embryo development ending in seed dormancy, reflecting adaptation to the harsh Mediterranean environment. Comparative analyses across Papilionoideae identified 24 893 Trifolium-specific and 6325 subterranean-clover-specific genes that could be mined further for traits such as geocarpy and grazing tolerance. Eight key traits, including persistence, improved livestock health by isoflavonoid production in addition to important agro-morphological traits, were fine-mapped on the high-density SNP linkage map anchored to the assembly. This new genomic information is crucial to identify loci governing traits allowing marker-assisted breeding, comparative mapping and identification of tissue-specific gene promoters for biotechnological improvement of forage legumes.
- Keywords
- BioNano, Legume comparative genomics, advanced reference assembly, forage legumes, gene expression, transcriptome
- MeSH
- Genome, Plant genetics MeSH
- Genomics methods MeSH
- Sequence Analysis, DNA methods MeSH
- Trifolium genetics MeSH
- Publication type
- Journal Article MeSH
Recent advancements have finally delivered a complete human genome assembly, including the elusive Y chromosome. This accomplishment closes a significant knowledge gap. Prior efforts were hampered by challenges in sequencing repetitive DNA structures such as direct and inverted repeats. We used the G4Hunter algorithm to analyze the presence of G-quadruplex forming sequences (G4s) within the current human reference genome (GRCh38) and the new telomere-to-telomere (T2T) Y chromosome assemblies. This analysis served a dual purpose: identifying the location of potential G4s within the genomes and exploring their association with functionally annotated sequences. Compared to GRCh38, the T2T assembly exhibited a significantly higher prevalence of G-quadruplex forming sequences. Notably, these repeats were abundantly located around precursor RNA, exons, genes, and within protein binding sites. This remarkable co-occurrence of G4-forming sequences with these critical regulatory regions suggests their role in fundamental DNA regulation processes. Our findings indicate that the current human reference genome significantly underestimated the number of G4s, potentially overlooking their functional importance.
- Keywords
- Chromosome Y, G-quadruplex, Gapless assembly, Genome analysis
- MeSH
- Algorithms MeSH
- G-Quadruplexes * MeSH
- Genome, Human * MeSH
- Humans MeSH
- Chromosomes, Human, Y * genetics MeSH
- Telomere genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
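The G4Hunter scan mentioned in the abstract above can be illustrated with a short sketch. It follows the published scoring idea as we understand it: each G in a run scores the run length (capped at 4), each C scores the negative equivalent, other bases score 0, and a window is reported when the absolute mean score reaches a threshold. The window size (25) and threshold (1.2) are assumptions chosen for illustration, since the abstract does not state the settings used, and the function names are hypothetical.

```python
import re


def g4hunter_scores(seq):
    """Per-base scores: G runs positive, C runs negative, capped at 4."""
    seq = seq.upper()
    scores = [0] * len(seq)
    for match in re.finditer(r"G+|C+", seq):
        value = min(match.end() - match.start(), 4)
        if seq[match.start()] == "C":
            value = -value
        for i in range(match.start(), match.end()):
            scores[i] = value
    return scores


def g4hunter_windows(seq, window=25, threshold=1.2):
    """Return (start, mean_score) for windows whose |mean| >= threshold.

    window and threshold are illustrative defaults, not values
    taken from the abstract.
    """
    scores = g4hunter_scores(seq)
    hits = []
    for start in range(len(seq) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if abs(mean) >= threshold:
            hits.append((start, mean))
    return hits


# Toy input: four telomere-like repeats plus one extra G (25 bases).
toy = "GGGTTA" * 4 + "G"
print(g4hunter_windows(toy))
```

Overlapping windows that pass the threshold would normally be merged into a single candidate region; that step is left out here to keep the sketch short.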
BACKGROUND: Recent advances in genomics indicate the functional significance of a majority of genome sequences and their long-range interactions. As a detailed examination of genome organization and function requires a very high quality genome sequence, the objective of this study was to improve the reference genome assembly of banana (Musa acuminata). RESULTS: We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. Unlike classical automated tools that are based on global parameters, the semi-automated tools offer an expert mode in which the user can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5% of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. CONCLUSION: The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. 
Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
- Keywords
- Bioinformatics tool, GBS, Genome assembly, Genome map, Musa acuminata, Paired-end sequences
- MeSH
- Molecular Sequence Annotation MeSH
- Musa genetics MeSH
- Genetic Markers MeSH
- Genome, Plant * MeSH
- Contig Mapping MeSH
- Sequence Analysis, DNA MeSH
- Computational Biology methods MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Genetic Markers MeSH
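The banana record above reports N50 values together with scaffold counts in parentheses. As a reminder of how such figures are derived, the sketch below computes N50 (the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly) and the corresponding scaffold count, often called L50. Reading the parenthesised counts as L50 is our assumption, the function name is hypothetical, and the lengths are toy values rather than data from the study.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first scaffold that pushes the running sum to half of
        # the assembly defines N50; its rank is L50.
        if running >= half_total:
            return length, count


# Toy scaffold lengths (arbitrary units), not values from the abstract.
print(n50_l50([8, 5, 4, 2, 1]))
```

A higher N50 reached with fewer scaffolds, as reported for the improved assembly, indicates that more of the sequence sits in long scaffolds.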
Taking advantage of evolving and improving sequencing methods, human chromosome 8 is now available as a gapless, end-to-end assembly. Thanks to advances in long-read sequencing technologies, its centromere, telomeres, duplicated gene families and repeat-rich regions are now fully sequenced. We were interested to assess if the new assembly altered our understanding of the potential impact of non-B DNA structures within this completed chromosome sequence. It has been shown that non-B secondary structures, such as G-quadruplexes, hairpins and cruciforms, have important regulatory functions and potential as targeted therapeutics. Therefore, we analysed the presence of putative G-quadruplex forming sequences and inverted repeats in the current human reference genome (GRCh38) and in the new end-to-end assembly of chromosome 8. The comparison revealed that the new assembly contains significantly more inverted repeats and G-quadruplex forming sequences compared to the current reference sequence. This observation can be explained by improved accuracy of the new sequencing methods, particularly in regions that contain extensive repeats of bases, as is preferred by many non-B DNA structures. These results show a significant underestimation of the prevalence of non-B DNA secondary structure in previous assembly versions of the human genome and point to their importance being not fully appreciated. We anticipate that similar observations will occur as the improved sequencing technologies fill in gaps across the genomes of humans and other organisms.
- Keywords
- G-quadruplex, Genome sequence of human chromosome 8, Inverted repeat, Non-B DNA structures
- MeSH
- G-Quadruplexes * MeSH
- Genome, Human MeSH
- Sequence Inversion * MeSH
- Humans MeSH
- Chromosomes, Human, Pair 8 * MeSH
- Sequence Analysis, DNA MeSH
- Telomere * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
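To make the notion of an inverted repeat in the chromosome 8 record concrete, here is a minimal sketch that reports every position where an arm of fixed length is followed, after a short spacer, by its reverse complement. The abstract does not say which tool or parameters were used, so the arm length, the maximum spacer and the function names are assumptions for illustration only; this exact-match search is much simpler than dedicated software.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=6, max_spacer=10):
    """Return (start, arm_length, spacer_length) for exact inverted repeats.

    arm and max_spacer are illustrative defaults, not values from
    the abstract.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm + 1):
        left = seq[start:start + arm]
        if set(left) - set("ACGT"):
            continue  # skip arms containing N or other ambiguity codes
        target = reverse_complement(left)
        for spacer in range(max_spacer + 1):
            right_start = start + arm + spacer
            if seq[right_start:right_start + arm] == target:
                hits.append((start, arm, spacer))
    return hits


# Toy sequence: arm ACGTAG, spacer TTTT, then the reverse complement CTACGT.
print(find_inverted_repeats("CCACGTAGTTTTCTACGTCC"))
```

An inverted repeat with a short spacer can fold into a hairpin on one strand or a cruciform on both, which is why such sequences are counted among the non-B DNA motifs discussed in the abstract.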
In this study, we aimed to systematically assess the frequency at which potentially deleterious phenotypes appear in natural populations of the outcrossing model plant Arabidopsis arenosa, and to establish their underlying genetics. For this purpose, we collected seeds from wild A. arenosa populations and screened over 2,500 plants for unusual phenotypes in the greenhouse. We repeatedly found plants with obvious phenotypic defects, such as small stature and necrotic or chlorotic leaves, among first-generation progeny of wild A. arenosa plants. Such abnormal plants were present in about 10% of maternal sibships, with multiple plants with similar phenotypes in each of these sibships, pointing to a genetic basis of the observed defects. A combination of transcriptome profiling, linkage mapping and genome-wide runs of homozygosity patterns using a newly assembled reference genome indicated a range of underlying genetic architectures associated with phenotypic abnormalities. This included evidence for homozygosity of certain genomic regions, consistent with alleles that are identical by descent being responsible for these defects. Our observations suggest that deleterious alleles with different genetic architectures are segregating at appreciable frequencies in wild A. arenosa populations.
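The runs-of-homozygosity analysis mentioned above can be pictured with a toy example. The sketch assumes genotypes along one chromosome are coded as alternate-allele counts (0, 1 or 2) and reports stretches of consecutive homozygous calls that contain at least a minimum number of markers. The coding, the marker threshold and the function name are assumptions; the abstract does not describe the actual procedure, and real analyses typically also consider physical length and tolerate occasional heterozygous or missing calls.

```python
def runs_of_homozygosity(genotypes, min_markers=5):
    """Return (first_index, last_index) of homozygous runs.

    genotypes: alternate-allele counts per marker (0, 1 or 2);
    a value of 1 (heterozygous) ends the current run.
    """
    runs = []
    run_start = None
    for i, call in enumerate(genotypes):
        if call in (0, 2):
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_markers:
                runs.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the end of the chromosome.
    if run_start is not None and len(genotypes) - run_start >= min_markers:
        runs.append((run_start, len(genotypes) - 1))
    return runs


# Toy genotype vector, not data from the study.
print(runs_of_homozygosity([0, 0, 1, 2, 2, 0, 0, 2, 1, 0]))
```

Long runs shared by the affected plants of a sibship are the kind of pattern that would be consistent with identity by descent, as the abstract suggests.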
Sugarcane, the world's most harvested crop by tonnage, has shaped global history, trade and geopolitics, and is currently responsible for 80% of sugar production worldwide [1]. While traditional sugarcane breeding methods have effectively generated cultivars adapted to new environments and pathogens, sugar yield improvements have recently plateaued [2]. The cessation of yield gains may be due to limited genetic diversity within breeding populations, long breeding cycles and the complexity of its genome, the latter preventing breeders from taking advantage of the recent explosion of whole-genome sequencing that has benefited many other crops. Thus, modern sugarcane hybrids are the last remaining major crop without a reference-quality genome. Here we take a major step towards advancing sugarcane biotechnology by generating a polyploid reference genome for R570, a typical modern cultivar derived from interspecific hybridization between the domesticated species (Saccharum officinarum) and the wild species (Saccharum spontaneum). In contrast to the existing single haplotype ('monoploid') representation of R570, our 8.7 billion base assembly contains a complete representation of unique DNA sequences across the approximately 12 chromosome copies in this polyploid genome. Using this highly contiguous genome assembly, we filled a previously unsized gap within an R570 physical genetic map to describe the likely causal genes underlying the single-copy Bru1 brown rust resistance locus. This polyploid genome assembly with fine-grain descriptions of genome architecture and molecular targets for biotechnology will help accelerate molecular and transgenic breeding and adaptation of sugarcane to future environmental conditions.
- MeSH
- Biotechnology MeSH
- Chromosomes, Plant genetics MeSH
- DNA, Plant genetics MeSH
- Genome, Plant * genetics MeSH
- Haplotypes genetics MeSH
- Hybridization, Genetic genetics MeSH
- Polyploidy * MeSH
- Reference Standards MeSH
- Saccharum * classification genetics MeSH
- Plant Breeding MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
- Names of Substances
- DNA, Plant MeSH
BACKGROUND: The advancement of sequencing technologies results in the rapid release of hundreds of new genome assemblies a year providing unprecedented resources for the study of genome evolution. Within this context, the significance of in-depth analyses of repetitive elements, transposable elements (TEs) in particular, is increasingly recognized in understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential due to the frequent incompleteness of automatically generated consensus sequences. RESULTS: Here, we present an example of a crowd-sourcing effort aimed at curating and annotating TE libraries of two non-model species built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups where methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning the methods of curation effectively and empowering the scientific community with high-quality, reusable repeat libraries. CONCLUSIONS: The collaborative manual curation of TEs from two tardigrade species, for which there were no TE libraries available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects: A hidden treasure awaits discovery within non-model organisms.
- Keywords
- Annotation, Genome assembly, Library, Manual curation, Non-model organism, Transposable elements
- Publication type
- Journal Article MeSH
One of the main steps in a study of microbial communities is resolving their composition, diversity and function. In the past, these issues were mostly addressed by amplicon sequencing of a target gene because of its reasonable price and the easier computational postprocessing of the data. With the advancement of sequencing techniques, the main focus has shifted to whole metagenome shotgun sequencing, which allows a much more detailed analysis of the metagenomic data, including the reconstruction of novel microbial genomes and insight into the genetic potential and metabolic capacities of whole environments. On the other hand, the output of whole metagenome shotgun sequencing is a mixture of short DNA fragments belonging to various genomes; this approach therefore requires more sophisticated computational algorithms for clustering related sequences, commonly referred to as sequence binning. There are currently two types of binning methods: taxonomy dependent and taxonomy independent. The first type classifies the DNA fragments by performing a standard homology inference against a reference database, while the latter performs reference-free binning by applying clustering techniques to features extracted from the sequences. In this review, we describe the strategies within the second approach. Although these strategies do not require prior knowledge, they place higher demands on the length of sequences. Besides their basic principle, an overview of particular methods and tools is provided. Furthermore, the review covers the utilization of the methods in relation to the length of sequences and discusses the need for metagenomic data preprocessing in the form of an initial assembly prior to binning.
- Keywords
- Abundance, Genomic signature, Metagenomics, Sequence binning, Taxonomy independent, Visualization
- Publication type
- Journal Article MeSH
- Review MeSH
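The "features extracted from the sequences" used by taxonomy-independent binning are often composition-based genomic signatures. The sketch below shows one common choice, a normalised k-mer frequency vector (tetranucleotides by default), and a Euclidean distance between two such vectors that a clustering step could build on. It is a generic illustration under our own assumptions, with hypothetical function names, and does not reproduce any specific tool covered by the review.

```python
from itertools import product
from math import sqrt


def kmer_signature(seq, k=4):
    """Normalised k-mer frequency vector in lexicographic k-mer order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # k-mers containing N etc. are ignored
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy contigs; real inputs would be assembled contigs of several kb.
sig_a = kmer_signature("ACGTACGTACGTACGT")
sig_b = kmer_signature("GGGGCCCCGGGGCCCC")
print(len(sig_a), euclidean(sig_a, sig_b))
```

With 256 possible tetranucleotides, a short fragment yields a sparse and noisy vector, which illustrates why these strategies place higher demands on sequence length and benefit from an initial assembly.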
Recent advances in next-generation sequencing (NGS) technologies have dramatically reduced the cost of DNA sequencing, allowing species with large and complex genomes to be sequenced. Although bread wheat (Triticum aestivum L.) is one of the world's most important food crops, efficient exploitation of molecular marker-assisted breeding approaches has lagged behind that achieved in other crop species, due to its large polyploid genome. However, an international public-private effort spanning 9 years reported a draft covering over 65% of the bread wheat genome in 2014 and, after more than a decade, culminated in the release of a gold-standard, fully annotated reference wheat-genome assembly in 2018. Shortly thereafter, in 2020, genome assemblies of an additional 15 global wheat accessions were released. As a result, wheat has now entered the pan-genomic era, in which basic resources can be efficiently exploited. Wheat genotyping with a few hundred markers has been replaced by genotyping arrays, capable of characterizing hundreds of wheat lines, using thousands of markers, providing fast, relatively inexpensive, and reliable data for exploitation in wheat breeding. These advances have opened up new opportunities for marker-assisted selection (MAS) and genomic selection (GS) in wheat. Herein, we review the advances and perspectives in wheat genetics and genomics, with a focus on key traits, including grain yield, yield-related traits, end-use quality, and resistance to biotic and abiotic stresses. We also focus on reported candidate genes cloned and linked to traits of interest. Furthermore, we report on the improvement in the aforementioned quantitative traits, through the use of (i) clustered regularly interspaced short-palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9)-mediated gene-editing and (ii) positional cloning methods, and of genomic selection. 
Finally, we examine the utilization of genomics for the next-generation wheat breeding, providing a practical example of using in silico bioinformatics tools that are based on the wheat reference-genome sequence.
- Keywords
- CRISPR/Cas9, QTL cloning, Wheat, abiotic-stress tolerance, disease resistance, genome-wide association, genomic selection, quantitative trait locus mapping
- Publication type
- Journal Article MeSH
- Review MeSH
This work describes a selective hydride generation-cryotrapping (HG-CT) method coupled to an extremely sensitive but simple in-house designed and assembled atomic fluorescence spectrometry (AFS) instrument for the determination of toxicologically important As species. Here, an advanced flame-in-gas-shield atomizer (FIGS) was interfaced to HG-CT and its performance was compared to a standard miniature diffusion flame (MDF) atomizer. A significant improvement in both sensitivity and baseline noise was found, which was reflected in improved (4 times) limits of detection (LODs). The LODs obtained with the FIGS atomizer were 0.44, 0.74, 0.15, 0.17 and 0.67 ng L⁻¹ for arsenite, total inorganic, mono-, dimethylated As and trimethylarsine oxide, respectively. Moreover, the sensitivities with FIGS and MDF were equal for all As species, allowing for the possibility of single species standardization with arsenate standard for accurate quantification of all other As species. The accuracy of HG-CT-AFS with FIGS was verified by speciation analysis in two samples of bottled drinking water and certified reference materials, NRC CASS-5 (nearshore seawater) and SLRS-5 (river water) that contain traces of methylated As species. As speciation was in agreement with results previously reported and sums of all quantified species corresponded with the certified total As. The feasibility of HG-CT-AFS with FIGS was also demonstrated by the speciation analysis in microsamples of exfoliated bladder epithelial cells isolated from human urine. The results for the sums of trivalent and pentavalent As species corresponded well with the reference results obtained by HG-CT-ICPMS (inductively coupled plasma mass spectrometry).
- MeSH
- Arsenic analysis chemistry MeSH
- Chemistry Techniques, Analytical economics instrumentation MeSH
- Spectrometry, Fluorescence standards MeSH
- Limit of Detection MeSH
- Nebulizers and Vaporizers MeSH
- Drinking Water chemistry MeSH
- Spectrophotometry, Atomic standards MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Names of Substances
- Arsenic MeSH
- Drinking Water MeSH