Query: sequence capture
Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.
- MeSH
- Cell Nucleus genetics
- Centromere genetics
- Chromatin genetics metabolism
- Chromosomes, Plant genetics
- Genetic Variation
- Genome, Plant genetics
- Genomics
- Haplotypes genetics
- Hordeum genetics
- Chromosome Mapping
- Meiosis genetics
- Repetitive Sequences, Nucleic Acid genetics
- Seeds genetics
- Chromosomes, Artificial, Bacterial genetics
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
- Research Support, U.S. Gov't, Non-P.H.S.
- Substances
- Chromatin
As whole-genome sequencing has become pervasive, some have suggested that reduced genomic representation approaches, such as sequence capture, are becoming obsolete. In the present study, we argue that these techniques still provide excellent tools in terms of price and data quality, as well as in their ability to provide markers with specific features, as required, for example, in phylogenomics. A potential drawback of the wide-scale application of reduced representation approaches could be their drop in efficiency with increasing phylogenetic distance from the reference species. While some studies have assessed the performance of reduced representation techniques in such situations, to our knowledge, none has evaluated their applicability to inter-specific hybrids and polyploids. This is a significant gap in current knowledge, since there is increasing evidence both for the frequent occurrence of natural hybrids and polyploids and for the major importance of both phenomena in evolution. The main aim of the present study was to carry out a thorough validation of the applicability of sequence capture (SEQcap) to (1) a set of non-model taxa spanning a wide range of phylogenetic relatedness and (2) inter-specific hybrids of various ploidies and genomic compositions. Regarding the latter point, we focused especially on mechanisms causing allelic bias and consequent allelic dropout, as these could confound inferences about the evolutionary genomic dynamics of hybrids, especially in asexuals, which effectively reproduce as a frozen F1 generation.
- Keywords
- Cobitis, allelic drop-out, allopolyploids, hybrids, phylogenomics, sequence capture
- MeSH
- Phylogeny
- Genome *
- Genomics
- Humans
- Ploidies
- Polyploidy *
- Check Tag
- Humans
- Publication type
- Journal Article
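The allelic-bias problem raised in the sequence-capture abstract above can be made concrete with a simple read-count check. In a hybrid, the expected share of reads carrying each parental allele at a diagnostic site follows from genomic composition: a triploid with two copies of genome A and one of genome B should show the B allele in about one third of reads, provided both parental alleles are captured equally well. The sketch below illustrates that reasoning under the equal-capture assumption and is not the authors' procedure: the function names, the exact two-sided binomial test, the zero-read "dropout" rule and the 0.01 threshold are all hypothetical choices, and the read counts are made up.

```python
import math


def expected_fraction(copies_b: int, ploidy: int) -> float:
    """Expected share of reads carrying the parental-B allele, assuming every
    genome copy is captured and sequenced with equal efficiency."""
    return copies_b / ploidy


def binomial_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k."""
    probs = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def classify_site(b_reads: int, total_reads: int, copies_b: int, ploidy: int,
                  alpha: float = 0.01) -> str:
    """Label a diagnostic site as 'dropout', 'biased' or 'ok'.
    The zero-read rule and the alpha threshold are illustrative choices."""
    if total_reads == 0:
        raise ValueError("site has no coverage")
    if b_reads == 0:
        return "dropout"
    p = expected_fraction(copies_b, ploidy)
    if binomial_two_sided_p(b_reads, total_reads, p) < alpha:
        return "biased"
    return "ok"


# Hypothetical triploid hybrid with two A genomes and one B genome.
print(classify_site(10, 30, copies_b=1, ploidy=3))  # ok
print(classify_site(2, 30, copies_b=1, ploidy=3))   # biased
print(classify_site(0, 30, copies_b=1, ploidy=3))   # dropout
```

A site flagged as "biased" is consistent with, but does not prove, reduced capture of the more divergent parental allele; the same test would also flag copy-number differences or mapping problems.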
The increasing availability of short-read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence capture data, this requires suitable bioinformatic workflows. Here, we test different assembly and locus extraction strategies and implement them in secapr, a pipeline that processes short-read data into multilocus alignments for phylogenetic and molecular ecology analyses. We integrate the processing of data from low-coverage WGS (<30×) and target sequence capture into a flexible framework, while optimizing de novo contig assembly and locus extraction. Specifically, we test different assembly strategies by contrasting their ability to recover targeted butterfly protein-coding genes, using four data sets: three WGS data sets with different average coverages (10×, 5× and 2×) and one data set for which these loci were enriched prior to sequencing via target sequence capture. Using the resulting de novo contigs, we account for potential errors within contigs and infer phylogenetic trees to evaluate the ability of each assembly strategy to recover species relationships. We demonstrate that using multiple k-mer sizes simultaneously for assembly results in the highest yield of extracted loci from de novo assembled contigs, while data sets with sequencing read depths as low as 5× recover the expected species relationships in phylogenetic trees. By making the tested assembly approaches available in the secapr pipeline, we hope to encourage future studies to incorporate complementary data and make an informed choice of the optimal assembly strategy.
- Keywords
- secapr, de novo assembly, loci extraction, low-coverage whole genome sequencing, target sequence capture
- MeSH
- Phylogeny
- Genome *
- Whole Genome Sequencing
- Computational Biology *
- High-Throughput Nucleotide Sequencing
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
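The coverage figures in the secapr abstract above (10×, 5× and 2×) are average depths, and the benefit of using several k-mer sizes is easier to see once base coverage is converted into k-mer coverage. The sketch below uses the standard approximation C_k = C * (L - k + 1) / L for reads of length L, plus random subsampling of reads to reach a lower target depth. It is background illustration only, with hypothetical function names and read lengths; it does not reproduce how secapr or the assemblers it calls are implemented.

```python
import random


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage, C_k = C * (L - k + 1) / L: each read of length L
    contributes only L - k + 1 k-mers, so larger k leaves fewer overlaps."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


def downsample(reads: list, current_cov: float, target_cov: float, seed: int = 1) -> list:
    """Keep each read independently with probability target/current."""
    if target_cov > current_cov:
        raise ValueError("target coverage exceeds current coverage")
    keep = target_cov / current_cov
    rng = random.Random(seed)
    return [r for r in reads if rng.random() < keep]


# Hypothetical 150-bp reads at 5x base coverage, for three k-mer sizes.
for k in (21, 55, 127):
    print(k, round(kmer_coverage(5.0, 150, k), 2))
```

Small k values keep most of the base coverage, which helps at low depth, while large k values resolve repeats better but leave fewer overlapping k-mers; combining several k is one common way to get the benefits of both.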
High-throughput DNA sequencing techniques enable time- and cost-effective sequencing of large portions of the genome. Instead of sequencing and annotating whole genomes, many phylogenetic studies focus sequencing effort on large sets of pre-selected loci, which further reduces costs and bioinformatic challenges while increasing coverage. One common approach, which enriches loci before sequencing, is often referred to as target sequence capture. This technique has been shown to be applicable to phylogenetic studies of greatly varying evolutionary depth, and it has proven to produce powerful, large multi-locus DNA sequence datasets suitable for phylogenetic analyses. However, target capture requires careful planning, because several design decisions can greatly affect the success of experiments. Here we provide a simple flowchart for designing phylogenomic target capture experiments. We discuss the necessary decisions, from the identification of target loci to the final bioinformatic processing of sequence data. We outline challenges and solutions related to the taxonomic scope, sample quality, and available genomic resources of target capture projects. We hope this review will serve as a useful roadmap for designing and carrying out successful phylogenetic target capture studies.
- Keywords
- Hyb-Seq, Illumina, NGS, anchored enrichment, bait, high throughput sequencing, molecular phylogenetics, probe
- Publication type
- Journal Article
- Review
BACKGROUND: Retroelements (REs) occupy a significant part of all eukaryotic genomes, including the human genome. The majority of retroelements in the human genome are inactive and unable to retrotranspose. Dozens of active copies are repressed in most normal tissues by various cellular mechanisms. These copies can become active in normal germline and brain tissues or in cancer, leading to new retroposition events. The consequences of such events and their role in normal cell functioning and carcinogenesis are not yet fully understood. If new insertions occur in only a small proportion of cells, they can be detected only with specific methods based on RE enrichment and high-throughput sequencing. The downside of the high sensitivity of such methods is the presence of various artifacts that mimic real insertions and that in many cases cannot be validated for lack of the initial template DNA. For this reason, adequate assessment of rare (< 1%) subclonal cancer-specific RE insertions is complicated. RESULTS: Here we describe a new copy-capture technique, which we implemented in a method called SeqURE (Sequencing Unknown of Retroposition Events), that allows efficient and reliable identification of new genomic RE insertions. The method is based on the capture of copies of target molecules (copy-capture), followed by selective amplification and sequencing of the genomic regions flanking active RE insertions on both sides. Importantly, the template genomic DNA remains intact and can be used for validation experiments. In addition, we applied a novel system for testing method sensitivity and showed that the method reliably detects insertions present in 1 out of 100 cells and a substantial portion of insertions present in 1 out of 1000 cells. Taking advantage of the method, we showed the absence of somatic Alu insertions in colorectal cancer samples bearing tumor-specific L1HS insertions.
CONCLUSIONS: This study presents the first description and implementation of the copy-capture technique and provides the first methodological basis for the quantitative assessment of RE insertions present in a small portion of cells.
- Keywords
- Copy capture, High-throughput sequencing, Human genome, Insertional polymorphism, Retroelements
- Publication type
- Journal Article
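Why insertions present in 1 out of 1000 cells are harder to recover than those present in 1 out of 100 cells can be seen with a back-of-envelope sampling model. If an insertion sits on one allele in a fraction f of diploid cells and the input DNA holds G haploid genome copies, about G·f/2 copies carry it; if each copy is captured independently with efficiency e, the number captured is roughly Poisson with mean λ = G·f·e/2. This sketch is not the SeqURE sensitivity analysis: the 3.3 pg haploid-genome mass is an approximation, and the function names, input amount and capture efficiency below are hypothetical.

```python
import math

# Approximate mass of one haploid human genome in picograms (assumption).
HAPLOID_GENOME_PG = 3.3


def haploid_copies(input_ng: float, genome_pg: float = HAPLOID_GENOME_PG) -> float:
    """Number of haploid genome copies in a given DNA input."""
    return input_ng * 1000.0 / genome_pg


def detection_probability(cell_fraction: float, input_ng: float,
                          efficiency: float, min_molecules: int = 1) -> float:
    """P(at least min_molecules insertion-bearing templates are captured).

    Assumes the insertion sits on one allele of a diploid cell, so it is
    carried by cell_fraction / 2 of all haploid copies, and that each copy
    is captured independently with probability `efficiency` (hypothetical).
    The captured count is then approximately Poisson(lam)."""
    lam = haploid_copies(input_ng) * cell_fraction / 2 * efficiency
    p_fewer = sum(math.exp(-lam) * lam**i / math.factorial(i)
                  for i in range(min_molecules))
    return 1.0 - p_fewer


# Illustrative numbers only: 33 ng input (~10,000 haploid copies), 10% efficiency.
for frac in (1 / 100, 1 / 1000):
    print(frac, round(detection_probability(frac, 33.0, 0.10), 3))
```

With these illustrative parameters the model gives about 0.99 at 1 in 100 cells and about 0.39 at 1 in 1000 cells, showing how a tenfold drop in cell fraction can move detection from near-certain to partial; requiring more supporting molecules (`min_molecules`) lowers both values.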
Target sequence capture has emerged as a powerful method to sequence hundreds or thousands of genomic regions in a cost- and time-efficient manner. In most cases, however, targeted regions lack full sequence information for certain samples, due to taxonomic, laboratory, or stochastic factors. Loci lacking molecular data for a large number of samples are commonly excluded from downstream analyses, even though they may still contain valuable information. On the other hand, including data-poor loci may bias phylogenetic analyses. Here we use a target sequence capture dataset of an ecologically and taxonomically diverse group of spiny sunflowers (Asteraceae, or Compositae: Barnadesioideae) to test how the inclusion or exclusion of such data-poor loci affects phylogenetic inference. We investigate the sensitivity of concatenation and coalescent approaches to missing data using matrices of varying taxonomic completeness, obtained by filtering loci with different proportions of missing samples prior to data analysis. We find that missing data affect both the topology and the branch support of the resulting phylogenies. The matrix containing all loci yielded the highest overall node support values, regardless of the amount of missing nucleotides. These results provide empirical support for earlier suggestions, based on single genes and data simulations, that taxa with high amounts of missing data should not be readily dismissed, as they can provide essential information for phylogenomic reconstruction.
- Keywords
- Asteraceae, High-throughput sequencing, Missing data, Museomics, Phylogenomics
- MeSH
- Data Analysis
- Asteraceae * genetics
- Phylogeny
- Genome
- Genomics
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
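The locus filtering described in the spiny-sunflower abstract above, where loci are retained or dropped according to their proportion of missing samples, can be sketched as follows. The presence/absence data, function names and thresholds are hypothetical; a threshold of 1.0 keeps every locus, analogous to the "all loci" matrix, while lower thresholds yield progressively more complete but smaller matrices. The per-sample occupancy shows which taxa would carry high amounts of missing data.

```python
from typing import Dict, Set


def missing_fraction(present: Set[str], all_samples: Set[str]) -> float:
    """Share of samples with no sequence for a locus."""
    return 1.0 - len(present & all_samples) / len(all_samples)


def filter_loci(loci: Dict[str, Set[str]], all_samples: Set[str],
                max_missing: float) -> Dict[str, Set[str]]:
    """Keep loci whose proportion of missing samples is at most max_missing.
    max_missing = 1.0 keeps every locus (the 'all loci' matrix)."""
    return {name: present for name, present in loci.items()
            if missing_fraction(present, all_samples) <= max_missing}


def sample_occupancy(loci: Dict[str, Set[str]], all_samples: Set[str]) -> Dict[str, float]:
    """Fraction of retained loci recovered for each sample."""
    if not loci:
        return {s: 0.0 for s in sorted(all_samples)}
    return {s: sum(s in present for present in loci.values()) / len(loci)
            for s in sorted(all_samples)}


# Hypothetical presence/absence data for four samples and three loci.
samples = {"a", "b", "c", "d"}
loci = {"L1": {"a", "b", "c", "d"}, "L2": {"a", "b", "c"}, "L3": {"a"}}
for threshold in (0.0, 0.25, 0.5, 1.0):
    kept = filter_loci(loci, samples, threshold)
    print(threshold, sorted(kept), sample_occupancy(kept, samples))
```

Each filtered matrix could then be passed to concatenation and coalescent analyses, so that topology and support can be compared across thresholds.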
Transposable elements (TEs) regularly capture fragments of genes. When the host silences these TEs, siRNAs homologous to the captured regions may also target the genes. This epigenetic crosstalk establishes an intragenomic conflict: silencing the TEs has the cost of silencing the genes. If the genes are important, however, natural selection may maintain their function by moderating the silencing response, which may also benefit the TEs. In this study, we examined this model by focusing on Helitrons, Pack-MULEs, and Sirevirus LTR retrotransposons in the maize genome. We documented 1263 TEs containing exon fragments from 1629 donor genes. Consistent with epigenetic conflict, donor genes mapped more siRNAs and were more methylated than genes with no evidence of capture. However, these patterns differed between syntelog and translocated donor genes. Syntelogs appeared to maintain function, as measured by gene expression, consistent with moderation of silencing for functionally important genes. Epigenetic marks did not spread beyond their captured regions, and 24-nt crosstalk siRNAs were linked with CHH methylation. Translocated genes, in contrast, bore the signature of silencing. They were highly methylated and less expressed, but also overrepresented among donor genes and located away from chromosomal arms, which suggests a link between capture and gene movement. Splitting genes into potential functional categories based on evolutionary constraint supported the synteny-based findings. TE families captured genes in different ways, but the evidence for their advantage was generally less obvious; nevertheless, TEs with captured fragments were older, mapped fewer siRNAs, and were slightly less methylated than TEs without captured fragments. Collectively, our results argue that TE capture triggers an intragenomic conflict that may not affect the function of important genes but may lead to the pseudogenization of less-constrained genes.
- Keywords
- epigenetic silencing, gene capture, intragenomic conflict, methylation, synteny, transposable elements
- MeSH
- Epigenesis, Genetic *
- Zea mays genetics
- RNA, Small Interfering metabolism
- DNA Methylation genetics
- Models, Genetic
- Gene Expression Regulation, Plant
- Genes, Plant
- Gene Expression Profiling
- Synteny genetics
- DNA Transposable Elements genetics
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
- Research Support, U.S. Gov't, Non-P.H.S.
- Substances
- RNA, Small Interfering
- DNA Transposable Elements
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence, which is based on short reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.
- MeSH
- Molecular Sequence Annotation
- Genome, Plant
- Genomics methods
- DNA, Intergenic
- Hordeum genetics
- Terminal Repeat Sequences
- Retroelements
- Sequence Analysis, DNA
- Computational Biology methods
- High-Throughput Nucleotide Sequencing methods
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
- Substances
- DNA, Intergenic
- Retroelements
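The barley long-read abstract above states that assemblies from accurate long reads excel in most metrics, without naming those metrics in the abstract. N50 is one widely used contiguity statistic and could, for instance, be compared between assemblies built from full and downsampled CCS data. A minimal sketch, assuming plain lists of contig lengths (the function name and values below are hypothetical); passing a genome size gives the NG50 variant, which makes assemblies of different total length comparable.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return N50 (or NG50 when genome_size is given): the length L such that
    contigs of length >= L together cover at least half of the assembly
    (or of the genome). Returns None if that half is never reached."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs")
    total = sum(ordered) if genome_size is None else genome_size
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    return None  # only possible for NG50: assembly shorter than half the genome


contigs = [20, 100, 60, 80, 40]  # hypothetical contig lengths
print(n50(contigs))                    # 80
print(n50(contigs, genome_size=200))   # 100
print(n50(contigs, genome_size=1000))  # None
```

N50 alone says nothing about correctness, so contiguity is usually reported alongside completeness and accuracy measures.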
Chromosomal inversions occur in natural populations of many species, and may underlie reproductive isolation and local adaptation. Traditional methods of inversion discovery are labor-intensive and lack sensitivity. Here, we report the use of three-dimensional contact probabilities between genomic loci as assayed by chromosome-conformation capture sequencing (Hi-C) to detect multi-megabase polymorphic inversions in four barley genotypes. Inversions are validated by fluorescence in situ hybridization and Bionano optical mapping. We propose Hi-C as a generally applicable method for inversion discovery in natural populations.
- Keywords
- Hordeum vulgare, barley, chromosomal inversions, chromosome conformation capture sequencing, genomics, optical mapping, technical advance
- MeSH
- Chromosome Inversion genetics
- Chromosomes, Plant genetics
- Genome, Plant genetics
- Genotype
- In Situ Hybridization, Fluorescence
- Hordeum genetics
- Chromosome Mapping
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
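The principle behind Hi-C-based inversion discovery in the abstract above is that contact frequency falls with distance along the sample's own chromosome, so bins that are far apart in the reference but adjacent in an inverted sample show unexpectedly strong contacts. The toy model below illustrates that signal only: it assumes an idealized power-law decay with exponent alpha = 1, a single inversion at known bins, and no noise or normalization, and its function names are hypothetical. It is not the detection procedure used in the study.

```python
from typing import List, Tuple


def sample_position(ref_bin: int, inv_start: int, inv_end: int) -> int:
    """Position of a reference bin in a genome carrying an inversion of bins
    inv_start..inv_end (inclusive) relative to the reference."""
    if inv_start <= ref_bin <= inv_end:
        return inv_start + inv_end - ref_bin
    return ref_bin


def contact_matrix(n_bins: int, inv_start: int, inv_end: int,
                   alpha: float = 1.0) -> List[List[float]]:
    """Idealized Hi-C map in reference coordinates: contact frequency decays as
    (1 + d) ** -alpha with the distance d along the sample's own chromosome."""
    pos = [sample_position(i, inv_start, inv_end) for i in range(n_bins)]
    return [[(1 + abs(pos[i] - pos[j])) ** -alpha for j in range(n_bins)]
            for i in range(n_bins)]


def strongest_excess(matrix: List[List[float]], alpha: float = 1.0) -> Tuple[int, int]:
    """Bin pair whose contacts most exceed the reference-based expectation
    (first such pair in scan order if several tie)."""
    best, best_pair = 0.0, (0, 0)
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            ratio = matrix[i][j] / (1 + (j - i)) ** -alpha
            if ratio > best:
                best, best_pair = ratio, (i, j)
    return best_pair


# 20 bins with bins 5..14 inverted in the sample: the bin just left of the
# inversion (4) appears to contact the far end of the inverted block (14).
m = contact_matrix(20, 5, 14)
print(strongest_excess(m))  # (4, 14)
```

In this toy map the pair (5, 15) shows the same excess at the other breakpoint, while bins that flank each breakpoint in the reference (for example 4 and 5) show depleted contacts because they are no longer neighbours in the sample.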
The North American ecological species Daphnia pulicaria and Daphnia pulex are thought to have diverged from a common ancestor by adaptation to sympatric but ecologically distinct lake and pond habitats, respectively. Based on mtDNA relationships, European D. pulicaria is considered a different species only distantly related to its North American counterpart, but both species share a lactate dehydrogenase (Ldh) allele F supposedly involved in lake adaptation in North America, and the same allele is also carried by the related Holarctic Daphnia tenebrosa. The correct inference of the species' ancestral relationships is therefore critical for understanding the origin of their adaptive divergence. Our species tree, inferred from unlinked nuclear loci for D. pulicaria and D. pulex, resolved the European and North American D. pulicaria as sister clades, and we argue that the discordant mtDNA gene tree is best explained by capture of D. pulex mtDNA by D. pulicaria in North America. The Ldh gene tree shows that the F-class alleles in D. pulicaria and D. tenebrosa are due to common descent (as opposed to introgression), with D. tenebrosa alleles paraphyletic with respect to D. pulicaria alleles. That D. tenebrosa still segregates both the ancestral and the derived amino acids at the two sites distinguishing the pond and lake alleles suggests that D. pulicaria inherited the derived states from the D. tenebrosa ancestry. Our results suggest that some adaptations restricting gene flow between D. pulicaria and D. pulex might have evolved in response to selection in ancestral environments rather than in the species' current sympatric habitats. The Arctic (D. tenebrosa) populations are likely to provide important clues about these issues.
- MeSH
- Daphnia classification genetics
- Species Specificity
- Phylogeny
- Phylogeography
- DNA, Mitochondrial chemistry
- Recombination, Genetic
- Sequence Analysis, DNA
- Gene Flow
- Genetic Speciation *
- Animals
- Check Tag
- Animals
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
- Substances
- DNA, Mitochondrial