sequence assembly
The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules.
- MeSH
- Biotechnology methods MeSH
- Chromosomes, Plant genetics MeSH
- Genome, Plant * MeSH
- Chromosome Mapping methods MeSH
- Triticum genetics MeSH
- Sequence Analysis, DNA methods MeSH
- Tandem Repeat Sequences MeSH
- Chromosomes, Artificial, Bacterial MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Assembly of immature retroviral particles is a complex process involving interactions of several specific domains of the Gag polyprotein localized mainly within capsid protein (CA), spacer peptide (SP), and nucleocapsid protein (NC). In the present work we focus on the contribution of NC to the oligomerization of CA leading to assembly of Mason-Pfizer monkey virus (M-PMV) and HIV-1. Analyzing in vitro assembly of substitution and deletion mutants of ΔProCANC, we identified a "spacer-like" sequence (NC(15)) at the M-PMV NC N terminus. This NC(15) domain is indispensable for the assembly and cannot be replaced with oligomerization domains of GCN4 or CREB proteins. Although the M-PMV NC(15) occupies a position analogous to that of the HIV-1 spacer peptide, it could not be replaced by the latter. To induce the assembly, both M-PMV NC(15) and HIV-1 SP1 must be followed by a short peptide that is rich in basic residues. This region can be either specific, i.e., derived from the downstream NC sequence, or a nonspecific positively charged peptide. However, it cannot be replaced by heterologous interaction domains from either GCN4 or CREB. In summary, we report here a novel M-PMV spacer-like domain that is functionally similar to other retroviral spacer peptides and contributes to the assembly of immature-virus-like particles.
- MeSH
- Cell Line MeSH
- DNA Primers genetics MeSH
- DNA, Viral genetics MeSH
- Escherichia coli genetics ultrastructure virology MeSH
- HIV-1 physiology genetics MeSH
- Humans MeSH
- Mason-Pfizer monkey virus physiology genetics ultrastructure MeSH
- Molecular Sequence Data MeSH
- Protein Multimerization MeSH
- Mutagenesis MeSH
- Nucleocapsid Proteins physiology genetics chemistry MeSH
- Recombinant Proteins genetics chemistry metabolism MeSH
- Amino Acid Sequence MeSH
- Base Sequence MeSH
- Sequence Homology, Amino Acid MeSH
- Virus Assembly physiology genetics MeSH
- Protein Structure, Tertiary MeSH
- Microscopy, Electron, Transmission MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Immature retroviral particles are assembled by self-association of the structural polyprotein precursor Gag. During maturation the Gag polyprotein is proteolytically cleaved, yielding mature structural proteins, matrix (MA), capsid (CA), and nucleocapsid (NC), that reassemble into a mature viral particle. Proteolytic cleavage causes the N terminus of CA to fold back to form a β-hairpin, anchored by an internal salt bridge between the N-terminal proline and the inner aspartate. Using an in vitro assembly system of capsid-nucleocapsid protein (CANC), we studied the formation of virus-like particles (VLP) of a gammaretrovirus, the xenotropic murine leukemia virus (MLV)-related virus (XMRV). We show here that, unlike other retroviruses, XMRV CA and CANC do not assemble tubular particles characteristic of mature assembly. The prevention of β-hairpin formation by the deletion of either the N-terminal proline or 10 initial amino acids enabled the assembly of ΔProCANC or Δ10CANC into immature-like spherical particles. Detailed three-dimensional (3D) structural analysis of these particles revealed that below a disordered N-terminal CA layer, the C terminus of CA assembles a typical immature lattice, which is linked by rod-like densities with the RNP.
- MeSH
- DNA Primers MeSH
- Cryoelectron Microscopy MeSH
- Escherichia coli ultrastructure virology MeSH
- Fourier Analysis MeSH
- Molecular Sequence Data MeSH
- Polymerase Chain Reaction MeSH
- Proteolysis MeSH
- Amino Acid Sequence MeSH
- Base Sequence MeSH
- Sequence Homology, Amino Acid MeSH
- Virus Assembly MeSH
- Microscopy, Electron, Transmission MeSH
- Virion physiology MeSH
- Viral Proteins chemistry metabolism MeSH
- Leukemia Virus, Murine physiology MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Chromatin Assembly Factor 1 (CAF-1) is a major nucleosome assembly complex which functions particularly during DNA replication and repair. Here we studied how the nucleosome landscape changes in a CAF-1 mutant in the model plant Arabidopsis thaliana. Globally, most nucleosomes were not affected by loss of CAF-1, indicating the presence of efficient alternative nucleosome assemblers. Nucleosomes that we found depleted in the CAF-1 mutant were enriched in non-transcribed regions, consistent with the notion that CAF-1-independent nucleosome assembly can compensate for loss of CAF-1 mainly in transcribed regions. Depleted nucleosomes were particularly enriched in proximal promoters, suggesting that CAF-1-independent nucleosome assembly mechanisms are often not efficient upstream of transcription start sites. Genes related to plant defense were particularly prone to lose nucleosomes in their promoters upon CAF-1 depletion. Reduced nucleosome occupancy at promoters of many defense-related genes is associated with a primed gene expression state that may considerably increase plant fitness by facilitating plant defense. Together, our results establish that the nucleosome landscape in Arabidopsis is surprisingly robust even in the absence of the dedicated nucleosome assembly machinery CAF-1 and that CAF-1-independent nucleosome assembly mechanisms are less efficient in particular genome regions.
- MeSH
- Arabidopsis genetics immunology metabolism MeSH
- Chromatin genetics MeSH
- Chromatin Assembly Factor-1 genetics metabolism MeSH
- Plant Immunity genetics MeSH
- Mutation MeSH
- Nucleosomes genetics metabolism MeSH
- DNA Repair genetics MeSH
- Transcription Initiation Site MeSH
- Promoter Regions, Genetic genetics MeSH
- DNA Replication genetics MeSH
- Chromatin Assembly and Disassembly genetics MeSH
- Sequence Analysis, DNA MeSH
- Publication type
- Journal Article MeSH
Next-generation sequencing (NGS) provides a powerful tool for the discovery of important genes and alleles in crop plants and their wild relatives. Despite great advances in NGS technologies, whole-genome shotgun sequencing is cost-prohibitive for species with complex genomes. An attractive option is to reduce genome complexity to a single chromosome prior to sequencing. This work describes a strategy for studying the genomes of distant wild relatives of wheat by isolating single chromosomes from addition or substitution lines, followed by chromosome sorting using flow cytometry and sequencing of chromosomal DNA by NGS technology. We flow-sorted chromosome 5M(g) from a wheat/Aegilops geniculata disomic substitution line [DS5M(g)(5D)] and sequenced it using an Illumina HiSeq 2000 system at approximately 50× coverage. Paired-end sequences were assembled and used for structural and functional annotation. A total of 4236 genes were annotated on 5M(g), in close agreement with the predicted number of genes on wheat chromosome 5D (4286). Single-gene FISH indicated no major chromosomal rearrangements between chromosomes 5M(g) and 5D. Comparing chromosome 5M(g) with model grass genomes identified synteny blocks in Brachypodium distachyon, rice (Oryza sativa), sorghum (Sorghum bicolor) and barley (Hordeum vulgare). Chromosome 5M(g)-specific SNPs and cytogenetic probe-based resources were developed and validated. Deletion bin-mapped and ordered 5M(g) SNP markers will be useful to track 5M-specific introgressions and translocations. This study provides a detailed sequence-based analysis of the composition of a chromosome from a distant wild relative of bread wheat, and opens up opportunities to develop genomic resources for wild germplasm to facilitate crop improvement.
- MeSH
- Brachypodium genetics MeSH
- Chromosomes, Plant genetics MeSH
- Genome, Plant genetics MeSH
- In Situ Hybridization, Fluorescence MeSH
- Hordeum genetics MeSH
- Polymorphism, Single Nucleotide MeSH
- Poaceae classification genetics MeSH
- Chromosome Mapping MeSH
- Evolution, Molecular MeSH
- Gene Order MeSH
- Triticum genetics MeSH
- Genes, Plant genetics MeSH
- Oryza genetics MeSH
- Sorghum genetics MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
Phage tail fibres are elongated protein assemblies capable of specific recognition of bacterial surfaces during the first step of viral infection [1-4]. The folding of these complex trimeric structures often requires a phage-encoded tail fibre assembly (Tfa) protein [5-7]. Despite the wide occurrence of Tfa proteins, their functional mechanism has not been elucidated. Here, we investigate the tail fibre and Tfa of Escherichia coli phage Mu. We demonstrate that Tfa forms a stable complex with the tail fibre, and present a 2.1 Å resolution X-ray crystal structure of this complex. We find that Tfa proteins are composed of two domains: a non-conserved N-terminal domain that binds to the C-terminal region of the fibre and a conserved C-terminal domain that probably mediates fibre oligomerization and assembly. Tfa forms rapidly exchanging multimers on its own, but not a stable trimer, implying that Tfa does not specify the trimeric state of the fibre. We propose that the key conserved role of Tfa is to ensure that fibre assembly and multimerization initiate at the C terminus, ensuring that the intertwined and repetitive structural elements of fibres come together in the correct sequence. The universal importance of correctly aligning the C termini of phage fibres is highlighted by our work.
- MeSH
- Bacteriophages classification physiology MeSH
- Escherichia coli metabolism virology MeSH
- Crystallography, X-Ray MeSH
- Models, Molecular MeSH
- Protein Multimerization MeSH
- Viral Tail Proteins chemistry genetics metabolism MeSH
- Protein Folding MeSH
- Amino Acid Sequence MeSH
- Sequence Alignment MeSH
- Protein Binding MeSH
- Viral Proteins chemistry genetics metabolism MeSH
- Structure-Activity Relationship MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Next-generation sequencing methods provide comprehensive data for structural and functional analysis of the genome. Draft genomes with a low contig number and a high N50 value can give insight into the structure of the genome as well as provide information for its annotation. In this study, we designed a pipeline that can be used to assemble prokaryotic draft genomes with a low number of contigs and a high N50 value. We aimed to use a combination of two de novo assembly tools (SPAdes and IDBA-Hybrid) and evaluate the impact of this approach on the quality metrics of the assemblies. The pipeline was tested on raw short-read sequence data (< 300) for a total of 10 species from four different genera. To obtain the final draft genomes, we first assembled the sequences using SPAdes and used the 16S rRNA extracted from this assembly to find a closely related organism. The IDBA-Hybrid assembler was then used to produce a second assembly guided by the genome of the closely related organism. Finally, SPAdes was run again, using the second assembly produced by IDBA-Hybrid as a hint. The results were evaluated using QUAST and BUSCO. The pipeline succeeded in reducing the contig numbers and increasing the N50 values of the draft genome assemblies while preserving the coverage of the draft genomes.
- MeSH
- Sequence Analysis, DNA methods MeSH
- High-Throughput Nucleotide Sequencing * methods MeSH
- Publication type
- Journal Article MeSH
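The N50 statistic that the preceding abstract uses as an assembly quality metric can be illustrated with a short sketch. This is not code from the study: the function name `n50` and the example contig lengths are hypothetical, and the definition used is the common one (the length of the contig at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total assembly length).

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2             # half of the assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # first contig that crosses 50%
            return length


# Made-up contig lengths, for illustration only.
contigs = [80, 70, 50, 40, 30, 20, 10]
print("contigs:", len(contigs), "N50:", n50(contigs))
```

A pipeline that merges contigs lowers the contig count and tends to raise N50, which is why the two metrics are usually reported together.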
Taking advantage of evolving and improving sequencing methods, human chromosome 8 is now available as a gapless, end-to-end assembly. Thanks to advances in long-read sequencing technologies, its centromere, telomeres, duplicated gene families and repeat-rich regions are now fully sequenced. We were interested to assess if the new assembly altered our understanding of the potential impact of non-B DNA structures within this completed chromosome sequence. It has been shown that non-B secondary structures, such as G-quadruplexes, hairpins and cruciforms, have important regulatory functions and potential as targeted therapeutics. Therefore, we analysed the presence of putative G-quadruplex forming sequences and inverted repeats in the current human reference genome (GRCh38) and in the new end-to-end assembly of chromosome 8. The comparison revealed that the new assembly contains significantly more inverted repeats and G-quadruplex forming sequences compared to the current reference sequence. This observation can be explained by improved accuracy of the new sequencing methods, particularly in regions that contain extensive repeats of bases, as is preferred by many non-B DNA structures. These results show a significant underestimation of the prevalence of non-B DNA secondary structure in previous assembly versions of the human genome and point to their importance being not fully appreciated. We anticipate that similar observations will occur as the improved sequencing technologies fill in gaps across the genomes of humans and other organisms.
- MeSH
- G-Quadruplexes * MeSH
- Genome, Human MeSH
- Sequence Inversion * MeSH
- Humans MeSH
- Chromosomes, Human, Pair 8 * MeSH
- Sequence Analysis, DNA MeSH
- Telomere * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
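As a rough illustration of what searching for putative G-quadruplex forming sequences can look like, the sketch below scans a DNA string with a widely used textbook motif: four runs of at least three guanines separated by loops of 1 to 7 bases. The abstract does not state which tool or motif definition the authors used, so the pattern, the function names and the test sequence here are all assumptions.

```python
import re

# Assumed motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (a common simplification).
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_putative_g4(seq):
    """Return (start, end, motif) for non-overlapping matches on the given strand."""
    return [(m.start(), m.end(), m.group())
            for m in G4_PATTERN.finditer(seq.upper())]


# Made-up example sequence, for illustration only.
example = "TTGGGAGGGTGGGAGGGTT"
print(find_putative_g4(example))
# C-rich motifs are found by scanning the reverse complement.
print(find_putative_g4(reverse_complement("CCCTCCCACCCTCCC")))
```

Because `finditer` reports non-overlapping matches, counts from this sketch are a lower bound compared with tools that score overlapping candidates.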
MOTIVATION: Pairwise sequence alignment has received a new motivation due to the advent of recent patents in next-generation sequencing technologies, particularly so for the application of re-sequencing, that is, the assembly of a genome directed by a reference sequence. After the fast alignment between a factor of the reference sequence and a high-quality fragment of a short read by a short-read alignment programme, an important problem is to find the alignment between a relatively short succeeding factor of the reference sequence and the remaining low-quality part of the read, allowing a number of mismatches and the insertion of a single gap in the alignment. RESULTS: We present GapMis, a tool for pairwise sequence alignment with a single gap. It is based on a simple algorithm, which computes a different version of the traditional dynamic programming matrix. The presented experimental results demonstrate that GapMis is more suitable and efficient than most popular tools for this task.
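To make the problem statement concrete, here is a naive brute-force sketch of aligning two strings with mismatches and one contiguous gap. It is not the GapMis algorithm, which the abstract describes as a modified dynamic programming matrix. It also simplifies the problem by assuming the gap length equals the length difference between the two strings and that both are aligned end to end. The function name and example sequences are hypothetical.

```python
def best_single_gap_alignment(a, b):
    """Insert one contiguous gap into the shorter string to minimise mismatches.

    Returns (mismatches, gap_start, aligned_a, aligned_b).
    Simplifying assumption: gap length = abs(len(a) - len(b)).
    """
    swapped = len(a) < len(b)
    long_s, short_s = (b, a) if swapped else (a, b)
    gap_len = len(long_s) - len(short_s)

    best = None
    for pos in range(len(short_s) + 1):        # try every gap position
        gapped = short_s[:pos] + "-" * gap_len + short_s[pos:]
        mismatches = sum(
            1 for x, y in zip(long_s, gapped) if y != "-" and x != y
        )
        if best is None or mismatches < best[0]:
            best = (mismatches, pos, gapped)   # keep the first minimum

    mismatches, pos, gapped = best
    aligned_a, aligned_b = (gapped, long_s) if swapped else (long_s, gapped)
    return mismatches, pos, aligned_a, aligned_b


# Made-up reference factor and read fragment, for illustration only.
reference = "ACGTAAGGCT"
read = "ACCTGGCT"
print(best_single_gap_alignment(reference, read))
```

This sketch costs O(n) per candidate position, so O(n²) overall for strings of length n. It is meant to clarify the task, not to compete with a purpose-built tool.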
Lectins with a β-propeller fold bind glycans on the cell surface through multivalent binding sites and appropriate directionality. These proteins are formed by repeats of short domains, raising questions about evolutionary duplication. However, these repeats are difficult to detect in translated genomes and seldom correctly annotated in sequence databases. To address these issues, we defined the blade signature of the five types of β-propellers using 3D-structural data. With these templates, we predicted 3,887 β-propeller lectins in 1,889 species and organized this information in a searchable online database. The data reveal a widespread distribution of β-propeller lectins across species. Prediction also emphasizes multiple architectures and led to the discovery of a β-propeller assembly scenario. This was confirmed by producing and characterizing a predicted protein coded in the genome of Kordia zhangzhouensis. The crystal structure uncovers an intermediate in the evolution of β-propeller assembly and demonstrates the power of our tools.
- MeSH
- Archaea chemistry MeSH
- Bacteria chemistry MeSH
- Databases, Protein MeSH
- Eukaryota chemistry MeSH
- Genome, Bacterial MeSH
- Lectins chemistry MeSH
- Models, Molecular MeSH
- Protein Multimerization MeSH
- Proteome MeSH
- Protein Folding MeSH
- Protein Structure, Secondary MeSH
- Amino Acid Sequence MeSH
- Sequence Alignment MeSH
- Binding Sites MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH