short sequence motifs
There is a paramount need to develop new techniques and tools that will extract as much information as possible from the ever-growing repository of protein 3D structures. We report here on the development of a software tool for the multiple superimposition of large sets of protein structural motifs. Our superimposition methodology performs a systematic search for the atom pairing that provides the best fit. During this search, the RMSD values for all chemically relevant pairings are calculated by quaternion algebra. The number of evaluated pairings is markedly decreased by using PDB annotations for atoms. This approach guarantees that the best fit will be found and can be applied even when sequence similarity is low or does not exist at all. We have implemented this methodology in the Web application SiteBinder, which is able to process up to thousands of protein structural motifs in a very short time, and which provides an intuitive and user-friendly interface. Our benchmarking analysis has shown the robustness, efficiency, and versatility of our methodology and its implementation by the successful superimposition of 1000 experimentally determined structures for each of 32 eukaryotic linear motifs. We also demonstrate the applicability of SiteBinder using three case studies. We first compared the structures of 61 PA-IIL sugar binding sites containing nine different sugars, and we found that the sugar binding sites of PA-IIL and its mutants have a conserved structure despite binding different sugars. We then superimposed over 300 zinc finger central motifs and revealed that the molecular structure in the vicinity of the Zn atom is highly conserved. Finally, we superimposed 12 BH3 domains from pro-apoptotic proteins. Our findings support the hypothesis that there is a structural basis for the functional segregation of BH3-only proteins into activators and enablers.
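The pairing step described in the abstract above can be illustrated with a minimal sketch. It only enumerates one-to-one atom pairings that respect an annotation label and compares their count with the unrestricted number of pairings; it does not compute RMSD and is not SiteBinder's implementation. The function name `compatible_pairings` and the element labels are hypothetical, chosen for illustration.

```python
from itertools import permutations, product
from math import factorial

def compatible_pairings(labels_a, labels_b):
    """Yield every one-to-one pairing, as a tuple of (i, j) index pairs,
    that only matches atoms carrying the same annotation label."""
    groups_a, groups_b = {}, {}
    for i, lab in enumerate(labels_a):
        groups_a.setdefault(lab, []).append(i)
    for j, lab in enumerate(labels_b):
        groups_b.setdefault(lab, []).append(j)
    # Both motifs must contain the same number of atoms per label.
    sizes_a = {k: len(v) for k, v in groups_a.items()}
    sizes_b = {k: len(v) for k, v in groups_b.items()}
    if sizes_a != sizes_b:
        return
    # Permute atoms only within each label group, then combine the groups.
    per_group = [
        [list(zip(groups_a[k], perm)) for perm in permutations(groups_b[k])]
        for k in sorted(groups_a)
    ]
    for combo in product(*per_group):
        yield tuple(pair for group in combo for pair in group)

# Hypothetical five-atom motifs; the labels are illustrative, not taken from a PDB entry.
motif_a = ["ZN", "S", "S", "N", "N"]
motif_b = ["S", "N", "ZN", "N", "S"]

pairings = list(compatible_pairings(motif_a, motif_b))
unrestricted = factorial(len(motif_a))
print(len(pairings), "annotation-compatible pairings instead of", unrestricted)
```

In a full superimposition, each enumerated pairing would then be scored by its best-fit RMSD and the lowest-scoring pairing kept.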
MOTIVATION: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on protein surfaces. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, resulting in vast amounts of peptide sequences, potentially containing information on multiple protein-protein interactions. Processing of such datasets is a complex but essential task for large-scale studies investigating protein-protein interactions. RESULTS: The software tool presented in this article is able to rapidly identify multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generate multiple sequence alignments of identified clusters. The method was applied to a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as to a new dataset, an order of magnitude larger, containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible. AVAILABILITY AND IMPLEMENTATION: Hammock is published under the GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases. CONTACT: muller@mou.cz SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
- MeSH
- Algorithms * MeSH
- Databases, Protein * MeSH
- Epitopes chemistry MeSH
- Protein Interaction Domains and Motifs * MeSH
- Humans MeSH
- Markov Chains MeSH
- Molecular Sequence Data MeSH
- Antibodies, Monoclonal chemistry MeSH
- Peptides chemistry MeSH
- Amino Acid Sequence MeSH
- Sequence Alignment MeSH
- Cluster Analysis MeSH
- Software MeSH
- src Homology Domains MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Telomeres are nucleoprotein structures that distinguish native chromosomal ends from double-stranded breaks. They are maintained by telomerase, which adds short G-rich telomeric repeats at chromosomal ends in most eukaryotes and determines the T(n)A(m)G(o) sequence of canonical telomeres. We employed an experimental approach that was based on detection of repeats added by telomerase to identify the telomere sequence type forming the very ends of chromosomes. Our previous studies that focused on the algal order Chlamydomonadales revealed several changes in telomere motifs that were consistent with the phylogeny and supported the concept of the Arabidopsis-type sequence being the ancestral telomeric motif for green algae. In addition to previously described independent transitions to the Chlamydomonas-type sequence, we report that the ancestral telomeric motif was replaced by the human-type sequence in the majority of algal species grouped within a higher order clade, Caudivolvoxa. The Arabidopsis-type sequence was apparently retained in the Polytominia clade. Regarding the telomere sequence, the Chlorogonia clade within Caudivolvoxa bifurcates into two groups, one with the human-type sequence and the other group with the Arabidopsis-type sequence that is solely formed by the Chlorogonium species. This suggests that reversion to the Arabidopsis-type telomeric motif occurred in the common ancestral Chlorogonium species. The human-type sequence is also synthesized by telomerases of algal strains from Arenicolinia, Dunaliellinia and Stephanosphaerinia, except for a distinct subclade within Stephanosphaerinia, where telomerase activity was not detected and a change to an unidentified telomeric motif might have arisen. We discuss plausible reasons why changes in telomeric motifs were tolerated during evolution of green algae.
- MeSH
- Amino Acid Motifs genetics MeSH
- Phylogeny MeSH
- Repetitive Sequences, Nucleic Acid genetics MeSH
- DNA, Ribosomal genetics MeSH
- RNA, Ribosomal, 18S genetics MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Telomerase genetics MeSH
- Telomere genetics MeSH
- Volvocida genetics MeSH
- Telomere Shortening genetics MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
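The three telomere types named in the telomere record above differ only in the number of T residues in the repeat unit, so a sequenced telomerase product can be assigned to a type by finding the longest tandem run of each unit. The sketch below assumes the commonly cited repeat units (TTTAGGG, TTAGGG, TTTTAGGG), which the abstract itself does not spell out; the function names and the `min_copies` threshold are hypothetical.

```python
import re

# Commonly cited repeat units (G-rich strand, 5'->3'); not stated in the abstract itself.
TELOMERE_TYPES = {
    "Arabidopsis-type": "TTTAGGG",
    "human-type": "TTAGGG",
    "Chlamydomonas-type": "TTTTAGGG",
}

def longest_tandem_run(seq, unit):
    """Return the largest number of back-to-back copies of `unit` found in `seq`."""
    runs = re.findall(f"(?:{unit})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)

def classify_telomere(seq, min_copies=3):
    """Name the telomere type with the longest tandem run, or None if no type
    reaches `min_copies` consecutive repeats (an arbitrary illustrative cutoff)."""
    counts = {name: longest_tandem_run(seq, unit) for name, unit in TELOMERE_TYPES.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] >= min_copies else None

print(classify_telomere("TTAGGG" * 5))
print(classify_telomere("acgt" + "TTTAGGG" * 4))
```

Requiring consecutive copies matters here: a single TTAGGG also occurs inside every TTTAGGG unit, but it never forms a tandem run there.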
Non-canonical (non-B) DNA structures (e.g., bent DNA, hairpins, G-quadruplexes (G4s), and Z-DNA), which form at certain sequence motifs (e.g., A-phased repeats and inverted repeats), have emerged as important regulators of cellular processes and drivers of genome evolution. Yet, they have been understudied due to their repetitive nature and potentially inaccurate sequences generated with short-read technologies. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched at the genomic regions added to T2T assemblies and occupy 9%-15%, 9%-11%, and 12%-38% of autosomes and chromosomes X and Y, respectively. G4s and Z-DNA are enriched at promoters and enhancers, as well as at origins of replication. Repetitive sequences harbor more non-B DNA motifs than non-repetitive sequences, especially in the short arms of acrocentric chromosomes. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role of non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest their novel functions in previously inaccessible genomic regions.
- MeSH
- DNA * chemistry genetics MeSH
- G-Quadruplexes MeSH
- Genome, Human MeSH
- Genome * MeSH
- Hominidae * genetics MeSH
- Humans MeSH
- Nucleotide Motifs MeSH
- Pan troglodytes genetics MeSH
- Repetitive Sequences, Nucleic Acid MeSH
- Telomere * genetics MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
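The non-B DNA record above reports the share of each chromosome occupied by motifs. When annotations of different motif types overlap, a per-chromosome percentage is only meaningful if overlapping intervals are merged first. The sketch below shows that merge-and-sum calculation; whether the authors computed their figures this way is an assumption, and the coordinates are made up.

```python
def covered_fraction(intervals, chrom_length):
    """Fraction of a chromosome covered by the union of half-open [start, end) intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # A gap: close the previous merged block and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / chrom_length

# Hypothetical annotations on a 100-bp toy chromosome; the first two overlap.
motifs = [(0, 10), (5, 20), (30, 40)]
print(f"{covered_fraction(motifs, 100):.0%} of bases fall in at least one motif")
```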
Cytosine-rich DNA regions can form four-stranded structures based on hemi-protonated C·C+ pairs, called i-motifs (iMs). Using CD, UV absorption, NMR spectroscopy, and DSC calorimetry, we show that model (CnT3)3Cn sequences (abbreviated Cn; four Cn tracts separated by T3 linkers) adopt iM structures under neutral or slightly alkaline conditions for n > 3. However, the iMs are formed with long-lasting kinetics under these conditions and melt with significant hysteresis. Sequences with n > 6 melt in two or more separate steps, indicating the presence of different iM species, the proportion of which is dependent on temperature and incubation time. At ambient temperature, kinetically favored iMs of low stability are formed, most likely consisting of short C·C+ blocks. These species act as kinetic traps and prevent the assembly of thermodynamically favored, fully C·C+ paired iMs. A higher temperature is necessary to unfold the kinetic forms and enable their substitution by a slowly developing thermodynamic structure. This complicated kinetic partitioning process considerably slows down folding of the thermodynamically favored iM, making it much slower than the timeframes of biological reactions and, therefore, unlikely to have any biological relevance. Our data suggest that kinetically driven iM species are more likely to be biologically relevant than the thermodynamically most stable iM forms.
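The model sequences in the i-motif abstract above follow a simple pattern: four cytosine tracts of length n joined by three TTT linkers. A minimal sketch for generating them is given here; the helper name `imotif_model_sequence` is hypothetical.

```python
def imotif_model_sequence(n):
    """Build the (CnT3)3Cn model sequence: four C-tracts of length n separated by TTT."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return ("C" * n + "TTT") * 3 + "C" * n

# Print a few members of the series; each sequence is 4n + 9 nucleotides long.
for n in range(3, 8):
    seq = imotif_model_sequence(n)
    print(f"C{n}: {seq} ({len(seq)} nt)")
```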
Lacertid lizards are a widely radiated group of squamate reptiles with long-term stable ZZ/ZW sex chromosomes. Despite their family-wide homology of Z-specific gene content, previous cytogenetic studies revealed significant variability in the size, morphology, and heterochromatin distribution of their W chromosome. However, there is little evidence about the accumulation and distribution of repetitive content on lacertid chromosomes, especially on their W chromosome. In order to expand our knowledge of the evolution of sex chromosome repetitive content, we used fluorescence in situ hybridization (FISH) to examine, in the karyotypes of 15 species of lacertids, the topology of telomeric and microsatellite motifs that often accumulate on the sex chromosomes of reptiles. The topology of these motifs was compared to the pattern of heterochromatin distribution, as revealed by C-banding. Our results show that the topologies of the examined motifs on the W chromosome do not seem to follow a strong phylogenetic signal, indicating independent and species-specific accumulations. In addition, the degeneration of the W chromosome can also affect the Z chromosome and potentially also other parts of the genome. Our study provides solid evidence that the repetitive content of the degenerated sex chromosomes is one of the most evolutionarily dynamic parts of the genome.
- MeSH
- Chromosomes genetics MeSH
- Species Specificity MeSH
- Phylogeny MeSH
- Heterochromatin genetics ultrastructure MeSH
- In Situ Hybridization, Fluorescence MeSH
- Lizards genetics MeSH
- Karyotype MeSH
- Microsatellite Repeats genetics MeSH
- Evolution, Molecular * MeSH
- Nucleotide Motifs MeSH
- Sex Chromosomes genetics MeSH
- Chromosome Banding MeSH
- Repetitive Sequences, Nucleic Acid MeSH
- Telomere genetics MeSH
- Animals MeSH
- Check Tag
- Male MeSH
- Female MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
Translational control in eukaryotes is exerted by many means, one of which involves a ribosome translating multiple cistrons per mRNA, as in bacteria. It is called reinitiation (REI) and occurs on mRNAs where the main ORF is preceded by one or more short upstream ORFs (uORFs). Some uORFs support efficient REI on downstream cistrons, whereas others do not. The mRNA of the yeast transcriptional activator GCN4 contains four uORFs of both types that together compose an intriguing regulatory mechanism of its expression responding to nutrient availability and various stresses. Here we subjected all GCN4 uORFs to a comprehensive analysis to identify all REI-promoting and inhibiting cis-determinants that contribute either autonomously or in synergy to the overall efficiency of REI on GCN4. We found that the 3' sequences of uORFs 1-3 contain a conserved AU(1-2)A/UUAU(2) motif that promotes REI in a position-specific, autonomous fashion, like the REI-promoting elements occurring in the 5' sequences of uORF1 and uORF2. We also identified autonomous and transferable REI-inhibiting elements in the 3' sequences of uORF2 and uORF3, immediately following their AU-rich motif. Furthermore, we analyzed contributions of coding triplets and terminating stop codon tetranucleotides of GCN4 uORFs, showing a negative correlation between the efficiency of reinitiation and the efficiency of translation termination. Together we provide a complex overview of all cis-determinants of REI with their effects set in the context of the overall GCN4 translational control.
- MeSH
- Peptide Chain Initiation, Translational MeSH
- RNA, Messenger genetics metabolism MeSH
- Open Reading Frames MeSH
- Gene Expression Regulation, Fungal MeSH
- Saccharomyces cerevisiae Proteins genetics metabolism MeSH
- Saccharomyces cerevisiae genetics metabolism MeSH
- Base Sequence MeSH
- Sequence Analysis, RNA MeSH
- Basic-Leucine Zipper Transcription Factors genetics metabolism MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
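The reinitiation record above concerns short upstream ORFs in an mRNA leader. As a rough illustration of what counts as a uORF, the sketch below scans a 5' leader for AUG codons followed by an in-frame stop codon within the leader. The function name, the `max_codons` cutoff, and the example leader are hypothetical; the leader is not the GCN4 sequence.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_uorfs(leader, max_codons=30):
    """Return (start, end, codon_count) for each AUG-initiated ORF that reaches an
    in-frame stop inside `leader`. Coordinates are 0-based and half-open, `end`
    includes the stop codon, and codon_count excludes the stop codon."""
    rna = leader.upper().replace("T", "U")
    uorfs = []
    for start in range(len(rna) - 2):
        if rna[start:start + 3] != "AUG":
            continue
        # Walk downstream codon by codon until the first in-frame stop.
        for pos in range(start + 3, len(rna) - 2, 3):
            if rna[pos:pos + 3] in STOP_CODONS:
                codons = (pos - start) // 3
                if codons <= max_codons:
                    uorfs.append((start, pos + 3, codons))
                break
    return uorfs

# Hypothetical leader carrying two two-codon uORFs.
print(find_uorfs("GGAUGUGCUAAGGGAUGAAAUAGCC"))
```

An AUG nested in frame inside a longer uORF is reported as its own entry, so a real analysis would need to decide how to treat such overlaps.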
PUF60 is a splicing factor that binds uridine (U)-rich tracts and facilitates association of the U2 small nuclear ribonucleoprotein with primary transcripts. PUF60 deficiency (PD) causes a developmental delay coupled with intellectual disability and spinal, cardiac, ocular and renal defects, but PD pathogenesis is not understood. Using RNA-Seq, we identify human PUF60-regulated exons and show that PUF60 preferentially acts as their activator. PUF60-activated internal exons are enriched for Us upstream of their 3' splice sites (3'ss), are preceded by longer AG dinucleotide exclusion zones and more distant branch sites, with a higher probability of unpaired interactions across a typical branch site location as compared to control exons. In contrast, PUF60-repressed exons show U-depletion with lower estimates of RNA single-strandedness. We also describe PUF60-regulated, alternatively spliced isoforms encoding other U-bound splicing factors, including PUF60 partners, suggesting that they are co-regulated in the cell, and identify PUF60-regulated exons derived from transposed elements. PD-associated amino-acid substitutions, even within a single RNA recognition motif (RRM), altered selection of competing 3'ss and branch points of a PUF60-dependent exon and the 3'ss choice was also influenced by alternative splicing of PUF60. Finally, we propose that differential distribution of RNA processing steps detected in cells lacking PUF60 and the PUF60-paralog RBM39 is due to the RBM39 RS domain interactions. Together, these results provide new insights into regulation of exon usage by the 3'ss organization and reveal that germline mutation heterogeneity in RRMs can enhance phenotypic variability at the level of splice-site and branch-site selection.
- MeSH
- Amino Acid Motifs MeSH
- Exons * MeSH
- HEK293 Cells MeSH
- HeLa Cells MeSH
- Heterogeneous-Nuclear Ribonucleoproteins metabolism MeSH
- Nuclear Proteins metabolism MeSH
- Short Interspersed Nucleotide Elements MeSH
- Humans MeSH
- Ribonucleoprotein, U1 Small Nuclear metabolism MeSH
- Mutation, Missense * MeSH
- RNA Splice Sites * MeSH
- RNA-Binding Proteins metabolism MeSH
- Repressor Proteins chemistry deficiency metabolism MeSH
- Sequence Analysis, RNA MeSH
- RNA Splicing Factors chemistry deficiency metabolism MeSH
- Splicing Factor U2AF MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
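Two of the sequence features mentioned in the PUF60 record above, the AG dinucleotide exclusion zone and uridine enrichment upstream of the 3' splice site, can be approximated with simple string operations. The sketch below uses a simplified working definition (the AG-free stretch immediately upstream of the splice-site AG), which is an assumption and may differ from the definition used in the study. The function names and the example sequence are hypothetical.

```python
def ag_exclusion_zone_length(intron_3prime):
    """Length of the AG-free stretch immediately upstream of the terminal AG.
    `intron_3prime` is the 3' end of an intron and must end with the splice-site AG."""
    seq = intron_3prime.upper().replace("T", "U")
    if not seq.endswith("AG"):
        raise ValueError("sequence must end with the 3' splice-site AG")
    upstream = seq[:-2]
    last_ag = upstream.rfind("AG")
    if last_ag == -1:
        return len(upstream)  # no upstream AG in the supplied window
    return len(upstream) - (last_ag + 2)

def u_fraction(seq):
    """Fraction of uridines (or thymidines in DNA notation) in `seq`."""
    seq = seq.upper().replace("T", "U")
    return seq.count("U") / len(seq) if seq else 0.0

# Hypothetical intron end: an upstream AG, a pyrimidine-rich stretch, then the 3'ss AG.
intron_end = "CAGUUUUCUUUUCCUUCAG"
zone = ag_exclusion_zone_length(intron_end)
print(zone, round(u_fraction(intron_end[-2 - zone:-2]), 2))
```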
G-quadruplexes (G4s) formed within RNA are emerging as promising targets for therapeutic intervention in cancer, neurodegenerative disorders and infectious diseases. Sequences containing a succession of short GG blocks, or uneven G-tract lengths unable to form three-tetrad G4s (GG motifs), are overwhelmingly more frequent than canonical motifs involving multiple GGG blocks. We recently showed that DNA is not able to form stable two-tetrad intramolecular parallel G4s. Whether RNA GG motifs can form intramolecular G4s under physiological conditions and play regulatory roles remains a burning question. In this study, we performed a systematic analysis and experimental evaluation of a number of biologically important RNA regions involving RNA GG motifs. We show that most of these motifs do not form stable intramolecular G4s but need to dimerize to form stable G4 structures. The strong tendency of RNA GG motif G4s to associate may participate in RNA-based aggregation under conditions of cellular stress.
- MeSH
- Dimerization MeSH
- G-Quadruplexes * MeSH
- Transcription, Genetic MeSH
- Humans MeSH
- Nucleotide Motifs * MeSH
- RNA * chemistry metabolism genetics MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
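The distinction drawn in the RNA G-quadruplex record above, between canonical motifs built from GGG blocks and GG motifs, can be sketched with two regular expressions. The four-tract pattern with loops of 1-7 nucleotides is a widely used search heuristic, not the selection criterion stated by the authors; the names `CANONICAL`, `TWO_TETRAD`, and `classify_g4_motif` are hypothetical.

```python
import re

# Four G-tracts separated by loops of 1-7 nt (a common heuristic; an assumption here).
CANONICAL = re.compile(r"(?:G{3,}[ACGU]{1,7}){3}G{3,}")   # every tract has >= 3 Gs
TWO_TETRAD = re.compile(r"(?:G{2,}[ACGU]{1,7}){3}G{2,}")  # every tract has >= 2 Gs

def classify_g4_motif(rna):
    """Label a sequence by the strictest G4 pattern it contains."""
    seq = rna.upper().replace("T", "U")
    if CANONICAL.search(seq):
        return "canonical (GGG blocks)"
    if TWO_TETRAD.search(seq):
        return "GG motif"
    return "no G4 motif"

for example in ("GGGAGGGAGGGAGGG", "GGAGGAGGAGG", "GGGUGGUGGGUGG", "ACUACUACU"):
    print(example, "->", classify_g4_motif(example))
```

A pattern match only flags a candidate sequence; the abstract's point is that most GG motifs do not form stable intramolecular G4s, so folding has to be tested experimentally.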
Based on the matrix-addressing sequence of mitochondrial ribosomal 5S-rRNA (termed MAM), which is naturally imported into mitochondria, we have constructed an import system for in vivo targeting of mitochondrial DNA (mtDNA) or mt-mRNA, in order to provide fluorescence hybridization of the desired sequences. To this end, DNA oligonucleotides were constructed, containing the 5'-flanked T7 RNA polymerase promoter. After in vitro transcription and fluorescent labeling with Alexa Fluor® 488 or 647 dye, we obtained the fluorescent "L-ND5 probe" containing MAM and exemplar cargo, i.e., annealing sequence to a short portion of ND5 mRNA and to the light-strand mtDNA complementary to the heavy strand nd5 mt gene (5'-end 21 base pair sequence). For mitochondrial in vivo fluorescent hybridization, HepG2 cells were treated with dequalinium micelles containing the fluorescent probes, bringing the probes close to the mitochondrial outer membrane and to the natural import system. Import into the mitochondrial matrix of cultured HepG2 cells was verified by confocal microscopy colocalizations. Transfections using lipofectamine or probes without the 5S-rRNA addressing MAM sequence or with MAM only were ineffective. Alternatively, the same DNA oligonucleotides with a 5'-CACC overhang (substituting the T7 promoter) were transcribed from the tetracycline-inducible pENTRH1/TO vector in human embryonic kidney T-REx®-293 cells, while mitochondrial matrix localization after import of the resulting unlabeled RNA was detected by PCR. The MAM-containing probe was then enriched by three orders of magnitude over the natural ND5 mRNA in the mitochondrial matrix. In conclusion, we present a proof-of-principle for mitochondrial in vivo hybridization and mitochondrial nucleic acid import.
- MeSH
- Transcription, Genetic MeSH
- Humans MeSH
- DNA, Mitochondrial chemistry genetics MeSH
- Nucleic Acids chemistry genetics MeSH
- Oligonucleotides chemistry genetics MeSH
- RNA, Ribosomal chemistry genetics MeSH
- RNA chemistry genetics MeSH
- Sequence Homology, Nucleic Acid MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH