clustering analysis
BACKGROUND: Ubiquitin ligases (Ub-ligases) are essential intracellular enzymes responsible for the regulation of proteome homeostasis, signaling pathway crosstalk, cell differentiation and stress responses. Individual Ub-ligases exhibit their unique functions based on the nature of their substrates. They create a complex regulatory network with alternative and feedback pathways to maintain cell homeostasis, and are thus important players in many physiological and pathological conditions. However, the functional classification of Ub-ligases needs to be revised and extended. METHODS: In the current study, we used a novel semantic biclustering technique for expression profiling of Ub-ligases and ubiquitination-related genes in the murine gastrointestinal tract (GIT). We adapted the general framework of the algorithm to find tissue-specific gene expression clusters in the GIT. To test the identified clusters in a biological system, we used a model of epithelial regeneration. For this purpose, a dextran sulfate sodium (DSS) mouse model, followed by in situ hybridization, was used to expose genes with possible compensatory features. To determine the cell-type-specific distribution of Ub-ligases and ubiquitination-related genes, principal component analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP) were used to analyze the Tabula Muris scRNA-seq data of murine colon, followed by comparison with our clustering results. RESULTS: Our established clustering protocol, which incorporates the semantic biclustering algorithm, demonstrated the potential to reveal interesting expression patterns. In this manner, we statistically defined gene clusters consisting of the same genes involved in distinct regulatory pathways versus distinct genes playing roles in functionally similar signaling pathways. This allowed us to uncover the potentially redundant features of GIT-specific Ub-ligases and ubiquitination-related genes. Testing the statistically obtained results on the mouse model showed that genes clustered to the same ontology group simultaneously alter their expression pattern after induced epithelial damage, illustrating their complementary role during tissue regeneration. CONCLUSIONS: An optimized semantic clustering protocol demonstrates the potential to reveal a readable and unique pattern in the expression profiling of GIT-specific Ub-ligases, exposing ontologically relevant gene clusters with potentially redundant features. This extends our knowledge of ontological relationships among Ub-ligases and ubiquitination-related genes, providing an alternative and more functional gene classification. In a similar way, semantic cluster analysis could be used for studying other enzyme families, tissues and systems.
- MeSH
- gastrointestinal tract metabolism MeSH
- humans MeSH
- mice MeSH
- semantics * MeSH
- cluster analysis MeSH
- ubiquitin genetics metabolism MeSH
- ubiquitin-protein ligases * genetics MeSH
- animals MeSH
- Check Tag
- humans MeSH
- mice MeSH
- animals MeSH
- Publication type
- journal article MeSH
The availability of a great range of prior biological knowledge about the roles and functions of genes and gene-gene interactions allows us to simplify the analysis of gene expression data to make it more robust, compact, and interpretable. Here, we objectively analyze the applicability of functional clustering for the identification of groups of functionally related genes. The analysis is performed in terms of gene expression classification and uses predictive accuracy as an unbiased performance measure. Features of biological samples that originally corresponded to genes are replaced by features that correspond to the centroids of the gene clusters and are then used for classifier learning. Using 10 benchmark data sets, we demonstrate that functional clustering significantly outperforms random clustering without biological relevance. We also show that functional clustering performs comparably to gene expression clustering, which groups genes according to the similarity of their expression profiles. Finally, the suitability of functional clustering as a feature extraction technique is evaluated and discussed.
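The feature-replacement step described in this abstract can be illustrated with a minimal sketch. It assumes that a cluster centroid is simply the mean expression of the cluster's genes within each sample; the function name `centroid_features`, the gene names and the toy values are hypothetical and are not taken from the study.

```python
from statistics import mean


def centroid_features(expression, clusters):
    """Replace per-gene features with per-cluster centroid features.

    expression: dict mapping sample -> {gene: expression value}
    clusters:   dict mapping cluster name -> list of member genes
    Assumption: the centroid is the plain mean of the member genes.
    """
    features = {}
    for sample, profile in expression.items():
        features[sample] = {
            name: mean([profile[g] for g in genes if g in profile])
            for name, genes in clusters.items()
        }
    return features


# Hypothetical toy data: two samples, four genes, two gene clusters.
expression = {
    "sample_1": {"geneA": 2.0, "geneB": 4.0, "geneC": 1.0, "geneD": 3.0},
    "sample_2": {"geneA": 6.0, "geneB": 8.0, "geneC": 0.0, "geneD": 2.0},
}
clusters = {"cluster_1": ["geneA", "geneB"], "cluster_2": ["geneC", "geneD"]}

reduced = centroid_features(expression, clusters)
print(reduced)
```

The reduced feature table has one column per gene cluster rather than one per gene, and it is this smaller table that would be passed to a classifier.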
To understand how genes are distributed on chromosomes we bring new insights into gene positional clustering and its properties. We have made a large-scale analysis of three types of differentiation and we observed that genes that subsequently enter into different cell processes are positionally clustered on chromosomes. Genes from the clusters are transcribed subsequently with respect to time kinetics and also to position. This means that the genes related to a cellular process are clustered together, independent of the period of time during which they are active and important for the process. Our results also demonstrate not only that there are general regions of increased or decreased levels of gene expression, but also that, in fact, in some chromosome regions we can find clustering of genes related to specific cell processes. The results provided in this paper also support the theory of "transcription factories" and show that transcription of genes from the clusters is managed by softer epigenetic mechanisms.
- MeSH
- cell differentiation genetics MeSH
- K562 cells MeSH
- DNA primers genetics MeSH
- organized financing MeSH
- genetic transcription MeSH
- granulocytes cytology metabolism MeSH
- HL-60 cells MeSH
- humans MeSH
- megakaryocytes cytology metabolism MeSH
- monocytes cytology metabolism MeSH
- multigene family MeSH
- myelopoiesis genetics MeSH
- base sequence MeSH
- oligonucleotide array sequence analysis MeSH
- gene expression profiling MeSH
- thrombopoiesis genetics MeSH
- Check Tag
- humans MeSH
Based on an analysis of athletic decathlon results, we aim to show that factor analysis offers greater statistical objectivity than cluster analysis. Factor analysis yields the same results even after rotation. Both methods are nevertheless needed to complete the characterization of the factual significance of the results.
- MeSH
- running physiology statistics and numerical data MeSH
- factor analysis, statistical MeSH
- applied kinesiology methods statistics and numerical data utilization MeSH
- track and field physiology statistics and numerical data MeSH
- humans MeSH
- men MeSH
- cluster analysis MeSH
- performance physiology MeSH
- Check Tag
- humans MeSH
Analysis of population genetic structure has become a standard approach in population genetics. In polyploid complexes, clustering analyses can elucidate the origin of polyploid populations and patterns of admixture between different cytotypes. However, combining diploid and polyploid data can theoretically lead to biased inference with (artefactual) clustering by ploidy. We used simulated mixed-ploidy (diploid-autotetraploid) data to systematically compare the performance of k-means clustering and the model-based clustering methods implemented in STRUCTURE, ADMIXTURE, FASTSTRUCTURE and INSTRUCT under different scenarios of differentiation and with different marker types. Under scenarios of strong population differentiation, the tested applications performed equally well. However, when population differentiation was weak, STRUCTURE was the only method that allowed unbiased inference with markers with limited genotypic information (co-dominant markers with unknown dosage or dominant markers). Still, since STRUCTURE was comparatively slow, the much faster but less powerful FASTSTRUCTURE provides a reasonable alternative for large datasets. Finally, although bias makes k-means clustering unsuitable for markers with incomplete genotype information, for large numbers of loci (>1000) with known dosage k-means clustering was superior to FASTSTRUCTURE in terms of power and speed. We conclude that STRUCTURE is the most robust method for the analysis of genetic structure in mixed-ploidy populations, although alternative methods should be considered under some specific conditions.
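As a rough illustration of k-means clustering on mixed-ploidy data with known dosage, the sketch below converts allele dosages to per-locus allele frequencies and runs a plain Lloyd-style k-means. Dividing dosage by ploidy, the deterministic initialisation and the toy genotypes are all assumptions made for this example and do not reproduce the simulation design or software settings of the study.

```python
def allele_frequencies(dosages, ploidy):
    """Scale allele dosages (0..ploidy) to frequencies (0..1) per locus."""
    return [d / ploidy for d in dosages]


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means with a deterministic start.

    Assumption: initial centroids are taken at evenly spaced positions
    in the input order, which keeps the toy example reproducible.
    """
    step = max(1, len(points) // k)
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each individual joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical individuals: (ploidy, allele dosage at four loci).
individuals = [
    (2, [2, 2, 0, 0]), (4, [4, 3, 0, 1]), (2, [2, 1, 0, 0]),  # population A
    (4, [0, 1, 4, 4]), (2, [0, 0, 2, 2]), (4, [1, 0, 3, 4]),  # population B
]
points = [allele_frequencies(d, ploidy) for ploidy, d in individuals]
labels, centroids = kmeans(points, k=2)
print(labels)
```

In this toy case the diploid and tetraploid individuals of each population fall into the same cluster, because the scaled frequencies are comparable across ploidy levels. With dominant markers or unknown dosage this scaling is not available, which is the situation in which the abstract reports k-means to be biased.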
- MeSH
- diploidy MeSH
- genetic variation genetics MeSH
- genetic markers genetics MeSH
- genotype MeSH
- single nucleotide polymorphism genetics MeSH
- microsatellite repeats genetics MeSH
- ploidies * MeSH
- population genetics statistics and numerical data MeSH
- cluster analysis MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization. RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families. CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
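The idea of partitioning reads by similarity and using cluster sizes to estimate repeat proportions can be sketched as follows. This is a simplification that treats each connected component of the read-similarity graph as one cluster, which is an assumption for illustration and not the partitioning procedure of the study; the read identifiers and similarity edges are hypothetical.

```python
from collections import defaultdict


def cluster_reads(read_ids, similarity_edges):
    """Group reads into connected components of a similarity graph.

    read_ids:         iterable of read identifiers (graph nodes)
    similarity_edges: pairs of reads with a detected sequence similarity
    Uses a union-find structure with path compression.
    """
    parent = {r: r for r in read_ids}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path compression
            r = parent[r]
        return r

    for a, b in similarity_edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for r in read_ids:
        groups[find(r)].append(r)
    # Largest clusters first; large clusters suggest abundant repeats.
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical reads and pairwise similarity hits.
reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
edges = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]

clusters = cluster_reads(reads, edges)
proportions = [len(c) / len(reads) for c in clusters]
print(clusters, proportions)
```

Each cluster's share of all reads serves here as a crude proxy for the genomic proportion of the corresponding repeat family. Reads with no similarity hits form singleton clusters.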