BACKGROUND: Ubiquitin ligases (Ub-ligases) are essential intracellular enzymes responsible for the regulation of proteome homeostasis, signaling pathway crosstalk, cell differentiation and stress responses. Individual Ub-ligases exhibit unique functions based on the nature of their substrates. They create a complex regulatory network with alternative and feedback pathways to maintain cell homeostasis, and are thus important players in many physiological and pathological conditions. However, the functional classification of Ub-ligases needs to be revised and extended. METHODS: In the current study, we used a novel semantic biclustering technique for expression profiling of Ub-ligases and ubiquitination-related genes in the murine gastrointestinal tract (GIT). We adapted the general framework of the algorithm to find tissue-specific gene expression clusters in the GIT. In order to test the identified clusters in a biological system, we used a model of epithelial regeneration. For this purpose, a dextran sulfate sodium (DSS) mouse model, followed by in situ hybridization, was used to expose genes with possible compensatory features. To determine the cell-type specific distribution of Ub-ligases and ubiquitination-related genes, principal component analysis (PCA) and the Uniform Manifold Approximation and Projection (UMAP) technique were used to analyze the Tabula Muris scRNA-seq data of murine colon, followed by comparison with our clustering results. RESULTS: Our established clustering protocol, which incorporates the semantic biclustering algorithm, demonstrated the potential to reveal interesting expression patterns. In this manner, we statistically defined gene clusters consisting of the same genes involved in distinct regulatory pathways vs distinct genes playing roles in functionally similar signaling pathways. This allowed us to uncover the potentially redundant features of GIT-specific Ub-ligases and ubiquitination-related genes. Testing the statistically obtained results on the mouse model showed that genes clustered to the same ontology group simultaneously alter their expression pattern after induced epithelial damage, illustrating their complementary role during tissue regeneration. CONCLUSIONS: An optimized semantic clustering protocol demonstrates the potential to reveal a readable and unique pattern in the expression profiling of GIT-specific Ub-ligases, exposing ontologically relevant gene clusters with potentially redundant features. This extends our knowledge of ontological relationships among Ub-ligases and ubiquitination-related genes, providing an alternative and more functional gene classification. In a similar way, semantic cluster analysis could be used for studying other enzyme families, tissues and systems.
- MeSH
- Gastrointestinal Tract metabolism MeSH
- Humans MeSH
- Mice MeSH
- Semantics * MeSH
- Cluster Analysis MeSH
- Ubiquitin genetics metabolism MeSH
- Ubiquitin-Protein Ligases * genetics MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Mice MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
The availability of a great range of prior biological knowledge about the roles and functions of genes and gene-gene interactions allows us to simplify the analysis of gene expression data to make it more robust, compact, and interpretable. Here, we objectively analyze the applicability of functional clustering for the identification of groups of functionally related genes. The analysis is performed in terms of gene expression classification and uses predictive accuracy as an unbiased performance measure. Features of biological samples that originally corresponded to genes are replaced by features that correspond to the centroids of the gene clusters and are then used for classifier learning. Using 10 benchmark data sets, we demonstrate that functional clustering significantly outperforms random clustering without biological relevance. We also show that functional clustering performs comparably to gene expression clustering, which groups genes according to the similarity of their expression profiles. Finally, the suitability of functional clustering as a feature extraction technique is evaluated and discussed.
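The feature-extraction step described in this abstract, where per-gene features are replaced by gene-cluster centroids, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `centroid_features`, the gene and cluster names, and the use of a plain arithmetic mean as the centroid are all assumptions made for illustration.

```python
from statistics import mean


def centroid_features(expression, gene_clusters):
    """Replace per-gene features with one feature per gene cluster.

    expression    -- list of samples, each a dict mapping gene -> expression value
    gene_clusters -- dict mapping cluster name -> list of member genes
    The centroid is taken as the mean expression of the member genes
    measured in the sample (an assumption of this sketch).
    """
    features = []
    for sample in expression:
        row = {}
        for name, genes in gene_clusters.items():
            measured = [sample[g] for g in genes if g in sample]
            if measured:  # skip clusters with no measured member gene
                row[name] = mean(measured)
        features.append(row)
    return features


# Hypothetical toy data: two samples, three genes, two functional clusters.
samples = [
    {"g1": 2.0, "g2": 4.0, "g3": 1.0},
    {"g1": 0.0, "g2": 2.0, "g3": 5.0},
]
clusters = {"pathway_A": ["g1", "g2"], "pathway_B": ["g3"]}
reduced = centroid_features(samples, clusters)
```

The resulting rows in `reduced` would then be passed to any classifier in place of the original gene-level features; the same function works for functional, expression-based, or random cluster assignments, which is what makes the three directly comparable.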
To understand how genes are distributed on chromosomes, we bring new insights into gene positional clustering and its properties. We performed a large-scale analysis of three types of differentiation and observed that genes that successively enter into different cell processes are positionally clustered on chromosomes. Genes from the clusters are transcribed successively with respect to both time kinetics and position. This means that the genes related to a cellular process are clustered together, independent of the period of time during which they are active and important for the process. Our results demonstrate not only that there are general regions of increased or decreased levels of gene expression, but also that in some chromosome regions we can find clustering of genes related to specific cell processes. The results provided in this paper also support the theory of "transcription factories" and show that transcription of genes from the clusters is managed by softer epigenetic mechanisms.
- MeSH
- Cell Differentiation genetics MeSH
- K562 Cells MeSH
- DNA Primers genetics MeSH
- Financing, Organized MeSH
- Transcription, Genetic MeSH
- Granulocytes cytology metabolism MeSH
- HL-60 Cells MeSH
- Humans MeSH
- Megakaryocytes cytology metabolism MeSH
- Monocytes cytology metabolism MeSH
- Multigene Family MeSH
- Myelopoiesis genetics MeSH
- Base Sequence MeSH
- Oligonucleotide Array Sequence Analysis MeSH
- Gene Expression Profiling MeSH
- Thrombopoiesis genetics MeSH
- Check Tag
- Humans MeSH
On the basis of an analysis of results in the athletic decathlon, we want to show that factor analysis offers greater statistical objectivity than cluster analysis. Factor analysis presents the same results even after rotation. Both procedures are necessary, as together they complete the characterization of the substantive meaning of the results.
- MeSH
- Running physiology statistics & numerical data MeSH
- Factor Analysis, Statistical MeSH
- Kinesiology, Applied methods statistics & numerical data utilization MeSH
- Track and Field physiology statistics & numerical data MeSH
- Humans MeSH
- Men MeSH
- Cluster Analysis MeSH
- Efficiency physiology MeSH
- Check Tag
- Humans MeSH
Analysis of population genetic structure has become a standard approach in population genetics. In polyploid complexes, clustering analyses can elucidate the origin of polyploid populations and patterns of admixture between different cytotypes. However, combining diploid and polyploid data can theoretically lead to biased inference with (artefactual) clustering by ploidy. We used simulated mixed-ploidy (diploid-autotetraploid) data to systematically compare the performance of k-means clustering and the model-based clustering methods implemented in STRUCTURE, ADMIXTURE, FASTSTRUCTURE and INSTRUCT under different scenarios of differentiation and with different marker types. Under scenarios of strong population differentiation, the tested applications performed equally well. However, when population differentiation was weak, STRUCTURE was the only method that allowed unbiased inference with markers with limited genotypic information (co-dominant markers with unknown dosage or dominant markers). Still, since STRUCTURE was comparatively slow, the much faster but less powerful FASTSTRUCTURE provides a reasonable alternative for large datasets. Finally, although bias makes k-means clustering unsuitable for markers with incomplete genotype information, for large numbers of loci (>1000) with known dosage k-means clustering was superior to FASTSTRUCTURE in terms of power and speed. We conclude that STRUCTURE is the most robust method for the analysis of genetic structure in mixed-ploidy populations, although alternative methods should be considered under some specific conditions.
- MeSH
- Diploidy MeSH
- Genetic Variation genetics MeSH
- Genetic Markers genetics MeSH
- Genotype MeSH
- Polymorphism, Single Nucleotide genetics MeSH
- Microsatellite Repeats genetics MeSH
- Ploidies * MeSH
- Genetics, Population statistics & numerical data MeSH
- Cluster Analysis MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
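For the mixed-ploidy comparison above, the following is a minimal k-means sketch on genotypes with known allele dosage. It is an illustration only, not the simulation or software used in the study: dividing dosage by ploidy to obtain comparable per-individual allele frequencies, the deterministic initialisation from the first k individuals, and the toy genotypes are all assumptions of this sketch.

```python
def dosage_to_frequency(dosages, ploidy):
    """Scale allele dosage (0..ploidy) to a frequency (0..1) so that
    diploid and tetraploid individuals share one scale (assumption)."""
    return [d / ploidy for d in dosages]


def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means on lists of floats.

    Initial centers are the first k points, which keeps the sketch
    deterministic; real analyses would use repeated random starts.
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical individuals: (allele dosages at three loci, ploidy).
individuals = [
    ([0, 0, 1], 2),  # diploid
    ([4, 3, 4], 4),  # tetraploid
    ([0, 1, 0], 4),  # tetraploid
    ([2, 2, 2], 2),  # diploid
]
points = [dosage_to_frequency(d, p) for d, p in individuals]
labels = kmeans(points, k=2)
```

In this toy case the two groups each contain one diploid and one tetraploid individual, so the grouping follows allele frequencies rather than ploidy. With unknown dosage, the scaling step is not available, which is the situation in which the abstract reports k-means to be biased.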
BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization. RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families. CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
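The idea of partitioning reads by similarity and using cluster sizes to estimate repeat proportions can be sketched as follows. This is a simplified stand-in, not the published pipeline: it treats each connected component of the read-similarity graph as one cluster, whereas the abstract does not specify the partitioning algorithm. The read identifiers, the similarity pairs, and the function names are hypothetical.

```python
from collections import defaultdict


def cluster_reads(read_ids, similar_pairs):
    """Group reads into connected components of the similarity graph.

    read_ids      -- iterable of read identifiers (graph vertices)
    similar_pairs -- iterable of (read_a, read_b) pairs whose sequence
                     similarity passed some threshold (graph edges)
    Returns clusters as sorted lists, largest cluster first.
    """
    parent = {r: r for r in read_ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for r in read_ids:
        groups[find(r)].append(r)
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)


def genome_proportions(clusters, total_reads):
    """Estimate each cluster's genome proportion as its share of all reads."""
    return [len(c) / total_reads for c in clusters]


# Hypothetical toy input: six reads, three pairwise similarity hits.
reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
pairs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
clusters = cluster_reads(reads, pairs)
proportions = genome_proportions(clusters, len(reads))
```

Because low-pass sequencing samples the genome roughly at random, the fraction of reads falling into a cluster serves as an estimate of the genomic proportion of that repeat family; single-read clusters such as `r6` would correspond to low-copy or unique sequence.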