clustering analysis
Policymakers and officials worldwide are introducing more stringent environmental norms and waste disposal policies to encourage industries to move towards cleaner production. One of the main challenges industries face in moving towards cleaner production is adopting strategies that optimise resource utilisation and waste reduction economically. This is particularly challenging for large-scale industries or for a group of industrial plants located in an industrial region. This paper presents a novel approach to economic resource optimisation focussed mainly on large-scale industries, on industrial plants located in the vicinity of each other, and on industrial symbiosis networks. In this work, a clustering algorithm is developed to segregate the given plants into clusters based on the load deficit or surplus of each plant. Ideally, only plants with surpluses send out their unused sources, and only plants with deficits receive external sources/resources. The clusters are formed based on the distances between plants, which helps save transportation and communication costs. The clustered plants are then easier to optimise and manage for resource and cost optimality. The applicability of the proposed clustering algorithm is demonstrated using two case studies from the domain of water recycling networks with multiple contaminants, including detailed network design, highlighting the importance of clustering in an industrial symbiosis network. Directing the excess flows from one plant to other plants in the same cluster was observed to save a considerable amount of fresh resources. More broadly, this implies that the developed methodology can address the optimisation of economic resources and aid in the better management of overall resources for a large-scale industrial symbiosis network.
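The abstract describes two steps: group nearby plants, then let surpluses cover deficits within each group. The minimal Python sketch below illustrates those steps, but it is not the authors' algorithm. It assumes each plant has 2-D coordinates and a single net load (positive for a surplus, negative for a deficit). Plants closer than a hypothetical threshold `max_dist` are linked using single linkage via union-find. The sketch then computes how much fresh resource each cluster still needs after internal transfers. Multiple contaminants, piping costs and network design are ignored.

```python
import math
from collections import defaultdict

def cluster_plants(coords, max_dist):
    """Single-linkage grouping: plants within max_dist of each other share a cluster.

    max_dist is an illustrative (hypothetical) distance threshold.
    """
    parent = list(range(len(coords)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= max_dist:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(coords)):
        groups[find(i)].append(i)
    return sorted(groups.values())

def fresh_demand(clusters, loads):
    """Fresh resource each cluster still needs after its surpluses cover its deficits."""
    result = []
    for members in clusters:
        surplus = sum(loads[i] for i in members if loads[i] > 0)
        deficit = -sum(loads[i] for i in members if loads[i] < 0)
        result.append(max(0.0, deficit - surplus))
    return result

coords = [(0, 0), (1, 0), (10, 10), (11, 10)]
loads = [5.0, -8.0, -4.0, 6.0]
clusters = cluster_plants(coords, max_dist=2.0)
print(clusters, fresh_demand(clusters, loads))  # [[0, 1], [2, 3]] [3.0, 0.0]
```

With this toy data, the plants treated in isolation would need 8 + 4 = 12 units of fresh resource. After clustering, they need 3. Single linkage can chain distant plants together through intermediate ones, which is one reason a real method would likely use a more careful grouping criterion.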
- Keywords
- Clustering analysis, Industrial symbiosis, Process integration, Resource conservation network, Segregated targeting
- MeSH
- Resource Allocation MeSH
- Industry * MeSH
- Cluster Analysis MeSH
- Water MeSH
- Conservation of Natural Resources * methods MeSH
- Publication Type
- Journal Article MeSH
- Substances
- Water MeSH
BACKGROUND: Ubiquitin ligases (Ub-ligases) are essential intracellular enzymes responsible for the regulation of proteome homeostasis, signaling pathway crosstalk, cell differentiation and stress responses. Individual Ub-ligases exhibit unique functions based on the nature of their substrates. They create a complex regulatory network with alternative and feedback pathways to maintain cell homeostasis, and are thus important players in many physiological and pathological conditions. However, the functional classification of Ub-ligases needs to be revised and extended. METHODS: In the current study, we used a novel semantic biclustering technique for expression profiling of Ub-ligases and ubiquitination-related genes in the murine gastrointestinal tract (GIT). We adapted a general framework of the algorithm for finding tissue-specific gene expression clusters in the GIT. To test the identified clusters in a biological system, we used a model of epithelial regeneration. For this purpose, a dextran sulfate sodium (DSS) mouse model, followed by in situ hybridization, was used to expose genes with possible compensatory features. To determine the cell-type-specific distribution of Ub-ligases and ubiquitination-related genes, principal component analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP) were used to analyze the Tabula Muris scRNA-seq data of the murine colon, followed by comparison with our clustering results. RESULTS: Our established clustering protocol, which incorporates the semantic biclustering algorithm, demonstrated the potential to reveal interesting expression patterns. In this manner, we statistically defined gene clusters consisting of the same genes involved in distinct regulatory pathways vs distinct genes playing roles in functionally similar signaling pathways. This allowed us to uncover the potentially redundant features of GIT-specific Ub-ligases and ubiquitination-related genes.
Testing the statistically obtained results on the mouse model showed that genes clustered into the same ontology group simultaneously alter their expression pattern after induced epithelial damage, illustrating their complementary role during tissue regeneration. CONCLUSIONS: An optimized semantic clustering protocol demonstrates the potential to reveal a readable and unique pattern in the expression profiling of GIT-specific Ub-ligases, exposing ontologically relevant gene clusters with potentially redundant features. This extends our knowledge of ontological relationships among Ub-ligases and ubiquitination-related genes, providing an alternative and more functional gene classification. In a similar way, semantic cluster analysis could be used for studying other enzyme families, tissues and systems.
- Keywords
- Cluster analysis, GIT, Gene redundancy, Regeneration, Semantic biclustering, Ub-ligase
- MeSH
- Gastrointestinal Tract metabolism MeSH
- Humans MeSH
- Mice MeSH
- Semantics * MeSH
- Cluster Analysis MeSH
- Ubiquitin genetics metabolism MeSH
- Ubiquitin-Protein Ligases * genetics MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Mice MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Substances
- Ubiquitin MeSH
- Ubiquitin-Protein Ligases * MeSH
The SHR is the most widely studied animal model of hypertension. In this strain, as in many humans with essential hypertension, increased blood pressure has been reported to cluster with other risk factors for cardiovascular disease, including insulin resistance and dyslipidemia. However, the genetic mechanisms that mediate this clustering of risk factors for cardiovascular disease or the hypertension "metabolic syndrome" remain poorly understood. In the current studies, we have demonstrated (1) that a gene or genes responsible for a whole spectrum of cardiovascular risk factors mapped to a limited segment of the centromeric region of rat chromosome 4, (2) that a spontaneous deletion in the gene for Cd36 that encodes a fatty acid transporter and is located directly at the peak of QTL linkages on chromosome 4 has been indirectly linked to the transmission of insulin resistance, defective fatty acid metabolism, and increased blood pressure, and (3) based on complementation analysis in two transgenic lines expressing wild-type Cd36 on the genetic background of the SHR strain harboring the deletion variant of Cd36, we have established that defective Cd36 can be a determinant of disordered fatty acid metabolism, glucose intolerance, and insulin resistance in spontaneous hypertension.
- MeSH
- Antigens, CD36 genetics physiology MeSH
- Gene Deletion MeSH
- Dietary Carbohydrates pharmacokinetics MeSH
- Dietary Fats pharmacokinetics MeSH
- Genetic Linkage MeSH
- Animals, Genetically Modified MeSH
- Hyperlipidemias epidemiology genetics MeSH
- Hypertension epidemiology genetics MeSH
- Insulin Resistance genetics MeSH
- DNA, Complementary genetics MeSH
- Blood Pressure genetics MeSH
- Rats MeSH
- Quantitative Trait, Heritable MeSH
- Kidney physiopathology MeSH
- Humans MeSH
- Lipolysis genetics MeSH
- Chromosome Mapping MeSH
- Fatty Acids metabolism MeSH
- Disease Models, Animal MeSH
- Mutation MeSH
- Mice, Knockout MeSH
- Mice MeSH
- Rats, Inbred SHR genetics MeSH
- Risk Factors MeSH
- Oligonucleotide Array Sequence Analysis MeSH
- Sequence Deletion MeSH
- Genetic Complementation Test MeSH
- Translocation, Genetic genetics MeSH
- Animals, Congenic MeSH
- Animals MeSH
- Check Tag
- Rats MeSH
- Humans MeSH
- Mice MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Review MeSH
- Substances
- Antigens, CD36 MeSH
- Dietary Carbohydrates MeSH
- Dietary Fats MeSH
- DNA, Complementary MeSH
- Fatty Acids MeSH
Inter-simple sequence repeat (ISSR)-polymerase chain reaction (PCR) polymorphism was generated to provide useful markers for assessment of genetic diversity within flax germplasm collections. We used nine previously selected anchored ISSR primers for fingerprinting of 53 flax cultivars or genotypes and obtained 62 scorable bands, of which 45 (72.6%) were polymorphic. An efficient separation of the 53 flax accessions into four groups and eight subgroups was achieved using the unweighted pair group method with arithmetic mean (UPGMA) clustering procedure based on genetic similarity expressed by the Jaccard similarity coefficient (JSC). Clustering within both groups and subgroups successfully produced smaller homogeneous clusters, whereas clustering between the four main groups of flax accessions displayed only a continuous decrease of similarity with a weak clustering effect. The statistical significance of grouping and subgrouping within the cluster dendrogram was estimated by calculating the error flag and cophenetic correlation parameter for each branch. Principal coordinates (PCO) analysis mostly confirmed the separation obtained by UPGMA clustering. We observed a statistically significant correlation between the number of total vs polymorphic bands in ISSR patterns. A one-way analysis of variance (ANOVA) test confirmed statistically significant differences in the average thousand seed mass (TSM) between the eight subclusters of flax accessions from the ISSR-PCR-based UPGMA dendrogram, which indicates a statistical correlation between flax ISSR polymorphism (the structure of ISSR-based clustering) and TSM.
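The following minimal pure-Python sketch illustrates the similarity and clustering steps named above. It computes the Jaccard similarity between binary band profiles (1 = band present) and applies UPGMA (average-linkage) agglomeration on the distance 1 − JSC. It also includes the share of polymorphic bands; for the study's numbers this is 45/62 ≈ 72.6%. The band matrix is made up for illustration. Error flags, cophenetic correlation and PCO are not reproduced. In practice a library routine (for example SciPy's average linkage) would normally be used instead.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two 0/1 band profiles: shared bands / bands present in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def polymorphic_fraction(profiles):
    """Share of band positions present in some, but not all, accessions."""
    columns = list(zip(*profiles))
    poly = sum(1 for col in columns if 0 < sum(col) < len(col))
    return poly / len(columns)

def upgma(profiles):
    """UPGMA on 1 - Jaccard; returns merges as (sorted member list, merge distance)."""
    n = len(profiles)
    members = {i: frozenset([i]) for i in range(n)}
    # Distances are keyed by (smaller id, larger id).
    dist = {(i, j): 1.0 - jaccard(profiles[i], profiles[j])
            for i, j in combinations(range(n), 2)}
    merges = []
    next_id = n
    while len(members) > 1:
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        size_a, size_b = len(members[a]), len(members[b])
        merged = members.pop(a) | members.pop(b)
        merges.append((sorted(merged), d_ab))
        # Average linkage: size-weighted mean of the two old distances.
        for k in members:
            d_ak = dist[(min(a, k), max(a, k))]
            d_bk = dist[(min(b, k), max(b, k))]
            dist[(k, next_id)] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        members[next_id] = merged
        next_id += 1
    return merges

profiles = [[1, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
for group, height in upgma(profiles):
    print(group, round(height, 3))
```

For this toy matrix, accessions 0 and 1 merge first, then 2 and 3, and the two pairs join at a distance of 0.75.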
- MeSH
- DNA Fingerprinting methods MeSH
- Phylogeny MeSH
- Genetic Variation genetics MeSH
- Genetic Markers genetics MeSH
- Flax genetics MeSH
- Polymorphism, Genetic genetics MeSH
- Repetitive Sequences, Nucleic Acid MeSH
- Seeds genetics MeSH
- Cluster Analysis MeSH
- Random Amplified Polymorphic DNA Technique methods MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substances
- Genetic Markers MeSH
Markov Random Walks (MRW) have proven to be an effective way to understand spectral clustering and embedding. However, because it captures little global structure, conventional MRW (e.g., the Gaussian kernel MRW) cannot handle data points drawn from a mixture of subspaces. In this paper, we introduce a regularized MRW learning model, using a low-rank penalty to constrain the global subspace structure, for subspace clustering and estimation. In our framework, both the local pairwise similarity and the global subspace structure can be learnt from the transition probabilities of MRW. We prove that, under suitable conditions, our proposed local/global criteria can exactly capture the multiple subspace structure and learn a low-dimensional embedding for the data that gives the true segmentation of subspaces. To improve robustness in real situations, we also propose an extension of the MRW learning model that integrates transition matrix learning and error correction in a unified framework. Experimental results on both synthetic data and real applications demonstrate that our proposed MRW learning model and its robust extension outperform state-of-the-art subspace clustering methods.
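For reference, the conventional Gaussian-kernel MRW that the abstract contrasts with can be sketched in a few lines. Pairwise affinities W_ij = exp(−‖x_i − x_j‖² / (2σ²)), with no self-loops, are row-normalised into transition probabilities P = D⁻¹W, where D is the diagonal degree matrix. This is only the baseline. The paper's low-rank regularised model is not reproduced, and the bandwidth `sigma` is an assumed parameter.

```python
import math

def transition_matrix(points, sigma=1.0):
    """Row-stochastic MRW transition matrix P = D^-1 W from a Gaussian kernel."""
    n = len(points)
    W = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    P = []
    for row in W:
        degree = sum(row)  # D_ii
        if degree == 0:
            raise ValueError("isolated point: increase sigma")
        P.append([w / degree for w in row])
    return P

def walk(P, start, steps):
    """Probability distribution of the walk after `steps` transitions from `start`."""
    n = len(P)
    dist = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(walk(transition_matrix(points), start=0, steps=3))
```

A walk started in one of two well-separated blobs keeps almost all of its probability mass inside that blob. This locality is the intuition linking MRW to spectral clustering. The abstract argues that such a purely local measure is insufficient for data drawn from a mixture of subspaces.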
- Keywords
- Dimensionality reduction, Markov random walks, Spectral clustering, Subspace clustering and estimation, Transition probability learning
- MeSH
- Algorithms MeSH
- Emotions physiology MeSH
- Humans MeSH
- Limbic System physiology MeSH
- Models, Neurological MeSH
- Neural Networks, Computer * MeSH
- Pattern Recognition, Automated methods MeSH
- Cluster Analysis MeSH
- Models, Theoretical MeSH
- Learning MeSH
- Artificial Intelligence MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Review MeSH
BACKGROUND: The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of data with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. The analysis of that amount of high-dimensional data demands high-performance computational resources, in both hardware and software. Current software tools often do not scale to datasets of such size; users are thus forced to downsample the data to manageable sizes, in turn losing accuracy and the ability to detect many underlying complex phenomena. RESULTS: We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level and high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study. CONCLUSIONS: GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as current state-of-the-art software. Measurements indicate that the performance scales to much larger datasets. The example use on the data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies.
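GigaSOM.jl itself is written in Julia. To keep the examples in one language, the following minimal single-machine self-organising map (SOM) is written in Python. It shows the basic training loop: find the best-matching unit (BMU) for a sample, then pull it and its grid neighbours toward that sample. It is not GigaSOM.jl's algorithm or API. The grid size, the linear learning-rate decay and the shrinking Gaussian neighbourhood are illustrative assumptions. None of the distributed-computing aspects are shown.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the codebook vector closest (Euclidean) to sample x."""
    return min(range(len(weights)), key=lambda k: math.dist(weights[k], x))

def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Online SOM on a rows x cols grid; returns the list of codebook vectors."""
    rng = random.Random(seed)
    sigma0 = sigma0 or max(rows, cols) / 2
    # Initialise every node from a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = t / total
            lr = lr0 * (1 - frac)                # linearly decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-3   # shrinking neighbourhood width
            bmu = best_matching_unit(weights, x)
            for k, w in enumerate(weights):
                g2 = (grid[k][0] - grid[bmu][0]) ** 2 + (grid[k][1] - grid[bmu][1]) ** 2
                h = math.exp(-g2 / (2 * sigma ** 2))  # Gaussian neighbourhood on the grid
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            t += 1
    return weights

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
codebook = train_som(data, rows=2, cols=2, epochs=10)
print([best_matching_unit(codebook, x) for x in data])
```

Each update is a convex combination of the old weight and the sample, because `lr * h` stays between 0 and 0.5. Codebook vectors therefore never leave the bounding box of the data. After training, cells are typically clustered by their BMU (or by clustering the codebook further), which is how SOMs are commonly used on cytometry data.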
- Keywords
- Julia, clustering, dimensionality reduction, high-performance computing, self-organizing maps, single-cell cytometry
- MeSH
- Algorithms * MeSH
- Mice MeSH
- Programming Languages * MeSH
- Cluster Analysis MeSH
- Software MeSH
- Animals MeSH
- Check Tag
- Mice MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
The availability of a great range of prior biological knowledge about the roles and functions of genes and gene-gene interactions allows us to simplify the analysis of gene expression data to make it more robust, compact, and interpretable. Here, we objectively analyze the applicability of functional clustering for the identification of groups of functionally related genes. The analysis is performed in terms of gene expression classification and uses predictive accuracy as an unbiased performance measure. Features of biological samples that originally corresponded to genes are replaced by features that correspond to the centroids of the gene clusters and are then used for classifier learning. Using 10 benchmark data sets, we demonstrate that functional clustering significantly outperforms random clustering without biological relevance. We also show that functional clustering performs comparably to gene expression clustering, which groups genes according to the similarity of their expression profiles. Finally, the suitability of functional clustering as a feature extraction technique is evaluated and discussed.
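The feature-replacement step described above can be sketched as follows. Given an expression matrix (samples × genes) and a mapping from genes to functional clusters, each sample is re-described by the centroid (mean expression) of every gene cluster. The gene-to-cluster mapping here is a hypothetical input. How the paper derives functional clusters from prior biological knowledge is not reproduced.

```python
from collections import defaultdict

def cluster_centroid_features(expression, gene_names, gene_to_cluster):
    """Replace per-gene features by per-cluster mean expression.

    expression: list of samples, each a list of values aligned with gene_names.
    gene_to_cluster: dict gene -> cluster label; genes without a label are dropped.
    Returns (cluster_labels, reduced_matrix).
    """
    columns = defaultdict(list)  # cluster label -> column indices of its genes
    for idx, gene in enumerate(gene_names):
        if gene in gene_to_cluster:
            columns[gene_to_cluster[gene]].append(idx)
    labels = sorted(columns)
    reduced = [[sum(sample[i] for i in columns[lab]) / len(columns[lab]) for lab in labels]
               for sample in expression]
    return labels, reduced

genes = ["g1", "g2", "g3", "g4"]
mapping = {"g1": "A", "g2": "A", "g3": "B"}  # hypothetical functional clusters
X = [[1.0, 3.0, 5.0, 9.0], [2.0, 2.0, 0.0, 1.0]]
print(cluster_centroid_features(X, genes, mapping))  # (['A', 'B'], [[2.0, 5.0], [2.0, 0.0]])
```

The reduced matrix can then be passed to any classifier in place of the per-gene matrix. Substituting a random `gene_to_cluster` assignment gives the random-clustering baseline that the abstract compares against.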
To understand how genes are distributed on chromosomes, we bring new insights into gene positional clustering and its properties. We performed a large-scale analysis of three types of differentiation and observed that genes that sequentially enter different cell processes are positionally clustered on chromosomes. Genes from the clusters are transcribed sequentially with respect to both time kinetics and position. This means that genes related to a cellular process are clustered together, independent of the period of time during which they are active and important for the process. Our results also demonstrate not only that there are general regions of increased or decreased levels of gene expression, but also that some chromosome regions contain clusters of genes related to specific cell processes. The results provided in this paper also support the theory of "transcription factories" and show that transcription of genes from the clusters is managed by softer epigenetic mechanisms.
- MeSH
- Cell Differentiation genetics MeSH
- K562 Cells MeSH
- DNA Primers genetics MeSH
- Transcription, Genetic MeSH
- Granulocytes cytology metabolism MeSH
- HL-60 Cells MeSH
- Humans MeSH
- Megakaryocytes cytology metabolism MeSH
- Monocytes cytology metabolism MeSH
- Multigene Family * MeSH
- Myelopoiesis genetics MeSH
- Base Sequence MeSH
- Oligonucleotide Array Sequence Analysis MeSH
- Gene Expression Profiling MeSH
- Thrombopoiesis genetics MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substances
- DNA Primers MeSH
The interpretation of two-dimensional gel electrophoresis spot profiles can be facilitated by statistical and machine learning programs. Two different approaches to the classification of spot profiles, cluster analysis and neural networks, are discussed. Neural networks for two different model patterns were designed, and an algorithm for training the network for classification was developed. Neural networks were shown to perform better than cluster analysis and principal component analysis. Combining both approaches into one process can increase the reliability and speed of classification. Artificially created training sets with added random noise can be used for network training. The analysis was applied to the Streptomyces coelicolor developmental two-dimensional (2-D) gel database.
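The idea of training on artificially generated, noisy examples can be illustrated with a deliberately small sketch. Two hypothetical model spot profiles (rising and falling intensity over five developmental time points) are copied many times with uniform random noise. A single perceptron, the simplest neural network, is then trained on them. The patterns, noise level and perceptron are assumptions for illustration, not the networks or training algorithm described in the paper.

```python
import random

RISING = [0.0, 0.25, 0.5, 0.75, 1.0]   # hypothetical model profile, label +1
FALLING = RISING[::-1]                 # hypothetical model profile, label -1

def noisy_training_set(patterns, copies=50, noise=0.1, seed=0):
    """Copies of each (pattern, label) pair with uniform noise added to every point."""
    rng = random.Random(seed)
    return [([v + rng.uniform(-noise, noise) for v in p], label)
            for p, label in patterns for _ in range(copies)]

def predict(w, x):
    """Signed activation; its sign is the predicted class. The last weight is the bias."""
    return sum(wi * xi for wi, xi in zip(w, x + [1.0]))

def train_perceptron(samples, max_epochs=100):
    """Classic perceptron rule; stops after the first epoch without mistakes."""
    w = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            if y * predict(w, x) <= 0:
                mistakes += 1
                for i, v in enumerate(x + [1.0]):
                    w[i] += y * v
        if mistakes == 0:
            break
    return w

train = noisy_training_set([(RISING, 1), (FALLING, -1)])
w = train_perceptron(train)
print(predict(w, RISING) > 0, predict(w, FALLING) < 0)
```

With ±0.1 noise the two noisy classes stay linearly separable, so the perceptron rule is guaranteed to converge. Profiles that are not linearly separable would require a multi-layer network.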
BACKGROUND: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty, that is, the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged. RESULTS: We compared the two most commonly used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy-Weinberg equilibrium. We analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV as a new measure of dataset consistency. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters. CONCLUSIONS: We found that the dissimilarity measure based on the distance between SV breakpoints produces slightly better results than the measure based on SV overlap. This difference is evident in the trivial and corrected clustering strategies, but not in the constrained clustering strategy. However, the constrained clustering strategy provided the best results in all aspects, regardless of the dissimilarity measure used.
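The two families of dissimilarity measures compared above can be sketched for SVs represented as (type, start, end) intervals on one chromosome. The exact formulas below are common choices assumed for illustration: the largest breakpoint shift, and one minus the reciprocal overlap. They are not necessarily the definitions used in the paper. The merge step is a simple greedy grouping, not the paper's trivial, corrected or constrained strategies.

```python
def breakpoint_distance(a, b):
    """Largest shift (bp) between corresponding breakpoints of two SVs."""
    return max(abs(a[1] - b[1]), abs(a[2] - b[2]))

def overlap_dissimilarity(a, b):
    """1 - reciprocal overlap: overlap length divided by the longer SV."""
    overlap = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    longest = max(a[2] - a[1], b[2] - b[1])
    return 1.0 - overlap / longest if longest > 0 else 1.0

def cluster_svs(svs, dissimilarity, threshold):
    """Greedy grouping: each SV joins the first cluster of the same type that has
    a member within `threshold`, otherwise it starts a new cluster."""
    clusters = []
    for sv in sorted(svs, key=lambda s: (s[0], s[1], s[2])):
        for cluster in clusters:
            if cluster[0][0] == sv[0] and any(dissimilarity(sv, m) <= threshold for m in cluster):
                cluster.append(sv)
                break
        else:
            clusters.append([sv])
    return clusters

svs = [("DEL", 1000, 2000), ("DEL", 1010, 1990), ("DEL", 5000, 6000), ("DUP", 1005, 2000)]
print(cluster_svs(svs, breakpoint_distance, threshold=50))
print(cluster_svs(svs, overlap_dissimilarity, threshold=0.2))
```

The two measures diverge most for small SVs. Two 100 bp deletions shifted by 50 bp have a breakpoint distance of 50 but an overlap dissimilarity of 0.5, so a fixed threshold on either measure treats short and long variants differently.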
- Keywords
- Breakpoints uncertainty problem, Constrained clustering, Mendelian inheritance error, Structural variants, Whole genome sequencing
- MeSH
- Genome, Human * MeSH
- Genomics MeSH
- Humans MeSH
- Uncertainty MeSH
- Cluster Analysis MeSH
- Genomic Structural Variation * MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH