cluster algorithm
Many brain diseases are recognized from EEG recordings. The EEG signal has a stochastic character, which complicates the evaluation of EEG recordings. Automatic classification methods are therefore used for EEG processing; they help the expert find significant or physiologically important segments in the recording. The k-means algorithm is one of the methods most frequently used in practice for automatic classification. Its main disadvantage is that the number of clusters must be determined in advance. Many methods have been proposed to determine the optimal number of clusters for the k-means algorithm. The aim of this study is to test the functionality of the two most frequently used methods on EEG signals, namely the elbow and silhouette methods. In this feasibility study we compared the results of both methods on simulated data and on real EEG signal, aiming to verify, with the help of an expert, whether these functions can be used on real EEG signal. The results show that the silhouette method applied to EEG recordings is more time-consuming than the elbow method. Neither method was able to correctly recognize the number of clusters identified in the EEG record by expert evaluation, and therefore neither is applicable to automatic classification of EEG based on the k-means algorithm.
- MeSH
- Algorithms MeSH
- Electroencephalography * methods MeSH
- Humans MeSH
- Computer Simulation MeSH
- Signal Processing, Computer-Assisted MeSH
- Cluster Analysis MeSH
- Research MeSH
- Check Tag
- Humans MeSH
- Publication Type
- grant-supported research MeSH
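The elbow and silhouette methods tested in the abstract above can be sketched with scikit-learn; the synthetic blobs below stand in for EEG feature vectors and are not the study's data.

```python
# Sketch of the elbow and silhouette methods for choosing k in k-means,
# run on synthetic data (a stand-in for EEG feature vectors, not the
# study's recordings).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with 3 well-separated clusters.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

inertias = {}     # elbow method: within-cluster sum of squares vs. k
silhouettes = {}  # silhouette method: mean silhouette coefficient vs. k
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, km.labels_)

# The elbow is read off the inertia curve (often visually); the
# silhouette method simply takes the k with the highest mean score.
best_k = max(silhouettes, key=silhouettes.get)
```

On clean synthetic blobs the two criteria agree; the abstract's point is that on real EEG segments neither matched the expert's segmentation, and the silhouette score is the costlier of the two since it needs all pairwise distances at every k.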
OBJECTIVE: We pursued the hypothesis that complex regional pain syndrome (CRPS) signs observed by neurologic examination display a structure allowing for alignment of patients to particular phenotype clusters. METHODS: Clinical examination data were obtained from 3 independent samples of 444, 391, and 202 patients with CRPS. The structure among CRPS signs was analyzed in sample 1 and validated with sample 2 using hierarchical clustering. For patients with CRPS in sample 3, an individual phenotype score was submitted to k-means clustering. Pain characteristics, quantitative sensory testing, and psychological data were tested in this sample as descriptors for phenotypes. RESULTS: A 2-cluster structure emerged in sample 1 and was replicated in sample 2. Cluster 1 comprised minor injury eliciting CRPS, motor signs, allodynia, and glove/stocking-like sensory deficits, resembling a CRPS phenotype most likely reflecting a CNS pathophysiology (the central phenotype). Cluster 2, which consisted of edema, skin color changes, skin temperature changes, sweating, and trophic changes, probably represents peripheral inflammation (the peripheral phenotype). In sample 3, individual phenotype scores were calculated as the sum of the mean values of signs from each cluster, where signs from cluster 1 were coded with 1 and from cluster 2 with -1. A k-means algorithm separated groups with 78, 36, and 88 members resembling the peripheral, central, and mixed phenotypes, respectively. The central phenotype was characterized by cold hyperalgesia at the affected limb. CONCLUSIONS: Statistically determined CRPS phenotypes may reflect major pathophysiologic mechanisms of peripheral inflammation and central reorganization.
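A minimal sketch of the two-stage analysis described above (hierarchical clustering of signs, then k-means on a signed phenotype score), using SciPy and scikit-learn. The binary sign blocks are invented for illustration and are not the study's variables.

```python
# Sketch: stage 1 clusters the signs (columns) hierarchically by
# correlation distance; stage 2 runs k-means on a per-patient score that
# codes one sign cluster +1 and the other -1. Data are simulated.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_patients, n_signs = 200, 6
# Simulate two blocks of co-occurring binary signs (0/1 per patient).
central = rng.integers(0, 2, (n_patients, 1)) * np.ones((1, 3))
peripheral = rng.integers(0, 2, (n_patients, 1)) * np.ones((1, 3))
signs = np.hstack([central, peripheral])  # columns 0-2 vs. columns 3-5

# Stage 1: hierarchical clustering of signs by correlation distance.
corr = np.corrcoef(signs.T)
Z = linkage(1 - corr[np.triu_indices(n_signs, 1)], method="average")
sign_cluster = fcluster(Z, t=2, criterion="maxclust")  # two sign clusters

# Stage 2: signed phenotype score per patient, then k-means into three
# groups (resembling peripheral / central / mixed).
weights = np.where(sign_cluster == sign_cluster[0], 1.0, -1.0)
score = (signs * weights).mean(axis=1, keepdims=True)
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(score)
```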
Natural products represent a rich reservoir of small-molecule drug candidates used as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed biosynthetic gene clusters (BGCs). The growing number of complete microbial genomes and similar resources has driven the development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared with existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biological activities. The improved accuracy and classification ability of DeepBGC represent a major addition to in silico BGC identification.
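The random-forest stage layered on top of the deep model can be illustrated in a few lines. Everything below (the domain-presence features, the toy class rule) is synthetic and stands in for real BGC annotations; it is not DeepBGC itself.

```python
# Illustrative sketch: a random forest predicting a BGC product class
# from a binary protein-domain presence/absence vector. Features, labels,
# and the class rule are invented for demonstration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_bgcs, n_domains = 400, 30
X = rng.integers(0, 2, (n_bgcs, n_domains))
# Toy rule: class depends on two "marker" domains (purely synthetic).
y = np.where(X[:, 0] == 1, "polyketide",
             np.where(X[:, 1] == 1, "NRPS", "other"))

# Train on 300 clusters, evaluate on the held-out 100.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:300], y[:300])
accuracy = clf.score(X[300:], y[300:])
```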
MOTIVATION: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on the protein surface. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, resulting in vast amounts of peptide sequences that potentially contain information on multiple protein-protein interactions. Processing such datasets is a complex but essential task for large-scale studies of protein-protein interactions. RESULTS: The software tool presented in this article rapidly identifies multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generates multiple sequence alignments of the identified clusters. The method was applied to a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as to a new dataset, an order of magnitude larger, containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from the original epitope sequences. A further test indicates that processing even much larger datasets is computationally feasible. AVAILABILITY AND IMPLEMENTATION: Hammock is published under the GNU GPL v3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases. CONTACT: muller@mou.cz SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
- MeSH
- Algorithms * MeSH
- Databases, Protein * MeSH
- Epitopes chemistry MeSH
- Protein Interaction Domains and Motifs * MeSH
- Humans MeSH
- Markov Chains MeSH
- Molecular Sequence Data MeSH
- Antibodies, Monoclonal chemistry MeSH
- Peptides chemistry MeSH
- Amino Acid Sequence MeSH
- Sequence Alignment MeSH
- Cluster Analysis MeSH
- Software MeSH
- src Homology Domains MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- grant-supported research MeSH
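Hammock itself clusters peptides via hidden Markov models; as a purely conceptual stand-in, greedy seeding on shared k-mers shows the kind of shared-motif grouping the abstract describes. The peptides and motifs below are invented.

```python
# Conceptual stand-in for motif-based peptide clustering: seed clusters
# on frequent k-mers, assign each peptide to the first seed it contains.
# (Hammock's real algorithm uses HMMs; this is only an illustration.)
from collections import Counter, defaultdict

peptides = ["ACDEFG", "QCDEFW", "ACDEFY",   # share the core "CDEF"
            "KLMNPQ", "RLMNPS", "KLMNPT"]   # share the core "LMNP"

def kmers(seq, k=4):
    """All length-k substrings of a peptide sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Count k-mer occurrences across the dataset; frequent ones become seeds.
counts = Counter(km for p in peptides for km in kmers(p))
seeds = [km for km, c in counts.most_common() if c >= 3]

clusters = defaultdict(list)
for p in peptides:
    for s in seeds:
        if s in p:
            clusters[s].append(p)
            break
```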
We present a new algorithm to analyse the information content of images acquired by automated fluorescence microscopy. The algorithm belongs to the group of autofocusing methods but differs from its predecessors in that it can handle thick specimens and also operate in confocal mode. It measures the information content of images using a 'content function', essentially the same concept as a focus function. Unlike previously presented algorithms, this algorithm tries to find all significant axial positions in cases where the content function applied to real data is not unimodal, which is often the case. This requirement precludes using algorithms that rely on unimodality. Moreover, choosing a content function requires careful consideration, because some functions suppress local maxima. First, we test 19 content functions and evaluate their ability to show local maxima clearly; only six succeed. To save time, the acquisition procedure needs to vary the step size adaptively, because a wide range of possible axial positions has to be traversed so as not to miss a local maximum. The algorithm therefore has to assess the steepness of the content function online so that it can decide whether to use a bigger or smaller step size for the next image, which requires knowledge of the typical behaviour of content functions. We show that for normalized variance, one of the most promising content functions, this knowledge can be obtained by normalizing with respect to the theoretical maximum of the function and using hierarchical clustering. The resulting algorithm is more reliable and efficient than a simple procedure with constant steps.
- MeSH
- Algorithms MeSH
- Electronic Data Processing methods trends utilization MeSH
- Financing, Organized MeSH
- Calcinosis diagnosis classification MeSH
- Humans MeSH
- Mammography methods instrumentation utilization MeSH
- Breast Diseases diagnosis classification MeSH
- Neural Networks, Computer MeSH
- Models, Theoretical MeSH
- Image Enhancement methods MeSH
- Check Tag
- Humans MeSH
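The 'normalized variance' content function named in the abstract above is commonly defined as image variance divided by the squared mean; assuming that definition, a short sketch over a synthetic z-stack shows the non-unimodal profile the algorithm has to cope with. The stack parameters are invented, not the paper's data.

```python
# Sketch of a content (focus) function over a synthetic z-stack whose
# contrast peaks at two axial positions, so the profile is not unimodal.
import numpy as np

def normalized_variance(img):
    """Common focus measure: image variance normalized by squared mean."""
    m = img.mean()
    return img.var() / (m * m)

rng = np.random.default_rng(0)
# Simulate 40 axial positions; pixel contrast peaks near z=10 and z=30.
stack = [rng.normal(100, 5 + 20 * np.exp(-((z - 10) ** 2) / 8)
                         + 15 * np.exp(-((z - 30) ** 2) / 8),
                    size=(64, 64)) for z in range(40)]
scores = np.array([normalized_variance(img) for img in stack])

# Both local maxima must be found, which is why the algorithm cannot
# rely on a standard unimodal autofocus search.
local_max = [z for z in range(1, 39)
             if scores[z] > scores[z - 1] and scores[z] > scores[z + 1]]
```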
BACKGROUND: Identifying genes that are coordinately regulated, according to their expression levels over the time course of a process, allows functional relationships to be discovered among the genes involved. RESULTS: We present a single-class classification method for identifying genes of similar function from a gene expression time series. It is based on a parallel genetic algorithm, a supervised machine-learning method that exploits prior knowledge of gene function to identify unknown genes of similar function from expression data. The algorithm was tested on a set of randomly generated patterns, and the results were compared with seven other classification algorithms, including support vector machines. The algorithm avoids several problems associated with unsupervised clustering methods and shows better performance than the other algorithms. It was applied to the identification of secondary metabolite gene clusters of the antibiotic-producing eubacterium Streptomyces coelicolor, and it also identified pathways associated with transport of the secondary metabolites out of the cell. We used the method to predict the functional role of particular ORFs based on the expression data. CONCLUSION: Through analysis of a time series of gene expression, the algorithm identifies pathways that are directly or indirectly associated with genes of interest and that are active during the time course of the experiment.
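The paper's decision rule is evolved by a parallel genetic algorithm; as a much-simplified stand-in for the same single-class task (flagging genes co-expressed with a known-function seed set), a plain correlation threshold can illustrate the idea. All profiles below are simulated.

```python
# Simplified stand-in for single-class classification from a time series:
# flag candidate genes whose expression correlates with the mean profile
# of genes of known function. (The paper learns the rule with a GA; here
# the rule is a fixed correlation threshold, for illustration only.)
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 24)                        # 24 time points
profile = np.sin(2 * np.pi * t)                  # shared pathway dynamics

known = profile + rng.normal(0, 0.15, (5, 24))   # genes of known function
candidates = np.vstack([
    profile + rng.normal(0, 0.15, (10, 24)),     # 10 true pathway members
    rng.normal(0, 1.0, (20, 24)),                # 20 unrelated genes
])
seed_mean = known.mean(axis=0)

def corr(a, b):
    """Pearson correlation between two expression time series."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

predicted = [i for i, g in enumerate(candidates) if corr(g, seed_mean) > 0.7]
```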
Identification of risk factors for transient ischemic attack (TIA) is crucial for patients with atrial fibrillation (AF). However, identifying risk factors in young patients with low-risk AF is difficult, because the incidence of TIA in such patients is very low, so traditional multiple logistic regression fails to identify the risk factors. A novel algorithm for identifying risk factors for TIA is therefore needed. We propose such an algorithm, which combines multiple correspondence analysis with hierarchical cluster analysis and uses the Taiwan National Health Insurance Research Database, a population-based database, to determine risk factors in these patients. The results of this study can help clinicians and patients with AF prevent TIA or stroke events as early as possible.
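The hierarchical-clustering stage of the approach above can be sketched on binary risk-factor profiles with SciPy; Jaccard distance suits presence/absence data. The multiple correspondence analysis step is omitted, and the patient matrix is synthetic, not NHIRD data.

```python
# Sketch: hierarchical clustering of patients by their binary risk-factor
# profiles using Jaccard distance and average linkage. The two patient
# groups and their factor signatures are invented for illustration.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Rows = patients, columns = presence/absence of six candidate factors;
# group A always carries factors 0-1, group B always carries factors 4-5.
group_a = np.hstack([np.ones((30, 2)), rng.random((30, 4)) < 0.15])
group_b = np.hstack([rng.random((30, 4)) < 0.15, np.ones((30, 2))])
X = np.vstack([group_a, group_b]).astype(bool)

labels = fcluster(linkage(pdist(X, metric="jaccard"), method="average"),
                  t=2, criterion="maxclust")
```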