BACKGROUND: Antineutrophil cytoplasmic antibody (ANCA)-associated vasculitis is a heterogeneous autoimmune disease. Although it is traditionally stratified into two conditions, granulomatosis with polyangiitis (GPA) and microscopic polyangiitis (MPA), the subclassification of ANCA-associated vasculitis remains subject to debate. Here we aim to identify phenotypically distinct subgroups and to develop a data-driven subclassification of ANCA-associated vasculitis using a large real-world dataset.

METHODS: In the collaborative data reuse project FAIRVASC (Findable, Accessible, Interoperable, Reusable, Vasculitis), registry records of patients with ANCA-associated vasculitis were retrieved from six European vasculitis registries: the Czech Registry of ANCA-associated vasculitis (Czech Republic), the French Vasculitis Study Group Registry (FVSG; France), the Joint Vasculitis Registry in German-speaking Countries (GeVas; Germany), the Polish Vasculitis Registry (POLVAS; Poland), the Irish Rare Kidney Disease Registry (RKD; Ireland), and the Skåne Vasculitis Cohort (Sweden). We performed model-based clustering of 17 mixed-type clinical variables using a parsimonious mixture of two latent Gaussian variable models. The optimal cluster solution was clinically validated using summary statistics of the clusters' demographic, phenotypic, and serological characteristics and outcomes. The predictive value of models featuring cluster affiliation was compared with that of classifications based on clinical diagnosis and ANCA specificity. People with lived experience were involved throughout the FAIRVASC project.

FINDINGS: A total of 3868 patients diagnosed with ANCA-associated vasculitis between Nov 1, 1966, and March 1, 2023, were included in the study across the six registries (Czech Registry n=371, FVSG n=1780, GeVas n=135, POLVAS n=792, RKD n=439, and Skåne Vasculitis Cohort n=351). There were 2434 (62·9%) patients with GPA and 1434 (37·1%) with MPA. Mean age at diagnosis was 57·2 years (SD 16·4); 2006 (51·9%) of 3867 patients were men and 1861 (48·1%) were women. We identified five clusters with distinct phenotypes, biochemical presentations, and disease outcomes. Three clusters were characterised by kidney involvement: one severe kidney cluster (555 [14·3%] of 3868 patients) with high C-reactive protein (CRP) and serum creatinine concentrations and variable ANCA specificity (SK cluster); one myeloperoxidase (MPO)-ANCA-positive kidney involvement cluster (782 [20·2%]) with limited extrarenal disease (MPO-K cluster); and one proteinase 3 (PR3)-ANCA-positive kidney involvement cluster (683 [17·7%]) with widespread extrarenal disease (PR3-K cluster). Two clusters were characterised by relative absence of kidney involvement: a predominantly PR3-ANCA-positive cluster (1202 [31·1%]) with inflammatory multisystem disease (IMS cluster), and a cluster (646 [16·7%]) of mainly younger patients with predominantly ear-nose-throat involvement and low CRP (YR cluster). Compared with models fitted with clinical diagnosis or ANCA status, models using cluster assignment showed improved predictive power for both patient and kidney survival.

INTERPRETATION: Our study reinforces the view that ANCA-associated vasculitis is not merely a binary construct. Data-driven subclassification of ANCA-associated vasculitis has higher predictive value for key outcomes than current approaches.

FUNDING: European Union's Horizon 2020 research and innovation programme under the European Joint Programme on Rare Diseases.
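Model-based clustering treats each cluster as one component of a probability mixture and picks the number of components with an information criterion. The sketch below is deliberately simplified and is not the FAIRVASC model: it fits a plain one-dimensional Gaussian mixture by expectation-maximisation (EM) and compares solutions with the Bayesian information criterion (BIC), whereas the study clustered 17 mixed-type variables with a parsimonious mixture of latent Gaussian variable models. The function names `fit_gmm_1d`, `bic`, and `choose_k` and the variance floor `min_var` are hypothetical choices made for illustration.

```python
import math

# Illustrative sketch only: a 1-D Gaussian mixture fitted by EM, with BIC model selection.
# Function names and the variance floor are hypothetical, not taken from the study.


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=200, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture by expectation-maximisation (EM)."""
    n = len(data)
    xs = sorted(data)
    # Deterministic start: means at evenly spaced quantiles, shared overall variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(data) / n
    variances = [max(sum((x - mu) ** 2 for x in data) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            # The floor stops a component from collapsing onto a single point.
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var
            )
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in data
    )
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC with 3k - 1 free parameters (k means, k variances, k - 1 weights); lower is better."""
    return -2 * loglik + (3 * k - 1) * math.log(n)


def choose_k(data, max_k=4):
    """Number of components with the lowest BIC among 1..max_k."""
    scores = {k: bic(fit_gmm_1d(data, k)[3], k, len(data)) for k in range(1, max_k + 1)}
    return min(scores, key=scores.get)
```

For two well-separated groups of values, such as `[i * 0.1 for i in range(20)] + [10 + i * 0.1 for i in range(20)]`, the two-component fit places its means near 0.95 and 10.95 and has a lower BIC than the one-component fit.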
- MeSH
- Anti-Neutrophil Cytoplasmic Antibody-Associated Vasculitis * classification diagnosis epidemiology blood immunology MeSH
- Adult MeSH
- Cohort Studies MeSH
- Middle Aged MeSH
- Humans MeSH
- Microscopic Polyangiitis classification epidemiology blood diagnosis immunology MeSH
- Registries * statistics & numerical data MeSH
- Aged MeSH
- Cluster Analysis MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Geographicals
- Europe MeSH
The availability of a wide range of prior biological knowledge about the roles and functions of genes and gene-gene interactions allows us to simplify the analysis of gene expression data and make it more robust, compact, and interpretable. Here, we objectively analyze the applicability of functional clustering to identify groups of functionally related genes. The analysis is performed in terms of gene expression classification and uses predictive accuracy as an unbiased performance measure. Features of biological samples that originally corresponded to genes are replaced by features that correspond to the centroids of the gene clusters and are then used for classifier learning. Using 10 benchmark data sets, we demonstrate that functional clustering significantly outperforms random clustering without biological relevance. We also show that functional clustering performs comparably to gene expression clustering, which groups genes according to the similarity of their expression profiles. Finally, the suitability of functional clustering as a feature extraction technique is evaluated and discussed.
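The feature-extraction step described above, in which gene-level features are replaced by cluster centroids, can be sketched as follows. This is a minimal illustration under stated assumptions: expression values are held in plain dictionaries, a centroid is taken as the mean expression of a cluster's member genes within each sample, and the hypothetical helper `random_gene_clusters` builds a size-matched random grouping of the kind functional clustering was compared against. The functional grouping itself is taken as given.

```python
import random

# Illustrative sketch; function names and data layout are hypothetical.


def cluster_centroid_features(expression, gene_clusters):
    """Replace per-gene features with per-cluster centroid features.

    expression: sample_id -> {gene: expression value}
    gene_clusters: cluster_id -> list of member genes
    Returns sample_id -> {cluster_id: mean expression of the cluster's genes}.
    """
    features = {}
    for sample, values in expression.items():
        row = {}
        for cluster, genes in gene_clusters.items():
            present = [values[g] for g in genes if g in values]
            # Centroid of the cluster in this sample; 0.0 if no member gene was measured.
            row[cluster] = sum(present) / len(present) if present else 0.0
        features[sample] = row
    return features


def random_gene_clusters(gene_clusters, seed=0):
    """Baseline: shuffle all genes into clusters of the same sizes, ignoring biology."""
    genes = [g for members in gene_clusters.values() for g in members]
    random.Random(seed).shuffle(genes)
    shuffled, start = {}, 0
    for cluster, members in gene_clusters.items():
        shuffled[cluster] = genes[start:start + len(members)]
        start += len(members)
    return shuffled
```

The resulting per-sample feature dictionaries (one value per gene cluster) would then be fed to whichever classifier is being evaluated, once for the functional grouping and once for the random baseline.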
BACKGROUND AND OBJECTIVES: Recent studies have raised doubts as to whether all currently defined central disorders of hypersomnolence are stable entities, especially narcolepsy type 2 and idiopathic hypersomnia. New reliable biomarkers are needed, and the question arises whether the current diagnostic criteria for hypersomnolence disorders should be reassessed. The main aim of this data-driven observational study was to determine whether data-driven algorithms would segregate narcolepsy type 1 and identify a more reliable subgrouping of individuals without cataplexy, with new clinical biomarkers.

METHODS: We used agglomerative hierarchical clustering, an unsupervised machine learning algorithm, to identify distinct hypersomnolence clusters in the large-scale European Narcolepsy Network database. We included 97 variables covering all aspects of central disorders of hypersomnolence, such as symptoms, demographics, objective and subjective sleep measures, and laboratory biomarkers. We focused specifically on subgrouping patients without cataplexy. The number of clusters was chosen as the minimal number for which patients without cataplexy were placed in distinct groups.

RESULTS: We included 1,078 unmedicated adolescents and adults. Seven clusters were identified, 4 of which included predominantly individuals with cataplexy. The 2 most distinct clusters consisted of 158 and 157 patients, were dominated by individuals without cataplexy, and, among other variables, differed significantly in the presence of sleep drunkenness, subjective difficulty awakening, and weekend-week sleep length difference. Patients formally diagnosed with narcolepsy type 2 and idiopathic hypersomnia were evenly mixed across these 2 clusters.

DISCUSSION: Using a data-driven approach in the largest study of central disorders of hypersomnolence to date, we identified distinct patient subgroups within the central disorders of hypersomnolence population. Our results challenge the inclusion of sleep-onset REM periods in the diagnostic criteria for people without cataplexy and provide promising new variables for reliable diagnostic categories that better reflect different patient phenotypes. Cluster-guided classification will result in a more solid hypersomnolence classification system that is less vulnerable to the instability of single features.
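Agglomerative hierarchical clustering starts with every patient as a separate cluster and repeatedly merges the two closest clusters. The following is a naive sketch, assuming numeric feature vectors and Euclidean distance; the study's distance measure, linkage criterion, and preprocessing of its 97 variables are not stated in the abstract, so the `linkage` options shown (single, complete, average) and the function name `agglomerative` are illustrative. Stopping at a fixed `n_clusters` is equivalent to cutting the dendrogram at that number of groups.

```python
import math

# Illustrative O(n^3) sketch; distance and linkage choices are assumptions.


def agglomerative(points, n_clusters, linkage="average"):
    """Naive bottom-up hierarchical clustering; returns one cluster label per point."""
    if linkage not in ("single", "complete", "average"):
        raise ValueError(f"unknown linkage: {linkage}")
    clusters = [[i] for i in range(len(points))]  # every point starts as its own cluster

    def cluster_distance(c1, c2):
        pair = [math.dist(points[a], points[b]) for a in c1 for b in c2]
        if linkage == "single":
            return min(pair)  # nearest neighbour
        if linkage == "complete":
            return max(pair)  # farthest neighbour
        return sum(pair) / len(pair)  # average linkage (mean between-group distance)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters; j > i, so deleting j leaves index i valid.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels
```

In practice, mixed clinical variables would first need scaling or a mixed-type dissimilarity, since Euclidean distance on raw values lets large-range variables dominate.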
- MeSH
- Idiopathic Hypersomnia * diagnosis MeSH
- Cataplexy * diagnosis MeSH
- Humans MeSH
- Adolescent MeSH
- Narcolepsy * diagnosis drug therapy MeSH
- Disorders of Excessive Somnolence * diagnosis epidemiology MeSH
- Cluster Analysis MeSH
- Check Tag
- Humans MeSH
- Adolescent MeSH
- Publication type
- Journal Article MeSH
- Observational Study MeSH
BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of the repetitive sequences that make up the majority of higher plant nuclear DNA. Because genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, providing the opportunity for efficient repeat investigation in a wide range of species. However, appropriate data mining tools must be developed to fully utilize these sequencing data for repeat characterization.

RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole-genome 454 sequence reads in order to build clusters made up of reads derived from individual repeat families. Cluster sizes were used to assess the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, which differ in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed software tool, SeqGrapheR, proved helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families.

CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis, and the graph representation of repeats can further be used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid the subsequent assembly of their consensus sequences.
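The core idea (reads as graph vertices, edges between similar reads, clusters as groups of connected reads whose sizes estimate repeat abundance) can be sketched as below. This is a simplified illustration, not the published pipeline or SeqGrapheR: similarity is approximated by the number of shared k-mers (`k` and `min_shared` are assumed parameters), and clusters are taken as connected components found with a union-find structure. Estimating a repeat's genome proportion as cluster size divided by the total number of reads assumes the low-pass reads sample the genome uniformly.

```python
from collections import defaultdict

# Illustrative sketch; k-mer similarity and connected components are assumptions.


def kmers(seq, k):
    """Set of distinct substrings of length k in a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=5, min_shared=3):
    """Partition reads into connected components of a read-similarity graph.

    Two reads share an edge when they have at least `min_shared` distinct
    k-mers in common (a stand-in for sequence-similarity hits).
    """
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Index reads by k-mer so only reads with a common k-mer are compared.
    index = defaultdict(set)
    for i, read in enumerate(reads):
        for km in kmers(read, k):
            index[km].add(i)
    shared = defaultdict(int)  # (read i, read j) -> number of shared k-mers
    for members in index.values():
        ordered = sorted(members)
        for a in range(len(ordered)):
            for b in range(a + 1, len(ordered)):
                shared[(ordered[a], ordered[b])] += 1
    for (i, j), count in shared.items():
        if count >= min_shared:
            parent[find(i)] = find(j)  # merge the two components

    clusters = defaultdict(list)
    for i in range(len(reads)):
        clusters[find(i)].append(i)
    return sorted(clusters.values(), key=len, reverse=True)  # largest cluster first


def genome_proportions(clusters, n_reads):
    """Estimated genome share of each cluster, assuming uniform low-pass coverage."""
    return [len(c) / n_reads for c in clusters]
```

For instance, three reads that are shifted copies of the same sequence plus one unrelated read form two clusters, holding 75% and 25% of the reads.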
The authors address the classification of groups of athletics disciplines that influence sports performance in the women's heptathlon. To identify the groups, the best world heptathlon performances above 6200 points, according to data available from the IAAF (N = 172), were used. The hierarchical clustering models used were average linkage (between-group and within-group), single linkage (nearest neighbor), complete linkage (farthest neighbor), centroid linkage, median clustering, and Ward's method.

All seven clustering methods agreed on two groups of clusters and on the content of the disciplines in the 2 clusters: [200 m, long jump, 800 m, 100 m hurdles, high jump] and [shot put, javelin throw]. The stability test of the heptathlon cluster structure at the level of the second cluster is 100%. The internal hierarchy of disciplines [200 m, long jump, 100 m hurdles, high jump, 800 m] [shot put, javelin throw] shows the highest stability, 42.86%.

Hierarchical models made it possible to identify groups of athletics disciplines that influence sports performance in the women's heptathlon. Understanding the structure of sports performance contributes to streamlining the training process and to determining the combined-events typology of world-class athletes.
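The abstract does not define its stability test. Assuming, purely for illustration, that stability means the share of the seven methods yielding the same cluster structure, 100% would mean all seven agree, and 42.86% equals 3 of 7 to two decimals. The sketch below computes that share for partitions given as groups of disciplines; comparing full dendrogram orderings (the internal hierarchy) would need a richer representation than plain partitions. The names `partition_stability`, `METHODS`, and `TWO_CLUSTERS` are hypothetical.

```python
from collections import Counter

# Illustrative sketch; the definition of "stability" used here is an assumption.


def canonical(partition):
    """Label-free form of a partition: a frozenset of frozensets of items."""
    return frozenset(frozenset(group) for group in partition)


def partition_stability(results):
    """Return the most common partition and the share of methods that produced it.

    results maps a clustering method's name to its partition (an iterable of groups).
    """
    counts = Counter(canonical(p) for p in results.values())
    modal, votes = counts.most_common(1)[0]
    return modal, votes / len(results)


METHODS = [
    "average (between-group)", "average (within-group)", "single", "complete",
    "centroid", "median", "ward",
]
# The two-cluster solution reported in the abstract.
TWO_CLUSTERS = [
    ["200 m", "long jump", "800 m", "100 m hurdles", "high jump"],
    ["shot put", "javelin throw"],
]
```

When every method returns `TWO_CLUSTERS`, `partition_stability({m: TWO_CLUSTERS for m in METHODS})` reports a share of 1.0, i.e. 100%.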
- MeSH
- Classification MeSH
- Track and Field classification MeSH
- Humans MeSH
- Sports classification MeSH
- Check Tag
- Humans MeSH
- Female MeSH
- Publication type
- Comparative Study MeSH
Myeloid-derived suppressor cells (MDSCs) are important regulators of immune processes during sepsis in mice. However, confirming these observations in humans has been challenging due to the lack of defined preparation protocols and phenotyping schemes for MDSC subsets. Thus, it remains unclear how MDSCs are involved in acute sepsis and whether they have a role in the long-term complications seen in survivors. Here, we combined comprehensive flow cytometry phenotyping with unsupervised clustering using self-organizing maps to identify the three recently defined human MDSC subsets in blood from severe sepsis patients, long-term sepsis survivors, and age-matched controls. We demonstrated the expansion of monocytic M-MDSCs and polymorphonuclear PMN-MDSCs, but not early-stage (e)-MDSCs during acute sepsis. High levels of PMN-MDSCs were also present in long-term survivors many months after discharge, suggesting a possible role in sepsis-related complications. Altogether, by employing unsupervised clustering of flow cytometric data we have confirmed the likely involvement of human MDSC subsets in acute sepsis, and revealed their expansion in sepsis survivors at late time points. The application of this strategy in future studies and in the clinical/diagnostic context would enable rapid progress toward a full understanding of the roles of MDSC in sepsis and other inflammatory conditions.
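A self-organizing map (SOM) assigns each event (here, one cell's marker intensities) to its best-matching node and pulls that node and its grid neighbours toward the event, so similar cells end up on the same or adjacent nodes. The sketch below is a minimal one-dimensional SOM in plain Python, assuming a linear initialisation between the data's per-marker minima and maxima, a Gaussian neighbourhood, and a linearly decaying learning rate and radius. The study's software, grid size, and parameters are not given in the abstract, SOMs are often laid out on two-dimensional grids instead, and `train_som` and `best_matching_unit` are hypothetical names.

```python
import math
import random

# Illustrative 1-D SOM; initialisation, schedules and grid shape are assumptions.


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(nodes, x):
    """Index of the node closest to event x."""
    return min(range(len(nodes)), key=lambda j: sq_dist(nodes[j], x))


def train_som(data, n_nodes=4, n_epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D self-organizing map and return its node weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]
    # Linear initialisation: nodes evenly spaced between per-marker minima and maxima.
    nodes = [
        [lo[d] + (hi[d] - lo[d]) * j / max(n_nodes - 1, 1) for d in range(dim)]
        for j in range(n_nodes)
    ]
    radius0 = n_nodes / 2 if radius0 is None else radius0
    total_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # present events in a random order each epoch
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1 - frac)  # linearly decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighbourhood
            bmu = best_matching_unit(nodes, x)
            for j, node in enumerate(nodes):
                # Gaussian neighbourhood on the 1-D grid around the winning node.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    node[d] += lr * h * (x[d] - node[d])
            step += 1
    return nodes
```

For example, with two well-separated groups of two-marker events, the events of each group map to different nodes of the trained map.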
- MeSH
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Monocytes immunology MeSH
- Myeloid-Derived Suppressor Cells immunology MeSH
- Flow Cytometry methods MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Sepsis immunology MeSH
- Cluster Analysis MeSH
- Inflammation immunology MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Markov random walks (MRW) have proven to be an effective way to understand spectral clustering and embedding. However, because it lacks a sufficiently global structural measure, conventional MRW (e.g., the Gaussian kernel MRW) cannot handle data points drawn from a mixture of subspaces. In this paper, we introduce a regularized MRW learning model that uses a low-rank penalty to constrain the global subspace structure, for subspace clustering and estimation. In our framework, both the local pairwise similarity and the global subspace structure can be learned from the transition probabilities of the MRW. We prove that, under suitable conditions, our proposed local/global criteria can exactly capture the multiple-subspace structure and learn a low-dimensional embedding of the data that gives the true segmentation of the subspaces. To improve robustness in real situations, we also propose an extension of the MRW learning model that integrates transition matrix learning and error correction in a unified framework. Experimental results on both synthetic data and real applications demonstrate that our proposed MRW learning model and its robust extension outperform state-of-the-art subspace clustering methods.
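The conventional Gaussian-kernel MRW that the abstract contrasts with its regularized model can be written in a few lines: a Gaussian kernel gives pairwise similarities W, and normalising each row gives the transition matrix P = D^{-1}W, where D is the diagonal matrix of row sums. This sketch covers only that baseline; the low-rank penalty and the error-correction extension proposed in the paper are not implemented. The bandwidth `sigma` and the function names `transition_matrix` and `walk` are assumptions.

```python
import math

# Illustrative baseline MRW only; sigma and function names are assumptions.


def transition_matrix(points, sigma=1.0):
    """Row-stochastic MRW matrix P = D^{-1} W built from a Gaussian kernel W."""
    n = len(points)
    p = []
    for i in range(n):
        # Gaussian-kernel similarities to all other points (no self-loops).
        row = [
            0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(row)  # degree d_i
        # Normalise by the degree; an isolated point (total == 0) keeps an all-zero row.
        p.append([w / total if total > 0 else 0.0 for w in row])
    return p


def walk(p, start, steps):
    """Probability distribution of a walk started at `start` after `steps` transitions."""
    n = len(p)
    dist = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(steps):
        dist = [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]
    return dist
```

For two tight groups far apart, such as `[(0, 0), (0, 1), (10, 0), (10, 1)]` with `sigma=1.0`, a walk started at the first point keeps almost all of its probability mass within the first group after several steps; this local behaviour is what makes such walks useful for clustering, and also why a purely local kernel struggles when points from different subspaces lie close together.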
- MeSH
- Algorithms MeSH
- Emotions physiology MeSH
- Humans MeSH
- Limbic System physiology MeSH
- Models, Neurological MeSH
- Neural Networks, Computer * MeSH
- Pattern Recognition, Automated methods MeSH
- Cluster Analysis MeSH
- Models, Theoretical MeSH
- Learning MeSH
- Artificial Intelligence MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Review MeSH
IARC scientific publications ; no. 135
17, 247 p.
- Keywords
- epidemiology, cluster analysis, clusters, diseases, environment
- Conspectus
- Public health and hygiene
- NML Fields
- epidemiology
- environmental sciences