The increasing prevalence of autism spectrum disorders (ASD) has led to worldwide interest in factors influencing the age of ASD diagnosis. Parents or caregivers of 237 children with ASD (193 boys, 44 girls) diagnosed using the Autism Diagnostic Observation Schedule (ADOS) completed a simple descriptive questionnaire. The data were analyzed using variable-centered multiple regression analysis and the person-centered classification tree method. We believed that the concurrent use of these two methods could produce robust results. The mean age at diagnosis was 5.8 ± 2.2 years (median 5.3 years). In the multiple regression analysis, a younger age at ASD diagnosis was predicted by higher scores in the ADOS social domain, higher scores in the ADOS restrictive and repetitive behaviors and interest domain, higher maternal education, and parents sharing a household. In the classification tree analysis, the subgroup with the lowest mean age at diagnosis comprised children whose summed ADOS communication and social domain scores were ≥ 17 and whose fathers were ≥ 29 years old at delivery. In contrast, the subgroup with the highest mean age at diagnosis comprised children with summed ADOS communication and social domain scores < 17 and maternal education at the elementary school level. The severity of autism and maternal education played a significant role in both types of analysis of age at diagnosis.
- Keywords
- ADOS, Age at diagnosis, Autism spectrum disorders, Maternal education, Paternal age, Shared household,
- MeSH
- Autistic Disorder * MeSH
- Child MeSH
- Adult MeSH
- Communication MeSH
- Humans MeSH
- Child Development Disorders, Pervasive * MeSH
- Autism Spectrum Disorder * diagnosis epidemiology MeSH
- Child, Preschool MeSH
- Regression Analysis MeSH
- Check Tag
- Child MeSH
- Adult MeSH
- Humans MeSH
- Male MeSH
- Child, Preschool MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
PURPOSE: The purposes of this study are to identify the clinical parameters most strongly related to in-hospital mortality that are available in the earliest phase of hospitalization, and to create an easy tool for the early identification of patients at risk. MATERIALS AND METHODS: Classification and regression tree analysis was applied to data from the Acute Heart Failure Database-Main registry, comprising patients admitted to specialized cardiology centers with all syndromes of acute heart failure. The classification model was built on a derivation cohort (n = 2543) and evaluated on a validation cohort (n = 1387). RESULTS: The classification tree stratifies patients according to the presence of cardiogenic shock (CS), the level of creatinine, and the systolic blood pressure (SBP) at admission into 5 risk groups with in-hospital mortality ranging from 2.8% to 66.2%. Patients without CS and with a creatinine level of 155 μmol/L or less were classified into the very-low-risk group; patients without CS, with a creatinine level greater than 155 μmol/L and SBP greater than 103 mm Hg, into the low-risk group; and patients without CS, with a creatinine level greater than 155 μmol/L and SBP of 103 mm Hg or lower, into the intermediate-risk group. The high-risk group comprised patients with CS and a creatinine level of 140 μmol/L or less; patients with CS and a creatinine level greater than 140 μmol/L belonged to the very-high-risk group. The area under the receiver operating characteristic curve was 0.823 and 0.832, and the Brier score was 0.091 and 0.084, for the derivation and the validation cohort, respectively. CONCLUSIONS: The presented classification model effectively stratified patients with all syndromes of acute heart failure into in-hospital mortality risk groups and might be of advantage for clinical practice.
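The five risk groups in the abstract above form a small decision tree, which can be written out directly as nested conditions. The following is a minimal sketch for illustration only: the function name `ahf_risk_group`, its parameter names, and the group labels as plain strings are assumptions, while the thresholds (155 and 140 μmol/L creatinine, 103 mm Hg SBP) are taken from the abstract. It is not the authors' software and is not a clinical tool.

```python
def ahf_risk_group(cardiogenic_shock: bool,
                   creatinine_umol_l: float,
                   sbp_mm_hg: float) -> str:
    """Assign an in-hospital mortality risk group from admission data.

    Hypothetical helper: thresholds follow the abstract, everything
    else (names, return labels) is illustrative.
    """
    if cardiogenic_shock:
        # With cardiogenic shock the tree splits on creatinine at 140 umol/L;
        # SBP is not used on this branch.
        return "high" if creatinine_umol_l <= 140 else "very high"
    if creatinine_umol_l <= 155:
        # No shock and creatinine of 155 umol/L or less.
        return "very low"
    # No shock, creatinine above 155 umol/L: split on SBP at 103 mm Hg.
    return "low" if sbp_mm_hg > 103 else "intermediate"
```

For example, `ahf_risk_group(False, 180, 95)` follows the no-shock branch, exceeds the creatinine threshold, and falls at or below the SBP threshold, so it lands in the intermediate-risk group.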
- MeSH
- Risk Assessment methods MeSH
- Middle Aged MeSH
- Humans MeSH
- Hospital Mortality * MeSH
- Registries MeSH
- Risk Factors MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Heart Failure classification mortality MeSH
- Models, Statistical * MeSH
- Check Tag
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
The increasing trend of adolescents' emotional symptoms has become a global public health problem. Adolescents with chronic diseases or disabilities are at particular risk of emotional problems. Ample evidence shows that family environment is associated with adolescents' emotional health. However, the categories of family-related factors that most strongly influence adolescents' emotional health remain unclear. Additionally, it is not known whether family environment influences emotional health differently in normally developed adolescents and in those with chronic condition(s). The Health Behaviours in School-aged Children (HBSC) database provides large-scale data on adolescents' self-reported health and social environmental backgrounds, which offers opportunities to apply data-driven approaches to determine critical family environmental factors that influence adolescents' health. Thus, based on the national HBSC data collected in the Czech Republic from 2017 to 2018, the current study adopted a data-driven method, classification-regression-decision-tree analysis, to investigate the impact of family environmental factors, including demographic and psycho-social factors, on adolescents' emotional health. The results suggested that family psycho-social functions played a significant role in maintaining adolescents' emotional health. Both normally developed adolescents and adolescents with chronic condition(s) benefited from communication with parents, family support, and parental monitoring. In addition, for adolescents with chronic condition(s), school-related parental support was also meaningful for decreasing emotional problems. In conclusion, the findings suggest the necessity of interventions that strengthen family-school communication and cooperation to improve the mental health of adolescents with chronic diseases. Interventions aiming to improve parent-adolescent communication, parental monitoring, and family support are essential for all adolescents.
- Keywords
- adolescent, chronic condition, decision tree, emotional health, family environment,
- MeSH
- Chronic Disease MeSH
- Child MeSH
- Mental Health * MeSH
- Emotions MeSH
- Humans MeSH
- Adolescent MeSH
- Parents * psychology MeSH
- Decision Trees MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Publication type
- Journal Article MeSH
Quadrupole inductively coupled plasma mass spectrometry (Q-ICP-MS) and direct mercury analysis were used to determine the elemental composition of 180 transformed (salt-ripened) anchovies from three different fishing areas before and after packaging. Four decision tree-based algorithms, namely C5.0, classification and regression trees (CART), chi-square automatic interaction detection (CHAID), and quick unbiased efficient statistical tree (QUEST), were applied to the elemental datasets to find the most accurate data mining procedure for predicting fish origin. Classification rules generated by the trained CHAID model optimally identified unlabelled testing bulk anchovies (93.9% F-score) using just 6 out of 52 elements (As, K, P, Cd, Li, and Sr). The finished packaged product was better modelled by the QUEST algorithm, which recognised the origin of anchovies with an F-score of 97.7% using the information carried by 5 elements (B, As, K, Cd, and Pd). The results suggest that the traceability system in the fishery sector may be supported by simplified machine learning techniques applied to a limited but effective number of inorganic predictors of origin.
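The F-score quoted above combines precision and recall into one number; for a three-class problem such as fishing area it is usually computed per class and then averaged. The sketch below shows one common variant, the macro-averaged F1, in plain Python. The function names are hypothetical, and the abstract does not state which averaging the authors used, so macro-averaging is an assumption here.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1} where F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never present and never predicted gets 0.0 here.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

With made-up labels for three areas, `macro_f1(["A", "A", "B", "B", "C", "C"], ["A", "A", "B", "C", "C", "C"])` averages per-class scores of 1, 2/3 and 4/5.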
- Keywords
- Data mining, Decision trees, Engraulis encrasicolus, Fish products, Geographical origin, ICP-MS,
- MeSH
- Algorithms MeSH
- Decision Trees MeSH
- Mercury analysis MeSH
- Fish Products analysis MeSH
- Fishes MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Mercury MeSH
In this paper, we present the results of research on extracting informative gene expression profiles from a high-dimensional array of gene expressions, taking into account the state of the patients' health, using a clustering method, ML-based binary classifiers and a fuzzy inference system. The proposed stepwise procedure allows us to extract the most informative genes, taking into account both the subtypes of the disease and the state of the patient's health, for further reconstruction of gene regulatory networks based on the allocated genes and subsequent simulation of the reconstructed models. As experimental data, we used publicly available gene expression data obtained from DNA microarray experiments, containing two types of gene expression profiles: those of patients with lung cancer tumors and those of healthy patients. The stepwise data processing procedure comprises the following steps. First, we reduce the number of genes by removing genes that are non-informative in terms of statistical criteria and Shannon entropy; then, we perform stepwise hierarchical clustering of the gene expression profiles at hierarchical levels from 1 to 10 using the SOTA (Self-Organizing Tree Algorithm) clustering algorithm with the correlation distance metric. The quality of the obtained clustering was evaluated using a complex clustering quality criterion that considers both the distribution of the gene expression profiles relative to the centers of the clusters to which they are allocated and the distribution of the cluster centers. The result of this stage was the selection, at each hierarchical level, of the optimal cluster, corresponding to the minimum value of the quality criterion. In the next step, we implemented a classification procedure for the examined objects using four well-known binary classifiers: logistic regression, support-vector machine, decision trees and random forest classifier.
The effectiveness of each technique was evaluated by ROC (Receiver Operating Characteristic) analysis using criteria that include, as components, errors of both the first and the second kind. The final decision on the extraction of the most informative subset of gene expression profiles was made using the fuzzy inference system, whose inputs are the results of the individual classifiers and whose output is the final decision on the state of the patient's health. In our view, implementing the proposed stepwise procedure for extracting informative gene expression profiles creates the conditions for increasing the effectiveness of the subsequent reconstruction of gene regulatory networks and the following simulation of the reconstructed models, considering the subtypes of the disease and/or the state of the patient's health.
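Two quantities named in the abstract above, Shannon entropy and the correlation distance, have standard definitions that are easy to state in code. The sketch below is an illustration under assumptions: the abstract does not say how entropy was estimated, so the equal-width histogram with a `bins` parameter is a hypothetical choice, and both function names are invented for this example. The correlation distance is taken as one minus the Pearson correlation coefficient.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=8):
    """Entropy (in bits) of an expression profile, estimated from an
    equal-width histogram. The binning scheme is an assumption."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant profile carries no information
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def correlation_distance(x, y):
    """1 - Pearson r; 0 for perfectly correlated profiles, 2 for
    perfectly anti-correlated ones. Undefined for constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1 - cov / (norm_x * norm_y)
```

Because the distance depends only on the shape of a profile and not on its scale, two genes whose expression rises and falls together across patients are close even when their absolute expression levels differ.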
- Keywords
- Brassicaceae, Classification & Regression Tree, Heliophila, Random Forests, climatic seasonality, drought regime, life history, phylogenetic tree,
- MeSH
- Brassicaceae * MeSH
- Climate Change MeSH
- Forests MeSH
- Droughts * MeSH
- Trees MeSH
- Publication type
- Letter MeSH
INTRODUCTION: The concept of phenotyping has emerged, reflecting the specific clinical, pulmonary and extrapulmonary features of each particular case of chronic obstructive pulmonary disease (COPD). Our aim was to analyze the prognostic utility of "Czech" COPD phenotypes and their most frequent combinations, "Spanish" phenotypes, and Global Initiative for Chronic Obstructive Lung Disease (GOLD) stages and groups in relation to long-term mortality risk. METHODS: Data were extracted from the Czech Multicenter Research Database (CMRD) of COPD. Kaplan-Meier (KM) estimates (at 60 months from inclusion) were used for mortality assessment. Survival rates were calculated for the six elementary "Czech" phenotypes and their most frequent and relevant combinations, "Spanish" phenotypes, and GOLD grades and groups. Differences were tested for statistical significance by the log-rank test. Factors underlying mortality risk (the role of confounders) were assessed using classification and regression tree (CART) analysis. Basic factors showing significant differences between deceased and living patients were entered into the CART model. This yielded six different risk groups; the differences in risk between them were tested by the log-rank test. RESULTS: The cohort (n=720) was 73.1% men, with a mean age of 66.6 years and mean FEV1 44.4% pred. KM estimates showed that the bronchiectasis/COPD overlap (HR 1.425, p=0.045), frequent exacerbator (HR 1.58, p<0.001), cachexia (HR 2.262, p<0.001) and emphysematous (HR 1.786, p=0.015) phenotypes were associated with higher mortality risk. Co-presence of multiple phenotypes in a single patient had an additive effect on risk; the combination of emphysema, cachexia and frequent exacerbations translated into the poorest prognosis (HR 3.075; p<0.001). Of the "Spanish" phenotypes, AE CB and AE non-CB were associated with greater risk of mortality (HR 1.787 and 2.001; both p=0.001).
FEV1% pred., cachexia and chronic heart failure in the patient history were the major underlying factors determining mortality risk in our cohort. CONCLUSION: Certain phenotypes ("Czech" or "Spanish") of COPD are associated with a higher risk of death. Co-presence of multiple phenotypes (emphysematous plus cachectic plus frequent exacerbator) in a single individual was associated with amplified risk of mortality.
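The Kaplan-Meier estimates used in the abstract above multiply, at each observed death time, the fraction of at-risk patients who survive that time; censored patients leave the risk set without lowering the curve. The following plain-Python sketch shows that product-limit calculation. The function name `kaplan_meier` and the input layout (follow-up times in months plus a 1/0 death indicator) are assumptions for illustration, not the study's actual code.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient (e.g. months)
    events -- 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)   # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Fraction of patients still at risk who survive time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve
```

Reading the curve at the last death time at or before 60 months gives the 60-month survival estimate; comparing curves between phenotype groups is what the log-rank test formalizes.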
- Keywords
- chronic obstructive pulmonary disease; COPD, classification and regression tree; CART, cluster, mortality, phenotypes,
- MeSH
- Bronchitis, Chronic * MeSH
- Pulmonary Disease, Chronic Obstructive * diagnosis MeSH
- Phenotype MeSH
- Humans MeSH
- Disease Progression MeSH
- Prospective Studies MeSH
- Aged MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Aged MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Multicenter Study MeSH
- Geographicals
- Spain MeSH
Climate change is expected to intensify bark beetle population outbreaks in forests globally, affecting biodiversity and trajectories of change. Aspects of individual tree resistance remain poorly quantified, particularly with regard to the role of phenolic compounds, hindering robust predictions of forest response to future conditions. In 2003, we conducted a mechanical wounding experiment in a Norway spruce forest that coincided with an outbreak of the bark beetle, Ips typographus. We collected phloem samples from 97 trees and monitored tree survival for 5 months. Using high-performance liquid chromatography, we quantified induced changes in the concentrations of phenolics. Classification and regression tools were used to evaluate relationships between phenolic production and bark beetle resistance, in the context of other survival factors. The proximity of beetle source populations was a principal determinant of survival. Proxy measures of tree vigor, such as crown defoliation, mediated tree resistance. Controlling for these factors, synthesis of catechin was found to exponentially increase tree survival probability. However, even resistant trees were susceptible late in the season due to high insect population growth. Our results show that incorporating trait-mediated effects improves predictions of survival. Using an integrated analytical approach, we demonstrate that phenolics play a direct role in tree defense against herbivory.
- Keywords
- Bark beetle outbreak, Catechin, Crown defoliation, Primary attraction, Resistance, Tree survival,
- MeSH
- Coleoptera * physiology MeSH
- Herbivory MeSH
- Phenols MeSH
- Phloem MeSH
- Picea * MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Phenols MeSH
BACKGROUND: Overcoming boundaries is crucial for incursion of alien plant species and their successful naturalization and invasion within protected areas. Previous work showed that in Kruger National Park (KNP), South Africa, this process can be quantified and that factors determining the incursion of invasive species can be identified and predicted confidently. Here we explore the similarity between determinants of incursions identified by the general model based on a multispecies assemblage, and those identified by species-specific models. We analyzed the presence and absence of six invasive plant species in 1.0×1.5 km segments along the border of the park as a function of environmental characteristics from outside and inside the KNP boundary, using two data-mining techniques: classification trees and random forests. PRINCIPAL FINDINGS: The occurrence of Ageratum houstonianum, Chromolaena odorata, Xanthium strumarium, Argemone ochroleuca, Opuntia stricta and Lantana camara can be reliably predicted based on landscape characteristics identified by the general multispecies model, namely water runoff from surrounding watersheds and road density in a 10 km radius. The presence of main rivers and species-specific combinations of vegetation types are reliable predictors from inside the park. CONCLUSIONS: The predictors from outside and inside the park are complementary, and are approximately equally reliable for explaining the presence/absence of current invaders; those from the inside are, however, more reliable for predicting future invasions. Landscape characteristics identified as crucial predictors from outside the KNP can guide management in enacting proactive interventions that manipulate landscape features near the KNP to prevent further incursions. Predictors from inside the KNP can be used reliably to identify high-risk areas, improving the cost-effectiveness of management in locating invasive plants and targeting them for eradication.
INTRODUCTION: Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC III database of intensive cardiac care unit patients with acute coronary syndrome. The ability to identify high-risk patients could allow pre-emptive measures to be taken and thus prevent the development of CS. METHODS: We mainly focus on techniques for the imputation of missing data by generating a pipeline for imputation and comparing the performance of various multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations. After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of a gradient-boosted framework that uses a tree-based classifier for cardiogenic shock prediction. RESULTS: We achieved good classification performance thanks to data cleaning and imputation (cross-validated mean area under the curve 0.805) without hyperparameter optimization. CONCLUSION: We believe our pre-processing pipeline would also prove helpful for other classification and regression experiments.
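Of the imputation methods listed above, k-nearest neighbours is the simplest to illustrate: a missing value is replaced by the average of that variable over the k most similar patients. The sketch below is a deliberately simplified stand-in written in plain Python, not the pipeline from the paper. The function name `knn_impute`, the use of `None` for missing values, and the distance (root mean squared difference over the variables both patients have observed) are all assumptions, and the sketch does not rescale variables, which a real pipeline would normally do first.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(rows: List[Row], k: int = 2) -> List[List[Optional[float]]]:
    """Fill each None with the mean of that column over the k nearest rows."""
    imputed = [list(r) for r in rows]  # leave the input untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must have the target column observed
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue  # no common variables, distance undefined
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # stays None when no donor row exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```

Distances are computed from the original rows rather than from already imputed values, so the result does not depend on the order in which missing cells are visited.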
- Keywords
- cardiogenic shock, classification, machine learning, missing data imputation, prediction model, processing pipeline,
- Publication type
- Journal Article MeSH