PURPOSE: Chronic obstructive pulmonary disease (COPD) is a prevalent and preventable condition that typically worsens over time. Acute exacerbations of COPD significantly impact disease progression, underscoring the importance of prevention efforts. This observational study aimed to achieve two main objectives: (1) identify patients at risk of exacerbations using an ensemble of clustering algorithms, and (2) classify patients into distinct clusters based on disease severity. METHODS: Data from portable medical devices were analyzed post hoc using hyperparameter optimization with Self-Organizing Maps (SOM), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest, and Support Vector Machine (SVM) algorithms to detect flare-ups. Principal Component Analysis (PCA) followed by KMeans clustering was applied to categorize patients by severity. RESULTS: Twenty-five patients were included in the study population, and data from 17 of them had the required reliability. Five patients were identified in the highest deterioration group, with one clinically confirmed exacerbation accurately detected by our ensemble algorithm. PCA followed by KMeans clustering then grouped patients into three clusters by severity: Cluster 0 started with the least severe characteristics but declined over time, Cluster 1 consistently showed the most severe characteristics, and Cluster 2 showed slight improvement. CONCLUSION: Our approach effectively identified patients at risk of exacerbations and classified them by disease severity. Although promising, the approach needs to be verified on a larger sample with a larger number of clinically verified exacerbations.
- Publication type
- Journal Article MeSH
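As a rough illustration of the flare-up detection and severity-clustering pipeline described in the abstract above, the following Python sketch takes a majority vote across three scikit-learn outlier detectors (DBSCAN noise points, Isolation Forest, one-class SVM) and then applies PCA followed by KMeans with k = 3. The SOM stage is omitted here because it needs a third-party library such as MiniSom, and all data, thresholds, and hyperparameters are illustrative assumptions, not the study's settings.

```python
# Hedged sketch: ensemble outlier voting (flare-ups) + PCA/KMeans severity clusters.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN, KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))  # placeholder: daily device features per patient-day
Xs = StandardScaler().fit_transform(X)

# Each detector votes: 1 = anomalous day (possible flare-up), 0 = normal.
votes = np.zeros(len(Xs), dtype=int)
votes += (DBSCAN(eps=1.5, min_samples=5).fit_predict(Xs) == -1)  # DBSCAN noise points
votes += (IsolationForest(random_state=0).fit_predict(Xs) == -1)
votes += (OneClassSVM(nu=0.05).fit_predict(Xs) == -1)
flagged = votes >= 2  # simple majority vote across the detectors

# Severity grouping: project to 2 principal components, then KMeans with k = 3.
Z = PCA(n_components=2).fit_transform(Xs)
cluster = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
print(flagged.sum(), "patient-days flagged;", np.bincount(cluster))
```

In practice, the vote threshold and each detector's contamination settings would have to be tuned, which is presumably what the hyperparameter optimization mentioned in the abstract addresses.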
Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed given disputed and reference speech material, adopting the recommended and scientifically accepted likelihood ratio framework for reporting evidential strength in court. In forensic practice, auditory and acoustic analyses are usually performed to carry out such a verification task, considering a diversity of features such as language competence, pronunciation, and other linguistic features. Automated speaker comparison systems can also be used alongside those manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input; additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer whether, when, and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three different approaches to deriving a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant FRIDA dataset and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log-likelihood-ratio cost (Cllr) and equal error rate (EER). We show that fusion can be beneficial, especially in the case of intercepted phone calls with background noise.
- MeSH
- Speech Acoustics MeSH
- Algorithms MeSH
- Humans MeSH
- Linguistics MeSH
- Likelihood Functions MeSH
- Speech MeSH
- Forensic Sciences * methods MeSH
- Support Vector Machine MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
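Of the three fusion approaches named in the abstract above, the bivariate-normal one is the simplest to sketch: fit one two-dimensional Gaussian to same-speaker score pairs (acoustic score, frequent-word score) and one to different-speaker pairs, then report the log ratio of their densities for a new trial. The scores below are synthetic placeholders, not FRIDA or FISHER data.

```python
# Hedged sketch of likelihood-ratio fusion via bivariate normal densities.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
# Columns: (acoustic score, frequent-word score) for each comparison trial.
same = rng.normal([2.0, 1.5], 1.0, size=(200, 2))    # same-speaker trials
diff = rng.normal([-1.0, -0.5], 1.0, size=(200, 2))  # different-speaker trials

def fit(scores):
    """Fit a frozen bivariate normal to a set of score pairs."""
    return multivariate_normal(mean=scores.mean(axis=0), cov=np.cov(scores.T))

f_same, f_diff = fit(same), fit(diff)

def fused_llr(acoustic, word):
    """Log-likelihood ratio for a new trial from the two fitted densities."""
    x = np.array([acoustic, word])
    return f_same.logpdf(x) - f_diff.logpdf(x)

print(fused_llr(1.8, 1.2))  # positive -> supports the same-speaker hypothesis
```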
Imbalanced datasets are prominent in real-world problems. In such problems, the number of samples in one class is significantly higher than in the other classes, even though the minority classes might be more important. Standard classification algorithms may assign all the data to the majority class, a significant drawback of most standard learning algorithms, so imbalanced datasets need to be handled carefully. One of the traditional algorithms, the twin support vector machine (TSVM), performs well on balanced data classification but poorly on imbalanced datasets. To improve the TSVM algorithm's classification ability for imbalanced datasets, a reduced universum twin support vector machine for class imbalance learning (RUTSVM), driven by the universum twin support vector machine (UTSVM), was recently proposed. However, both solving the dual problems and constructing the classifiers involve matrix inverse computation, which is one of RUTSVM's key drawbacks. In this paper, we improve RUTSVM and propose an improved reduced universum twin support vector machine for class imbalance learning (IRUTSVM). In the suggested IRUTSVM approach, we offer alternative Lagrangian functions to tackle the primal problems of RUTSVM by moving one of the terms in the objective function into the constraints. As a result, we obtain a new dual formulation for each optimization problem, so that no inverse matrices need to be computed either during training or when constructing the classifiers. Moreover, smaller rectangular kernel matrices are used to reduce the computational time. Extensive testing is carried out on a variety of synthetic and real-world imbalanced datasets, and the findings show that the IRUTSVM algorithm outperforms the TSVM, UTSVM, and RUTSVM algorithms in terms of generalization performance.
- MeSH
- Algorithms * MeSH
- Support Vector Machine * MeSH
- Publication type
- Journal Article MeSH
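For context on the matrix inverse that IRUTSVM is designed to eliminate, the block below reconstructs the standard TSVM dual for the first hyperplane (after Jayadeva et al.'s original formulation); the notation is assumed from the general TSVM literature, not taken from this paper.

```latex
% H = [A  e], G = [B  e], with A and B holding the samples of the two classes.
\max_{\alpha}\; e^{\top}\alpha \;-\; \tfrac{1}{2}\,\alpha^{\top} G \,(H^{\top} H)^{-1} G^{\top} \alpha
\quad \text{s.t.} \quad 0 \le \alpha \le c_{1} e,
\qquad
\begin{bmatrix} w_{1} \\ b_{1} \end{bmatrix} = -\,(H^{\top} H)^{-1} G^{\top} \alpha .
```

Both the dual objective and the recovery of (w1, b1) require inverting H^T H; IRUTSVM's alternative Lagrangian moves a term into the constraints precisely so that this inverse disappears from both training and classifier construction.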
BACKGROUND: Interpretable machine learning (ML) for early detection of cancer has the potential to improve risk assessment and early intervention. METHODS: Data from 261 proteins related to inflammation and/or tumor processes were analyzed in 123 blood samples collected from persons who were healthy at the time of sampling, a subgroup of whom later developed squamous cell carcinoma of the oral tongue (SCCOT). Samples from people who developed SCCOT within less than 5 years were classified as tumor-to-be and all other samples as tumor-free. The optimal ML algorithm for feature selection was identified and feature importance computed by the SHapley Additive exPlanations (SHAP) method. Five popular ML algorithms (AdaBoost, Artificial neural networks [ANNs], Decision Tree [DT], eXtreme Gradient Boosting [XGBoost], and Support Vector Machine [SVM]) were applied to establish prediction models, and decisions of the optimal models were interpreted by SHAP. RESULTS: Using the 22 selected features, the SVM prediction model showed the best performance (sensitivity = 0.867, specificity = 0.859, balanced accuracy = 0.863, area under the receiver operating characteristic curve [ROC-AUC] = 0.924). SHAP analysis revealed that the 22 features rendered varying person-specific impacts on model decision, and the top three contributors to prediction were Interleukin 10 (IL10), TNF Receptor Associated Factor 2 (TRAF2), and Kallikrein Related Peptidase 12 (KLK12). CONCLUSION: Using multidimensional plasma protein analysis and interpretable ML, we outline a systematic approach for early detection of SCCOT before the appearance of clinical signs.
- MeSH
- Tongue MeSH
- Blood Proteins MeSH
- Humans MeSH
- Tongue Neoplasms * diagnosis MeSH
- Carcinoma, Squamous Cell * diagnosis MeSH
- Machine Learning MeSH
- Ubiquitin-Protein Ligases MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
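A minimal sketch of the abstract's SVM-plus-SHAP pipeline follows, using synthetic stand-ins for the 22 selected protein features. KernelExplainer is one model-agnostic way to obtain per-person SHAP values for an RBF SVM; the authors' exact tooling is not specified, so this is an assumption.

```python
# Hedged sketch: RBF SVM classifier + SHAP attributions on synthetic protein data.
import numpy as np
import shap  # third-party: pip install shap
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(123, 22))  # synthetic stand-in for 22 plasma protein features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=123) > 0).astype(int)  # tumor-to-be flag

Xs = StandardScaler().fit_transform(X)
model = SVC(kernel="rbf", probability=True).fit(Xs, y)

# KernelExplainer is model-agnostic; a small background sample keeps it tractable.
background = shap.sample(Xs, 20, random_state=0)
explainer = shap.KernelExplainer(lambda d: model.predict_proba(d)[:, 1], background)
shap_values = explainer.shap_values(Xs[:5])  # per-person, per-feature contributions
print(np.abs(shap_values).mean(axis=0))      # mean |SHAP| as a feature-importance proxy
```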
This paper focuses on non-invasive blood glucose determination using photoplethysmographic (PPG) signals, which is crucial for managing diabetes. Diabetes is one of the world's major chronic diseases, and left untreated it frequently leads to fatal complications. Current self-monitoring techniques require invasive procedures such as blood or bodily fluid sampling, which may be very uncomfortable; hence, there is an opportunity for non-invasive blood glucose monitoring through smart devices capable of measuring PPG signals. The primary goal of this research was to propose methods for classifying glycemia into two groups (low and high) and for predicting specific glycemia values using machine learning techniques. Two datasets were created by measuring PPG signals from 16 individuals using two different smart devices – a smart wristband and a smartphone. Simultaneously, the reference blood glucose levels were measured invasively using a glucometer. The PPG signals were preprocessed and 27 different features were extracted; with the use of feature selection, only 10 relevant features were retained. Numerous machine learning models were developed. Random Forest (RF) and a Support Vector Machine (SVM) with the radial basis function (RBF) kernel performed best in classifying PPG signals into the two groups, achieving accuracies of 76% (SVM) and 75% (RF) on the smart wristband test dataset. The functionality of the proposed models was then verified on the smartphone test dataset, where both models achieved similar accuracy: 74% (SVM) and 75% (RF). For predicting specific glycemia values, RF performed best, with a Mean Absolute Error (MAE) of 1.25 mmol/l on the smart wristband test dataset and 1.37 mmol/l on the smartphone test dataset.
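The two tasks above (binary glycemia classification and glycemia-value regression) can be sketched with scikit-learn as follows; the features, the 5.5 mmol/l class threshold, and the hyperparameters are assumptions, and the synthetic data will not reproduce the reported accuracies or MAE.

```python
# Hedged sketch: SVM-RBF / RF classification of low vs. high glycemia, plus RF regression.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))                        # stand-in for 10 selected PPG features
glucose = 5.5 + X[:, 0] + 0.5 * rng.normal(size=300)  # mmol/l (synthetic)
y = (glucose > 5.5).astype(int)                       # low vs. high glycemia (assumed cut)

X_tr, X_te, g_tr, g_te, y_tr, y_te = train_test_split(
    X, glucose, y, test_size=0.3, random_state=0)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)
rf_clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("SVM acc:", accuracy_score(y_te, svm.predict(X_te)))
print("RF acc: ", accuracy_score(y_te, rf_clf.predict(X_te)))

rf_reg = RandomForestRegressor(random_state=0).fit(X_tr, g_tr)
print("RF MAE (mmol/l):", mean_absolute_error(g_te, rf_reg.predict(X_te)))
```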
The search for non-invasive, fast, and low-cost diagnostic tools has gained significant traction among many researchers worldwide. Dielectric properties calculated from microwave signals offer unique insights into biological tissue. Material properties, such as relative permittivity (εr) and conductivity (σ), can vary significantly between healthy and unhealthy tissue types at a given frequency. Understanding this difference in properties is key for identifying the disease state. The frequency-dependent nature of the dielectric measurements results in large datasets, which can be postprocessed using artificial intelligence (AI) methods. In this work, the dielectric properties of liver tissues in three mouse models of liver disease are characterized using dielectric spectroscopy. The measurements are grouped into four categories based on the diets or disease state of the mice, i.e., healthy mice, mice with non-alcoholic steatohepatitis (NASH) induced by a choline-deficient high-fat diet, mice with NASH induced by a western diet, and mice with liver fibrosis. Multi-class classification machine learning (ML) models are then explored to differentiate the liver tissue groups based on dielectric measurements. The results show that the support vector machine (SVM) model was able to differentiate the tissue groups with an accuracy of up to 90%. This technology pipeline thus shows great potential for developing the next generation of non-invasive diagnostic tools.
- MeSH
- Liver Cirrhosis MeSH
- Liver pathology MeSH
- Mice, Inbred C57BL MeSH
- Mice MeSH
- Non-alcoholic Fatty Liver Disease * diagnosis pathology MeSH
- Machine Learning MeSH
- Artificial Intelligence MeSH
- Animals MeSH
- Check Tag
- Mice MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
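A minimal sketch of the multi-class SVM step described above, assuming the feature vector concatenates relative permittivity and conductivity sampled across frequency points; the group labels follow the abstract, but all values below are synthetic.

```python
# Hedged sketch: multi-class SVM over synthetic dielectric spectra.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(4)
n_freq = 50                             # frequency points per sweep (assumed)
groups = ["healthy", "NASH-CDHFD", "NASH-western", "fibrosis"]
X = rng.normal(size=(120, 2 * n_freq))  # [eps_r(f), sigma(f)] concatenated per sample
y = rng.integers(0, 4, size=120)        # tissue group labels (synthetic)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
scores = cross_val_score(clf, X, y, cv=5)  # SVC handles multi-class via one-vs-one
print(dict(zip(groups, range(4))), scores.mean())
```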
Artificial intelligence (AI) is an integral part of clinical decision support systems (CDSS), offering methods to approximate human reasoning and computationally infer decisions. Such methods are generally based on medical knowledge, either directly encoded with rules or automatically extracted from medical data using machine learning (ML). ML techniques, such as Artificial Neural Networks (ANNs) and support vector machines (SVMs), are based on mathematical models with parameters that can be optimally tuned using appropriate algorithms. The ever-increasing computational capacity of today's computer systems enables more complex ML systems with millions of parameters, bringing AI closer to human intelligence. With this objective, the term deep learning (DL) has been introduced to characterize ML based on deep ANN (DNN) architectures with multiple layers of artificial neurons. Despite all of these promises, the impact of AI on current clinical practice is still limited. However, this could change soon, as the rapidly growing number of papers on AI, machine learning, and deep learning in cardiology shows. We highlight the significant achievements of recent years in nearly all areas of cardiology and underscore the mounting evidence suggesting that AI will take center stage in the field.
- Publication type
- Journal Article MeSH
- Review MeSH
BACKGROUND: Atherosclerosis leads to coronary artery disease (CAD) and myocardial infarction (MI), a major cause of morbidity and mortality worldwide. Computer-aided prognosis of atherosclerotic events using electrocardiogram (ECG)-derived heart rate variability (HRV) can be a robust approach. METHODS: A total of 70 male subjects aged 55 ± 5 years participated in the study. The lead-II ECG was recorded and sampled at 200 Hz. The tachogram was obtained from the ECG signal and used to extract twenty-five HRV features. A one-way analysis of variance (ANOVA) test was performed to find significant differences between the CAD, MI, and control subjects. The features were used in the training and testing of a two-class artificial neural network (ANN) and a support vector machine (SVM). RESULTS: The obtained results revealed depressed HRV under atherosclerosis. An accuracy of 100% was obtained in classifying CAD and MI subjects from controls using the ANN, and 99.6% using the SVM; in classifying CAD from MI subjects, the SVM and ANN achieved 99.3% and 99.0% accuracy, respectively. CONCLUSIONS: Depressed HRV has been suggested as a marker for the identification of atherosclerotic events. The good classification accuracy observed between control, CAD, and MI subjects shows this to be a non-invasive, cost-effective approach to the prognosis of atherosclerotic events.
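The screening-then-classification pipeline above can be sketched as follows: one-way ANOVA to test each HRV feature across the three groups, then a two-class SVM on controls versus atherosclerotic subjects. All HRV values here are synthetic, and the group means are assumptions chosen only to mimic depressed HRV.

```python
# Hedged sketch: ANOVA feature screening + two-class SVM on synthetic HRV features.
import numpy as np
from scipy.stats import f_oneway
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(5)
hrv = {g: rng.normal(loc=m, size=(23, 25))  # 25 HRV features per subject (synthetic)
       for g, m in [("control", 0.0), ("CAD", -0.6), ("MI", -0.9)]}

# Screen each feature: does it differ significantly across the three groups?
for j in range(3):  # first three features, for brevity
    F, p = f_oneway(hrv["control"][:, j], hrv["CAD"][:, j], hrv["MI"][:, j])
    print(f"feature {j}: F={F:.2f}, p={p:.3f}")

# Two-class problem: controls vs. atherosclerotic (CAD and MI pooled).
X = np.vstack([hrv["control"], hrv["CAD"], hrv["MI"]])
y = np.array([0] * 23 + [1] * 46)
print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
```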
Random Forest is an ensemble of decision trees based on the bagging and random subspace concepts. As suggested by Breiman, the strength of the individual unstable learners and the diversity among them are the core strengths of ensemble models. In this paper, we propose two approaches, known as oblique and rotation double random forests. In the first approach, we propose a rotation-based double random forest, in which a transformation or rotation of the feature space is generated at each node. Since a different random feature subspace is chosen for evaluation at each node, the transformation at each node differs; different transformations yield better diversity among the base learners and hence better generalization performance. With the double random forest as base learner, the data at each node are transformed via two different transformations, namely principal component analysis and linear discriminant analysis. In the second approach, we propose an oblique double random forest. Decision trees in random forest and double random forest are univariate, which results in axis-parallel splits that fail to capture the geometric structure of the data. Also, the standard random forest may not grow sufficiently large decision trees, resulting in suboptimal performance. To capture the geometric properties of the data and to grow decision trees of sufficient depth, we propose the oblique double random forest, whose trees are multivariate decision trees. At each non-leaf node, a multisurface proximal support vector machine generates the optimal splitting plane for better generalization performance. Different regularization techniques (Tikhonov regularization, axis-parallel split regularization, and null-space regularization) are employed to tackle small-sample-size problems in the decision trees of the oblique double random forest. The proposed ensembles produce bigger trees than standard ensembles of decision trees, as bagging is used at each non-leaf node, which results in improved performance. The baseline models and the proposed oblique and rotation double random forest models are evaluated on 121 benchmark UCI datasets and real-world fisheries datasets. Both statistical analysis and the experimental results demonstrate the efficacy of the proposed oblique and rotation double random forest models compared to the baseline models.
- MeSH
- Algorithms * MeSH
- Principal Component Analysis MeSH
- Rotation MeSH
- Support Vector Machine * MeSH
- Publication type
- Journal Article MeSH
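The node-level rotation idea in the abstract above can be illustrated in isolation: at a node, draw a random feature subspace, rotate it with PCA (or LDA), and search for a split in the rotated space. This is a conceptual sketch of one node, not the authors' full double random forest algorithm; the subspace size and the depth-1 split are assumptions.

```python
# Hedged sketch: a single rotated-subspace node split, as in rotation-based forests.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

def rotated_node_split(X_node, y_node, n_sub=4, use_lda=False):
    """Rotate a random feature subspace, then fit a depth-1 tree (one split)."""
    feats = rng.choice(X_node.shape[1], size=n_sub, replace=False)
    rot = (LinearDiscriminantAnalysis() if use_lda else PCA()).fit(
        X_node[:, feats], y_node)
    Z = rot.transform(X_node[:, feats])
    stump = DecisionTreeClassifier(max_depth=1).fit(Z, y_node)
    return feats, rot, stump

feats, rot, stump = rotated_node_split(X, y)
print("subspace:", feats, "| split acc:", stump.score(rot.transform(X[:, feats]), y))
```

Because the subspace and hence the rotation differ from node to node, repeated calls produce diverse splits, which is the diversity mechanism the abstract credits for the improved generalization.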
Fragmented QRS (fQRS) is an electrocardiographic (ECG) marker of myocardial conduction abnormality, characterized by additional notches in the QRS complex. The presence of fQRS has been associated with an increased risk of all-cause mortality and arrhythmia in patients with cardiovascular disease. However, the current binary visual analysis is prone to intra- and inter-observer variability, and differing definitions are problematic in clinical practice. Therefore, objective quantification of fQRS is needed and could further improve risk stratification of these patients. We present an automated method for fQRS detection and quantification. First, a novel robust QRS complex segmentation strategy is proposed, which combines multi-lead information and automatically excludes abnormal heartbeats. Afterwards, features based on variational mode decomposition (VMD), phase-rectified signal averaging (PRSA), and the number of baseline crossings of the ECG were extracted and used to train a machine learning classifier (Support Vector Machine) to discriminate fragmented from non-fragmented ECG traces, using multi-center data and combining different fQRS criteria used in clinical settings. The best model was trained on the combination of two independent, previously annotated datasets and, compared to these visual fQRS annotations, achieved Kappa scores of 0.68 and 0.44, respectively. We also show that the algorithm can be used both in regular sinus rhythm and in irregular beats during atrial fibrillation. These results demonstrate that the proposed approach could be relevant for clinical practice by objectively assessing and quantifying fQRS. The study sets the path for further clinical application of the developed automated fQRS algorithm.
- MeSH
- Algorithms MeSH
- Electrocardiography * methods MeSH
- Atrial Fibrillation * diagnosis MeSH
- Humans MeSH
- Machine Learning MeSH
- Support Vector Machine MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
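Of the feature families named in the abstract above, the baseline-crossing count is the easiest to sketch end to end; VMD and PRSA would need dedicated implementations (e.g., a third-party VMD package) and are omitted here. The QRS-like segments below are synthetic, with fragmentation mimicked by added high-frequency notches.

```python
# Hedged sketch: baseline-crossing feature + SVM on synthetic fragmented/normal QRS.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(7)

def baseline_crossings(qrs_segment):
    """Count sign changes around the baseline; extra notches raise this count."""
    centered = qrs_segment - np.median(qrs_segment)
    return int(np.sum(np.signbit(centered[:-1]) != np.signbit(centered[1:])))

# Synthetic QRS-like segments: fragmented traces get added high-frequency notches.
segments, labels = [], []
for frag in (0, 1) * 50:
    t = np.linspace(-1, 1, 80)
    s = np.exp(-(t * 6) ** 2) + frag * 0.15 * np.sin(40 * t) + 0.02 * rng.normal(size=80)
    segments.append(s)
    labels.append(frag)

X = np.array([[baseline_crossings(s), np.ptp(s)] for s in segments])
y = np.array(labels)
print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
```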