Flow cytometry immunophenotyping is critical for the diagnostic classification of mature/peripheral B-cell neoplasms/B-cell chronic lymphoproliferative disorders (B-CLPD). Quantitatively driven classification approaches applied to multiparameter flow cytometry immunophenotypic data can be used to extract maximum information from the multidimensional space created by individual parameters (e.g., immunophenotypic markers), for highly accurate and automated classification of individual patient (sample) data. Here, we developed and compared five diagnostic classification algorithms, based on a large set of EuroFlow multicentric flow cytometry data files from a cohort of 659 B-CLPD patients. These included automatic population separators based on Principal Component Analysis (PCA), Canonical Variate Analysis (CVA), Neighbourhood Component Analysis (NCA), Support Vector Machine (SVM) algorithms and a variant of the Canonical Analysis (CA) algorithm, in which the number of standard deviations (SDs) varied for each of the comparisons of different pairs of diseases (CA-vSD). All five classification approaches are based on direct prospective interrogation of individual B-CLPD patients against the EuroFlow flow cytometry B-CLPD database composed of tumor B-cells of 659 individual patients stained in an identical way and classified a priori by the World Health Organization (WHO) criteria into nine diagnostic categories. Each classification approach was evaluated in parallel in terms of accuracy (% properly classified cases), precision (multiple or single diagnosis/case) and coverage (% cases with a proposed diagnosis). Overall, average rates of correct diagnosis (for the nine B-CLPD diagnostic entities) of between 58.9 % and 90.6 % were obtained with the five algorithms, with variable percentages of cases being either misclassified (4.1 %-14.0 %) or unclassifiable (0.3 %-37.0 %).
Automatic population separators based on CA, SVM and PCA showed a high average level of correctness (90.6 %, 86.8 %, and 86.0 %, respectively). Nevertheless, this was at the expense of proposing multiple diagnoses for a significant proportion of the test cases (54.5 %, 53.5 %, and 49.6 %, respectively). The CA-vSD algorithm generated the smallest average misclassification rate (4.1 %), but with 37.0 % of cases for which no diagnosis was proposed. In contrast, the NCA algorithm left only 2.7 % of cases without an associated diagnosis but misclassified 14.0 %. Among correctly classified cases (83.3 % of total), 91.2 % had a single proposed diagnosis, 8.6 % had two possible diagnoses, and 0.2 % had three. We demonstrate that the proposed AI algorithms provide an acceptable level of accuracy for the diagnostic classification of B-CLPD patients and, in general, surpass other algorithms reported in the literature.
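The three evaluation criteria named in this abstract (accuracy, precision as single versus multiple diagnoses, and coverage) can be illustrated with a minimal sketch. It assumes, since the abstract does not spell this out, that a case counts as correct when the true diagnosis is among the proposed ones and that an empty proposal means "unclassifiable". The function name `evaluate_proposals` and the toy cases are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, Set, Tuple


def evaluate_proposals(cases: Iterable[Tuple[str, Set[str]]]) -> Dict[str, float]:
    """Summarise classifier proposals as percentages of all cases.

    Each case is (true_diagnosis, proposed_diagnoses). An empty set means
    that no diagnosis was proposed (assumption: "unclassifiable").
    """
    cases = list(cases)
    n = len(cases)
    if n == 0:
        raise ValueError("at least one case is required")

    # Assumption: correct means the true diagnosis is among the proposals.
    correct = sum(1 for truth, proposed in cases if truth in proposed)
    unclassified = sum(1 for _, proposed in cases if not proposed)
    misclassified = n - correct - unclassified
    multiple = sum(1 for _, proposed in cases if len(proposed) > 1)

    def pct(count: int) -> float:
        return 100.0 * count / n

    return {
        "correct_pct": pct(correct),
        "misclassified_pct": pct(misclassified),
        "unclassified_pct": pct(unclassified),
        "multiple_pct": pct(multiple),
        "coverage_pct": pct(n - unclassified),
    }


# Toy cases with made-up labels, for illustration only.
toy_cases = [
    ("CLL", {"CLL"}),
    ("MCL", {"MCL", "CLL"}),   # correct, but with two proposed diagnoses
    ("FL", {"DLBCL"}),         # misclassified
    ("HCL", set()),            # no diagnosis proposed
    ("CLL", {"CLL"}),
]
summary = evaluate_proposals(toy_cases)
print(summary)
```

The sketch makes the trade-off described in the abstract visible: proposing several diagnoses per case raises the correct rate while lowering precision, and withholding a diagnosis lowers misclassification at the cost of coverage.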
- MeSH
- Algorithms MeSH
- B-Lymphocytes * pathology MeSH
- Immunophenotyping * methods MeSH
- Middle Aged MeSH
- Humans MeSH
- Lymphoproliferative Disorders * diagnosis classification MeSH
- Flow Cytometry * methods MeSH
- Aged MeSH
- Support Vector Machine MeSH
- Check Tag
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Comparative Study MeSH
Long QT syndrome (LQTS) comprises a group of inheritable channelopathies with prolonged ventricular repolarization, leading to syncope, ventricular tachycardia, and sudden death. Differentiating LQTS genotypes is crucial for targeted management and treatment, yet conventional genetic testing remains costly and time-consuming. This study aims to improve the distinction between LQTS genotypes, particularly LQT3, through a novel electrocardiogram (ECG)-based approach. Patients with LQT3 are at elevated risk due to arrhythmia triggers associated with rest and sleep. Using the E-HOL-03-0480-013 database of genotyped long QT syndrome ECG signals, we introduced two innovative parameterization techniques (area under the ECG curve and wave transformation into the unit circle) to classify LQT3 against LQT1 and LQT2 genotypes. Our methodology utilized single-lead ECG data with a 200 Hz sampling frequency. The support vector machine (SVM) model demonstrated the ability to discriminate LQT3 with a recall of 90% and a precision of 81%, achieving an F1-score of 0.85. This parameterization offers a potential substitute for genetic testing and is practical at low sampling frequencies. These single-lead ECG data could enhance the functionality of smartwatches and similar cardiovascular monitoring applications. The results underscore the viability of ECG morphology-based genotype classification, promising a significant step towards streamlined diagnosis and improved patient care in LQTS.
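As a rough illustration of the "area under the ECG curve" idea, the sketch below integrates a uniformly sampled signal with the trapezoidal rule at the 200 Hz sampling frequency mentioned above. This is an assumption about how such an area could be computed, not the authors' parameterization. The second function checks that the reported precision (0.81) and recall (0.90) are consistent with the reported F1-score. The function names are hypothetical.

```python
from typing import Sequence


def trapezoid_area(samples: Sequence[float], fs_hz: float = 200.0) -> float:
    """Area under a uniformly sampled signal (amplitude times seconds)."""
    if len(samples) < 2:
        return 0.0
    dt = 1.0 / fs_hz
    # Trapezoidal rule: end points carry half weight, inner points full weight.
    return dt * (sum(samples) - 0.5 * (samples[0] + samples[-1]))


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# One second of a constant 1.0 signal sampled at 200 Hz has area of about 1.0.
flat_area = trapezoid_area([1.0] * 201)
# Precision and recall as reported in the abstract.
reported_f1 = f1_score(0.81, 0.90)
print(round(flat_area, 6), round(reported_f1, 2))
```

With the reported values, 2 x 0.81 x 0.90 / (0.81 + 0.90) is approximately 0.853, which rounds to the stated F1-score of 0.85.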
- MeSH
- Adult MeSH
- Electrocardiography * methods MeSH
- Genotype MeSH
- Humans MeSH
- Machine Learning * MeSH
- Support Vector Machine MeSH
- Long QT Syndrome * genetics diagnosis physiopathology MeSH
- Check Tag
- Adult MeSH
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Comparative Study MeSH
Early detection of malignant thyroid nodules is crucial for effective treatment, but traditional diagnostic methods face challenges such as variability in expert opinions and limited integration of advanced imaging techniques. This prospective cohort study investigates a novel multimodal approach, integrating traditional methods with advanced machine learning techniques. We studied 181 patients who underwent fine-needle aspiration (FNA) biopsy, each contributing one nodule, resulting in a total of 181 nodules for our analysis. Data collection included sex, age, and ultrasound imaging, which incorporated elastography. Features extracted from these images included Thyroid Imaging Reporting and Data System (TIRADS) scores, elastography parameters, and radiomic features. The pathological results based on the FNA biopsy, provided by the pathologists, served as our gold standard for nodule classification. Our methodology, termed ELTIRADS, combines these features with interpretable machine learning techniques. Performance evaluation showed that a Support Vector Machine (SVM) classifier using TIRADS, elastography data, and radiomic features achieved high accuracy (0.92), with sensitivity (0.89), specificity (0.94), precision (0.89), and F1 score (0.89). To enhance interpretability, we used hierarchical clustering, Shapley additive explanations (SHAP), and partial dependence plots (PDP). This combined approach holds promise for enhancing the accuracy of thyroid nodule malignancy detection, thereby contributing to advancements in personalized and precision medicine in the field of thyroid cancer research.
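The five performance figures quoted in this abstract all derive from a single binary confusion matrix. The sketch below shows the standard definitions. The counts in the example are hypothetical and chosen only for easy arithmetic; they are not the study's confusion matrix, which the abstract does not report.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from a binary confusion matrix (malignant = positive)."""
    if min(tp + fn, tn + fp, tp + fp) == 0:
        raise ValueError("need at least one actual positive, actual negative, "
                         "and predicted positive")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for illustration only (not from the study).
example = binary_metrics(tp=8, fp=1, tn=9, fn=2)
print(example)
```

Note that whenever precision and sensitivity are equal, as with the 0.89 values reported above, the F1 score equals the same value, because it is their harmonic mean.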
- MeSH
- Adult MeSH
- Elasticity Imaging Techniques * methods MeSH
- Middle Aged MeSH
- Humans MeSH
- Thyroid Neoplasms diagnostic imaging classification pathology diagnosis MeSH
- Prospective Studies MeSH
- Radiomics MeSH
- Aged MeSH
- Thyroid Gland diagnostic imaging pathology MeSH
- Machine Learning * MeSH
- Support Vector Machine MeSH
- Biopsy, Fine-Needle MeSH
- Thyroid Nodule * diagnostic imaging pathology classification MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
BACKGROUND: Advancements in artificial intelligence (AI) and machine learning (ML) have revolutionized the medical field and transformed translational medicine. These technologies enable more accurate disease trajectory models while enhancing patient-centered care. However, challenges such as heterogeneous datasets, class imbalance, and scalability remain barriers to achieving optimal predictive performance. METHODS: This study proposes a novel AI-based framework that integrates Gradient Boosting Machines (GBM) and Deep Neural Networks (DNN) to address these challenges. The framework was evaluated using two distinct datasets: MIMIC-IV, a critical care database containing clinical data of critically ill patients, and the UK Biobank, which comprises genetic, clinical, and lifestyle data from 500,000 participants. Key performance metrics, including Accuracy, Precision, Recall, F1-Score, and AUROC, were used to assess the framework against traditional and advanced ML models. RESULTS: The proposed framework demonstrated superior performance compared to classical models such as Logistic Regression, Random Forest, Support Vector Machines (SVM), and Neural Networks. For example, on the UK Biobank dataset, the model achieved an AUROC of 0.96, significantly outperforming Neural Networks (0.92). The framework was also efficient, requiring only 32.4 s for training on MIMIC-IV, with low prediction latency, making it suitable for real-time applications. CONCLUSIONS: The proposed AI-based framework effectively addresses critical challenges in translational medicine, offering superior predictive accuracy and efficiency. Its robust performance across diverse datasets highlights its potential for integration into real-time clinical decision support systems, facilitating personalized medicine and improving patient outcomes. Future research will focus on enhancing scalability and interpretability for broader clinical applications.
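AUROC, the headline metric in this abstract, can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following minimal sketch computes it by direct pairwise comparison, counting ties as one half. It is a generic illustration with made-up scores, not the evaluation code of the proposed framework, and the function name `auroc` is hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels use 1 for positive and 0 for negative cases.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for illustration.
toy_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

The pairwise form is quadratic in the number of cases, so it suits small examples; a rank-based computation would be the usual choice for datasets of the size described here. Because AUROC depends only on the ranking of scores, it is less affected by class imbalance than plain accuracy.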
- MeSH
- Databases, Factual MeSH
- Humans MeSH
- Neural Networks, Computer MeSH
- Patient-Centered Care * MeSH
- Machine Learning * MeSH
- Translational Science, Biomedical MeSH
- Translational Research, Biomedical MeSH
- Artificial Intelligence * MeSH
- Treatment Outcome MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
PURPOSE: Chronic obstructive pulmonary disease (COPD) is a prevalent and preventable condition that typically worsens over time. Acute exacerbations of COPD significantly impact disease progression, underscoring the importance of prevention efforts. This observational study aimed to achieve two main objectives: (1) identify patients at risk of exacerbations using an ensemble of clustering algorithms, and (2) classify patients into distinct clusters based on disease severity. METHODS: Data from portable medical devices were analyzed post hoc using hyperparameter optimization with Self-Organizing Maps (SOM), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest, and Support Vector Machine (SVM) algorithms to detect flare-ups. Principal Component Analysis (PCA) followed by KMeans clustering was applied to categorize patients by severity. RESULTS: Twenty-five patients were included in the study population; data from 17 patients had the required reliability. Five patients were identified in the highest deterioration group, with one clinically confirmed exacerbation accurately detected by our ensemble algorithm. PCA and KMeans clustering then grouped patients into three clusters based on severity: Cluster 0 started with the least severe characteristics but experienced decline, Cluster 1 consistently showed the most severe characteristics, and Cluster 2 showed slight improvement. CONCLUSION: Our approach effectively identified patients at risk of exacerbations and classified them by disease severity. Although promising, the approach would need to be verified on a larger sample with a larger number of recorded clinically verified exacerbations.
- Publication type
- Journal Article MeSH
T-lineage acute lymphoblastic leukemia (T-ALL) accounts for about 15% of pediatric and about 25% of adult ALL cases. Minimal/measurable residual disease (MRD) assessed by flow cytometry (FCM) is an important prognostic indicator for risk stratification. In order to assess MRD, a limited number of antibodies directed against the most discriminative antigens must be selected. We propose a pipeline for evaluating the influence of different markers for cell population classification in FCM data. We use a linear support vector machine (SVM), fitted to each sample individually to avoid issues with patient and laboratory variations. The direction of the best separating hyperplane as well as the influence of omitting specific markers is considered. Ninety-one bone marrow samples of 43 pediatric T-ALL patients from five reference laboratories were analyzed by FCM regarding marker importance for blast cell identification using combinations of eight different markers. For all laboratories, CD48 and CD99 were among the top three markers with the strongest contribution to the optimal hyperplane, measured by the median separating hyperplane coefficient size for all samples per center and time point (diagnosis, Day 15, Day 33). Based on the available limited set tested (CD3, CD4, CD5, CD7, CD8, CD45, CD48, CD99), our findings prove that CD48 and CD99 are useful markers for MRD monitoring in T-ALL. The proposed pipeline can be applied for evaluation of other marker combinations in the future.
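The ranking step described in this abstract (median separating hyperplane coefficient size across samples) can be sketched as follows. The sketch assumes that one linear SVM coefficient vector per sample is already available and that the input features were put on a comparable scale before fitting. Normalising each hyperplane to unit length is an additional assumption made here so that samples are comparable; the abstract does not state it. The function name, marker names, and coefficients are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def rank_markers(
    coefs_per_sample: Sequence[Sequence[float]],
    markers: Sequence[str],
) -> List[Tuple[str, float]]:
    """Rank markers by median absolute hyperplane coefficient, largest first."""
    normalised = []
    for w in coefs_per_sample:
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("hyperplane with zero norm")
        # Assumption: scale each sample's hyperplane to unit length.
        normalised.append([c / norm for c in w])

    ranked = [
        (name, median(abs(w[j]) for w in normalised))
        for j, name in enumerate(markers)
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up coefficient vectors for three samples and three placeholder markers.
toy_ranking = rank_markers(
    [[3.0, 0.0, 4.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]],
    ["marker_a", "marker_b", "marker_c"],
)
print(toy_ranking)
```

Using the median rather than the mean keeps a single atypical sample from dominating the ranking, which fits the per-sample fitting strategy described above.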
- MeSH
- Precursor Cell Lymphoblastic Leukemia-Lymphoma * diagnosis MeSH
- Child MeSH
- Adult MeSH
- Humans MeSH
- Precursor T-Cell Lymphoblastic Leukemia-Lymphoma * diagnosis MeSH
- Flow Cytometry MeSH
- Neoplasm, Residual diagnosis MeSH
- T-Lymphocytes MeSH
- Check Tag
- Child MeSH
- Adult MeSH
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed given disputed and reference speech material, adopting the recommended and scientifically accepted likelihood ratio framework for reporting evidential strength in court. In forensic practice, usually, auditory and acoustic analyses are performed to carry out such a verification task considering a diversity of features, such as language competence, pronunciation, or other linguistic features. Automated speaker comparison systems can also be used alongside those manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input. Additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer if, when and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three different approaches to derive a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant dataset FRIDA and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log likelihood ratio cost (Cllr) and equal error rate (EER). We show that fusion can be beneficial, especially in the case of intercepted phone calls with noise in the background.
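The log likelihood ratio cost (Cllr) used for evaluation here penalises likelihood ratios that point in the wrong direction, and does so more heavily the stronger the misleading evidence. The sketch below follows the standard definition; the function name `cllr` and the example values are illustrative and are not taken from the FRIDA or FISHER experiments.

```python
import math
from typing import Sequence


def cllr(lrs_same_speaker: Sequence[float], lrs_diff_speaker: Sequence[float]) -> float:
    """Log likelihood ratio cost for a set of likelihood ratios (LRs).

    lrs_same_speaker: LRs from comparisons where the speakers are the same.
    lrs_diff_speaker: LRs from comparisons where the speakers differ.
    """
    if not lrs_same_speaker or not lrs_diff_speaker:
        raise ValueError("both comparison types must be present")
    # Same-speaker pairs are penalised for small LRs.
    penalty_same = sum(math.log2(1.0 + 1.0 / lr) for lr in lrs_same_speaker)
    penalty_same /= len(lrs_same_speaker)
    # Different-speaker pairs are penalised for large LRs.
    penalty_diff = sum(math.log2(1.0 + lr) for lr in lrs_diff_speaker)
    penalty_diff /= len(lrs_diff_speaker)
    return 0.5 * (penalty_same + penalty_diff)


# A system that always reports LR = 1 carries no information: Cllr = 1.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
# Made-up LRs from a well-behaved system give a value well below 1.
informative = cllr([100.0, 50.0], [0.01, 0.02])
print(uninformative, round(informative, 4))
```

A Cllr below 1 means the system improves on reporting no evidence at all, so a fused system can be compared with its acoustic and frequent-word components on the same scale.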
- MeSH
- Speech Acoustics MeSH
- Algorithms MeSH
- Humans MeSH
- Linguistics MeSH
- Likelihood Functions MeSH
- Speech MeSH
- Forensic Sciences * methods MeSH
- Support Vector Machine MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
Motor disability is a dominant and restricting symptom in multiple sclerosis, yet its neuroimaging correlates are not fully understood. We apply statistical and machine learning techniques to multimodal neuroimaging data to discriminate between multiple sclerosis patients and healthy controls and to predict motor disability scores in the patients. We examine the data of sixty-four multiple sclerosis patients and sixty-five controls, who underwent an MRI examination and evaluation on motor disability scales. The modalities used comprised regional fractional anisotropy, regional grey matter volumes, and functional connectivity. For analysis, we employ two approaches: high-dimensional support vector machines run on features selected by Fisher Score (aiming for maximal classification accuracy), and low-dimensional logistic regression on the principal components of the data (aiming for increased interpretability). We apply analogous regression methods to predict symptom severity. While fractional anisotropy yields classification accuracies of 96.1% and 89.9% with the two approaches, respectively, including other modalities did not bring further improvement. Concerning the prediction of motor impairment, the low-dimensional approach performed more reliably. The first grey matter volume component was significantly correlated (R = 0.28-0.46, p < 0.05) with most clinical scales. In summary, we identified the relationship between both white and grey matter changes and motor impairment in multiple sclerosis. Furthermore, we were able to achieve the highest classification accuracy between patients and controls yet reported based on quantitative MRI measures of tissue integrity, while also providing a low-dimensional classification approach with comparable results, paving the way to interpretable machine learning models of brain changes in multiple sclerosis.
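The Fisher Score used for feature selection in the high-dimensional approach ranks each feature by the ratio of between-class to within-class variance. The sketch below uses the common textbook form of the score; whether the study used exactly this variant is an assumption, and the function name and toy data are hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Fisher score per feature: between-class over within-class variance.

    X is a list of samples (rows) with one value per feature (columns);
    y holds the class label of each sample.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        # A feature that is constant within every class separates perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Toy data: feature 0 separates the two classes, feature 1 does not.
toy_X = [[0.0, 1.0], [2.0, 3.0], [10.0, 1.0], [12.0, 3.0]]
toy_y = [0, 0, 1, 1]
toy_scores = fisher_scores(toy_X, toy_y)
print(toy_scores)
```

Features would then be sorted by score and the top-ranked ones passed to the support vector machine. To avoid optimistic accuracy estimates, the selection would normally be repeated inside each cross-validation fold.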
Imbalanced datasets are common in real-world problems. In such problems, the number of data samples in one class is significantly higher than in the other classes, even though the other classes might be more important. Standard classification algorithms may assign all the data to the majority class, which is a significant drawback of most standard learning algorithms, so imbalanced datasets need to be handled carefully. One of the traditional algorithms, twin support vector machines (TSVM), performs well on balanced data but poorly on imbalanced datasets. To improve the TSVM algorithm's classification ability for imbalanced datasets, a reduced universum twin support vector machine for class imbalance learning (RUTSVM), driven by the universum twin support vector machine (UTSVM), was recently proposed. One of RUTSVM's key drawbacks is that solving the dual problem and finding the classifiers involve matrix inverse computation. In this paper, we improve the RUTSVM and propose an improved reduced universum twin support vector machine for class imbalance learning (IRUTSVM). In the suggested IRUTSVM approach, we offer alternative Lagrangian functions to tackle the primal problems of RUTSVM by inserting one of the terms in the objective function into the constraints. As a result, we obtain a new dual formulation for each optimization problem, so that we need not compute inverse matrices either in the training process or in finding the classifiers. Moreover, rectangular kernel matrices of smaller size are used to reduce the computational time. Extensive testing is carried out on a variety of synthetic and real-world imbalanced datasets, and the findings show that the IRUTSVM algorithm outperforms the TSVM, UTSVM, and RUTSVM algorithms in terms of generalization performance.
- MeSH
- Algorithms * MeSH
- Support Vector Machine * MeSH
- Publication type
- Journal Article MeSH
BACKGROUND: Interpretable machine learning (ML) for early detection of cancer has the potential to improve risk assessment and early intervention. METHODS: Data from 261 proteins related to inflammation and/or tumor processes in 123 blood samples collected from healthy persons, but of whom a sub-group later developed squamous cell carcinoma of the oral tongue (SCCOT), were analyzed. Samples from people who developed SCCOT within less than 5 years were classified as tumor-to-be and all other samples as tumor-free. The optimal ML algorithm for feature selection was identified and feature importance computed by the SHapley Additive exPlanations (SHAP) method. Five popular ML algorithms (AdaBoost, Artificial neural networks [ANNs], Decision Tree [DT], eXtreme Gradient Boosting [XGBoost], and Support Vector Machine [SVM]) were applied to establish prediction models, and decisions of the optimal models were interpreted by SHAP. RESULTS: Using the 22 selected features, the SVM prediction model showed the best performance (sensitivity = 0.867, specificity = 0.859, balanced accuracy = 0.863, area under the receiver operating characteristic curve [ROC-AUC] = 0.924). SHAP analysis revealed that the 22 features rendered varying person-specific impacts on model decision and the top three contributors to prediction were Interleukin 10 (IL10), TNF Receptor Associated Factor 2 (TRAF2), and Kallikrein Related Peptidase 12 (KLK12). CONCLUSION: Using multidimensional plasma protein analysis and interpretable ML, we outline a systematic approach for early detection of SCCOT before the appearance of clinical signs.
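Two details of this abstract lend themselves to a short worked example: the labelling rule (samples from people who developed SCCOT in less than 5 years are tumor-to-be, all others tumor-free) and the balanced accuracy, which is the mean of sensitivity and specificity. The sketch below is illustrative only; the function names and the use of `None` for people who never developed SCCOT are assumptions.

```python
from typing import Optional


def label_sample(years_to_sccot: Optional[float]) -> str:
    """Apply the labelling rule stated in the abstract.

    years_to_sccot is the time from blood sampling to SCCOT diagnosis,
    or None if the person did not develop SCCOT (assumed encoding).
    """
    if years_to_sccot is not None and years_to_sccot < 5:
        return "tumor-to-be"
    return "tumor-free"


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


# Values reported for the SVM model in the abstract.
reported_bal_acc = balanced_accuracy(0.867, 0.859)
print(label_sample(3.2), label_sample(7.0), label_sample(None))
print(round(reported_bal_acc, 3))
```

The reported sensitivity (0.867) and specificity (0.859) average to 0.863, matching the balanced accuracy given in the results. Balanced accuracy is a sensible summary here because the tumor-to-be group is a subset of the 123 samples, so the classes are unlikely to be the same size.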
- MeSH
- Tongue MeSH
- Blood Proteins MeSH
- Humans MeSH
- Tongue Neoplasms * diagnosis MeSH
- Carcinoma, Squamous Cell * diagnosis MeSH
- Machine Learning MeSH
- Ubiquitin-Protein Ligases MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH