SVM, Support Vector Machine
Flow cytometry immunophenotyping is critical for the diagnostic classification of mature/peripheral B-cell neoplasms/B-cell chronic lymphoproliferative disorders (B-CLPD). Quantitatively driven classification approaches applied to multiparameter flow cytometry immunophenotypic data can be used to extract maximum information from a multidimensional space created by individual parameters (e.g., immunophenotypic markers), for highly accurate and automated classification of individual patient (sample) data. Here, we developed and compared five diagnostic classification algorithms, based on a large set of EuroFlow multicentric flow cytometry data files from a cohort of 659 B-CLPD patients. These included automatic population separators based on Principal Component Analysis (PCA), Canonical Variate Analysis (CVA), Neighbourhood Component Analysis (NCA), Support Vector Machine algorithms (SVM) and a variant of the Canonical Analysis (CA) algorithm, in which the number of standard deviations (SDs) varied for each of the comparisons of different pairs of diseases (CA-vSD). All five classification approaches are based on direct prospective interrogation of individual B-CLPD patients against the EuroFlow flow cytometry B-CLPD database composed of tumor B-cells of 659 individual patients stained in an identical way and classified a priori by the World Health Organization (WHO) criteria into nine diagnostic categories. Each classification approach was evaluated in parallel in terms of accuracy (% properly classified cases), precision (multiple or single diagnosis/case) and coverage (% cases with a proposed diagnosis). Overall, average rates of correct diagnosis (for the nine B-CLPD diagnostic entities) of between 58.9% and 90.6% were obtained with the five algorithms, with variable percentages of cases being either misclassified (4.1%-14.0%) or unclassifiable (0.3%-37.0%).
Automatic population separators based on CA, SVM and PCA showed a high average level of correctness (90.6%, 86.8%, and 86.0%, respectively). Nevertheless, this was at the expense of proposing multiple diagnoses for a significant proportion of the test cases (54.5%, 53.5%, and 49.6%, respectively). The CA-vSD algorithm generated the lowest average misclassification rate (4.1%), but with 37.0% of cases for which no diagnosis was proposed. In contrast, the NCA algorithm left only 2.7% of cases without an associated diagnosis but misclassified 14.0%. Among correctly classified cases (83.3% of total), 91.2% had a single proposed diagnosis, 8.6% had two possible diagnoses, and 0.2% had three. We demonstrate that the proposed AI algorithms provide an acceptable level of accuracy for the diagnostic classification of B-CLPD patients and, in general, surpass other algorithms reported in the literature.
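The three evaluation criteria named in the abstract (accuracy, precision as single versus multiple diagnoses, and coverage) can be illustrated with a small sketch. This is not the authors' code. It assumes that a case counts as correctly classified when its true diagnosis appears among the proposed ones, and that an empty proposal set means "unclassifiable". The labels `dx_A` to `dx_D` and the four toy cases are invented for illustration.

```python
def summarize(cases):
    """Summarize set-valued diagnostic proposals.

    cases: list of (true_label, proposed_set) pairs.
    An empty proposed_set means no diagnosis was proposed (assumption).
    """
    n = len(cases)
    unclassified = sum(1 for _, proposed in cases if not proposed)
    # Assumption: correct means the true label is among the proposals.
    correct = sum(1 for truth, proposed in cases if truth in proposed)
    misclassified = n - unclassified - correct
    single = sum(
        1 for truth, proposed in cases
        if truth in proposed and len(proposed) == 1
    )
    return {
        "coverage": (n - unclassified) / n,
        "correct": correct / n,
        "misclassified": misclassified / n,
        "unclassified": unclassified / n,
        "single_among_correct": single / correct if correct else 0.0,
    }


# Invented toy cases, not data from the study.
toy_cases = [
    ("dx_A", {"dx_A"}),           # correct, single diagnosis
    ("dx_B", {"dx_B", "dx_A"}),   # correct, but two diagnoses proposed
    ("dx_C", {"dx_D"}),           # misclassified
    ("dx_D", set()),              # no diagnosis proposed
]
toy_summary = summarize(toy_cases)
```

The sketch makes the trade-off reported above visible: a method can raise its share of correct cases by proposing several diagnoses per case, which lowers `single_among_correct`, or it can lower misclassification by declining to propose a diagnosis, which lowers `coverage`.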
- MeSH
- Algorithms MeSH
- B-Lymphocytes * pathology MeSH
- Immunophenotyping * methods MeSH
- Middle Aged MeSH
- Humans MeSH
- Lymphoproliferative Disorders * diagnosis classification MeSH
- Flow Cytometry * methods MeSH
- Aged MeSH
- Support Vector Machine MeSH
- Check Tag
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
- Comparative Study MeSH
Long QT syndrome (LQTS) comprises a group of inheritable channelopathies with prolonged ventricular repolarization, leading to syncope, ventricular tachycardia, and sudden death. Differentiating LQTS genotypes is crucial for targeted management and treatment, yet conventional genetic testing remains costly and time-consuming. This study aims to improve the distinction between LQTS genotypes, particularly LQT3, through a novel electrocardiogram (ECG)-based approach. Patients with LQT3 are at elevated risk due to arrhythmia triggers associated with rest and sleep. Employing a database of genotyped long QT syndrome E-HOL-03-0480-013 ECG signals, we introduced two innovative parameterization techniques (area under the ECG curve and wave transformation into the unit circle) to classify LQT3 against LQT1 and LQT2 genotypes. Our methodology utilized single-lead ECG data with a 200 Hz sampling frequency. The support vector machine (SVM) model demonstrated the ability to discriminate LQT3 with a recall of 90% and a precision of 81%, achieving an F1-score of 0.85. This parameterization offers a potential substitute for genetic testing and is practical for low sampling frequencies. These single-lead ECG data could enhance the functionality of smartwatches and similar cardiovascular monitoring applications. The results underscore the viability of ECG morphology-based genotype classification, promising a significant step towards streamlined diagnosis and improved patient care in LQTS.
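As a quick consistency check, the reported F1-score follows from the reported precision and recall. The sketch below also shows one plausible reading of an "area under the ECG curve" feature, computed with the trapezoidal rule at the stated 200 Hz sampling frequency. The function `area_under_curve` and the three-sample segment are assumptions made for illustration; the abstract does not specify how the authors computed this parameter.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def area_under_curve(samples, sampling_hz):
    """Trapezoidal area under a sampled signal.

    Hypothetical helper: the paper's exact parameterization is not
    given in the abstract. The result is in amplitude units * seconds.
    """
    dt = 1.0 / sampling_hz
    return sum((a + b) / 2 * dt for a, b in zip(samples, samples[1:]))


# Values taken from the abstract: precision 81%, recall 90%.
f1 = f1_score(0.81, 0.90)

# Invented three-sample segment, for illustration only.
toy_area = area_under_curve([0.0, 1.0, 0.0], sampling_hz=200)
```

Rounded to two decimals, `f1` evaluates to 0.85, which matches the value stated in the abstract.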
- MeSH
- Adult MeSH
- Electrocardiography * methods MeSH
- Genotype MeSH
- Humans MeSH
- Machine Learning * MeSH
- Support Vector Machine MeSH
- Long QT Syndrome * genetics diagnosis physiopathology MeSH
- Check Tag
- Adult MeSH
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
- Comparative Study MeSH
Early detection of malignant thyroid nodules is crucial for effective treatment, but traditional diagnostic methods face challenges such as variability in expert opinions and limited integration of advanced imaging techniques. This prospective cohort study investigates a novel multimodal approach, integrating traditional methods with advanced machine learning techniques. We studied 181 patients who underwent fine-needle aspiration (FNA) biopsy, each contributing one nodule, resulting in a total of 181 nodules for our analysis. Data collection included sex, age, and ultrasound imaging, which incorporated elastography. Features extracted from these images included Thyroid Imaging Reporting and Data System (TIRADS) scores, elastography parameters, and radiomic features. The pathological results based on the FNA biopsy, provided by the pathologists, served as our gold standard for nodule classification. Our methodology, termed ELTIRADS, combines these features with interpretable machine learning techniques. Performance evaluation showed that a Support Vector Machine (SVM) classifier using TIRADS, elastography data, and radiomic features achieved high accuracy (0.92), with a sensitivity of 0.89, specificity of 0.94, precision of 0.89, and F1 score of 0.89. To enhance interpretability, we used hierarchical clustering, SHapley Additive exPlanations (SHAP), and partial dependence plots (PDP). This combined approach holds promise for enhancing the accuracy of thyroid nodule malignancy detection, thereby contributing to advancements in personalized and precision medicine in the field of thyroid cancer research.
- MeSH
- Adult MeSH
- Elasticity Imaging Techniques * methods MeSH
- Middle Aged MeSH
- Humans MeSH
- Thyroid Neoplasms diagnostic imaging classification pathology diagnosis MeSH
- Prospective Studies MeSH
- Radiomics MeSH
- Aged MeSH
- Thyroid Gland diagnostic imaging pathology MeSH
- Machine Learning * MeSH
- Support Vector Machine MeSH
- Biopsy, Fine-Needle MeSH
- Thyroid Nodule * diagnostic imaging pathology classification MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
BACKGROUND: Advancements in artificial intelligence (AI) and machine learning (ML) have revolutionized the medical field and transformed translational medicine. These technologies enable more accurate disease trajectory models while enhancing patient-centered care. However, challenges such as heterogeneous datasets, class imbalance, and scalability remain barriers to achieving optimal predictive performance. METHODS: This study proposes a novel AI-based framework that integrates Gradient Boosting Machines (GBM) and Deep Neural Networks (DNN) to address these challenges. The framework was evaluated using two distinct datasets: MIMIC-IV, a critical care database containing clinical data of critically ill patients, and the UK Biobank, which comprises genetic, clinical, and lifestyle data from 500,000 participants. Key performance metrics, including Accuracy, Precision, Recall, F1-Score, and AUROC, were used to assess the framework against traditional and advanced ML models. RESULTS: The proposed framework demonstrated superior performance compared to classical models such as Logistic Regression, Random Forest, Support Vector Machines (SVM), and Neural Networks. For example, on the UK Biobank dataset, the model achieved an AUROC of 0.96, significantly outperforming Neural Networks (0.92). The framework was also efficient, requiring only 32.4 s for training on MIMIC-IV, with low prediction latency, making it suitable for real-time applications. CONCLUSIONS: The proposed AI-based framework effectively addresses critical challenges in translational medicine, offering superior predictive accuracy and efficiency. Its robust performance across diverse datasets highlights its potential for integration into real-time clinical decision support systems, facilitating personalized medicine and improving patient outcomes. Future research will focus on enhancing scalability and interpretability for broader clinical applications.
PURPOSE: Chronic obstructive pulmonary disease (COPD) is a prevalent and preventable condition that typically worsens over time. Acute exacerbations of COPD significantly impact disease progression, underscoring the importance of prevention efforts. This observational study aimed to achieve two main objectives: (1) identify patients at risk of exacerbations using an ensemble of clustering algorithms, and (2) classify patients into distinct clusters based on disease severity. METHODS: Data from portable medical devices were analyzed post-hoc using hyperparameter optimization with Self-Organizing Maps (SOM), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest, and Support Vector Machine (SVM) algorithms to detect flare-ups. Principal Component Analysis (PCA) followed by KMeans clustering was applied to categorize patients by severity. RESULTS: Twenty-five patients were included in the study population; data from 17 of them had the required reliability. Five patients were identified in the highest deterioration group, with one clinically confirmed exacerbation accurately detected by our ensemble algorithm. PCA and KMeans clustering then grouped patients into three clusters based on severity: Cluster 0 started with the least severe characteristics but experienced decline, Cluster 1 consistently showed the most severe characteristics, and Cluster 2 showed slight improvement. CONCLUSION: Our approach effectively identified patients at risk of exacerbations and classified them by disease severity. Although promising, the approach would need to be verified on a larger sample with a larger number of recorded clinically verified exacerbations.
- Publication Type
- Journal Article MeSH
Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed given disputed and reference speech material, adopting the recommended and scientifically accepted likelihood ratio framework for reporting evidential strength in court. In forensic practice, usually, auditory and acoustic analyses are performed to carry out such a verification task considering a diversity of features, such as language competence, pronunciation, or other linguistic features. Automated speaker comparison systems can also be used alongside those manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input. Additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer if, when and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three different approaches to derive a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant dataset FRIDA and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log likelihood ratio cost (Cllr) and equal error rate (EER). We show that fusion can be beneficial, especially in the case of intercepted phone calls with noise in the background.
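The log likelihood ratio cost (Cllr) used for evaluation in the abstract has a standard closed form: the mean of log2(1 + 1/LR) over same-speaker comparisons and the mean of log2(1 + LR) over different-speaker comparisons, averaged together. The sketch below implements that definition. The likelihood ratios in the example lists are invented and are not results from FRIDA or FISHER.

```python
import math


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log likelihood ratio cost for two lists of likelihood ratios.

    Lower is better; a system that always outputs LR = 1 scores 1.0.
    """
    # Penalty for small LRs when the speakers are in fact the same.
    ss_cost = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs)
    ss_cost /= len(same_speaker_lrs)
    # Penalty for large LRs when the speakers are in fact different.
    ds_cost = sum(math.log2(1 + lr) for lr in different_speaker_lrs)
    ds_cost /= len(different_speaker_lrs)
    return 0.5 * (ss_cost + ds_cost)


# Invented likelihood ratios, for illustration only.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
informative = cllr([100.0, 50.0], [0.01, 0.02])
```

An uninformative system gives a Cllr of 1.0, so a fused system is only worth using when its Cllr falls below that of the acoustic system alone under the same conditions.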
- MeSH
- Speech Acoustics MeSH
- Algorithms MeSH
- Humans MeSH
- Linguistics MeSH
- Likelihood Functions MeSH
- Speech MeSH
- Forensic Sciences * methods MeSH
- Support Vector Machine MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
Imbalanced datasets are common in real-world problems. In such problems, the number of data samples in one class is significantly higher than in the other classes, even though the other classes might be more important. Standard classification algorithms may assign all the data to the majority class, which is a significant drawback of most standard learning algorithms, so imbalanced datasets need to be handled carefully. One of the traditional algorithms, the twin support vector machine (TSVM), performs well on balanced data but poorly on imbalanced datasets. To improve the TSVM algorithm's classification ability for imbalanced datasets, a reduced universum twin support vector machine for class imbalance learning (RUTSVM) was recently proposed, building on the universum twin support vector machine (UTSVM). One of RUTSVM's key drawbacks is that solving the dual problem and finding the classifiers involve matrix inverse computation. In this paper, we improve the RUTSVM and propose an improved reduced universum twin support vector machine for class imbalance learning (IRUTSVM). In the suggested IRUTSVM approach, we offer alternative Lagrangian functions to tackle the primal problems of RUTSVM by inserting one of the terms in the objective function into the constraints. As a result, we obtain a new dual formulation for each optimization problem, so that we need not compute inverse matrices either in the training process or in finding the classifiers. Moreover, rectangular kernel matrices of smaller size are used to reduce the computational time. Extensive testing is carried out on a variety of synthetic and real-world imbalanced datasets, and the findings show that the IRUTSVM algorithm outperforms the TSVM, UTSVM, and RUTSVM algorithms in terms of generalization performance.
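The drawback described at the start of this abstract, a classifier that assigns every sample to the majority class, is easy to reproduce on synthetic labels. The sketch below is not part of IRUTSVM. It only shows why plain accuracy is misleading on imbalanced data, using an invented 95:5 class split.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_for_class(y_true, y_pred, label):
    """Fraction of samples of the given class that were recovered."""
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == label]
    return sum(t == p for t, p in relevant) / len(relevant)


# Synthetic labels: 95 majority samples (0) and 5 minority samples (1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

majority_accuracy = accuracy(y_true, y_pred)
minority_recall = recall_for_class(y_true, y_pred, label=1)
```

Here the degenerate classifier reaches 95% accuracy while recovering none of the minority class, which is why class imbalance methods are judged on more than overall accuracy.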
- MeSH
- Algorithms * MeSH
- Support Vector Machine * MeSH
- Publication Type
- Journal Article MeSH
BACKGROUND: Interpretable machine learning (ML) for early detection of cancer has the potential to improve risk assessment and early intervention. METHODS: Data on 261 proteins related to inflammation and/or tumor processes were analyzed in 123 blood samples collected from healthy persons, a subgroup of whom later developed squamous cell carcinoma of the oral tongue (SCCOT). Samples from people who developed SCCOT in less than 5 years were classified as tumor-to-be and all other samples as tumor-free. The optimal ML algorithm for feature selection was identified and feature importance computed by the SHapley Additive exPlanations (SHAP) method. Five popular ML algorithms (AdaBoost, Artificial neural networks [ANNs], Decision Tree [DT], eXtreme Gradient Boosting [XGBoost], and Support Vector Machine [SVM]) were applied to establish prediction models, and decisions of the optimal models were interpreted by SHAP. RESULTS: Using the 22 selected features, the SVM prediction model showed the best performance (sensitivity = 0.867, specificity = 0.859, balanced accuracy = 0.863, area under the receiver operating characteristic curve [ROC-AUC] = 0.924). SHAP analysis revealed that the 22 features had varying person-specific impacts on model decisions and that the top three contributors to prediction were Interleukin 10 (IL10), TNF Receptor Associated Factor 2 (TRAF2), and Kallikrein Related Peptidase 12 (KLK12). CONCLUSION: Using multidimensional plasma protein analysis and interpretable ML, we outline a systematic approach for early detection of SCCOT before the appearance of clinical signs.
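The balanced accuracy reported for the SVM model is the arithmetic mean of its sensitivity and specificity, so it can be checked from the two reported values. The short sketch below does only that; it uses the figures from the abstract and no patient data.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of the true-positive rate and the true-negative rate."""
    return (sensitivity + specificity) / 2


# Values taken from the abstract.
reported_balanced_accuracy = balanced_accuracy(0.867, 0.859)
```

The result rounds to 0.863, in agreement with the stated balanced accuracy.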
- MeSH
- Tongue MeSH
- Blood Proteins MeSH
- Humans MeSH
- Tongue Neoplasms * diagnosis MeSH
- Carcinoma, Squamous Cell * diagnosis MeSH
- Machine Learning MeSH
- Ubiquitin-Protein Ligases MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
This paper focuses on non-invasive blood glucose determination using photoplethysmographic (PPG) signals, which is crucial for managing diabetes. Diabetes is one of the world’s major chronic diseases, and untreated diabetes frequently leads to fatalities. Current self-monitoring techniques for diabetes require invasive procedures such as blood or bodily fluid sampling, which may be very uncomfortable. Hence, there is an opportunity for non-invasive blood glucose monitoring through smart devices capable of measuring PPG signals. The primary goal of this research was to propose methods for glycemic classification into two groups (low and high glycemia) and to predict specific glycemia values using machine learning techniques. Two datasets were created by measuring PPG signals from 16 individuals using two different smart devices: a smart wristband and a smartphone. Simultaneously, the reference blood glucose levels were invasively measured using a glucometer. The PPG signals were preprocessed, and 27 different features were extracted. Feature selection reduced these to 10 relevant features. Numerous machine learning models were developed. Random Forest (RF) and Support Vector Machine (SVM) with the radial basis function (RBF) kernel performed best in classifying PPG signals into two groups. These models achieved an accuracy of 76% (SVM) and 75% (RF) on the smart wristband test dataset. The functionality of the proposed models was then verified on the smartphone test dataset, where both models achieved similar accuracy: 74% (SVM) and 75% (RF). For predicting specific glycemia values, RF performed best. Mean Absolute Error (MAE) was 1.25 mmol/l on the smart wristband test dataset and 1.37 mmol/l on the smartphone test dataset.
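Two quantities named in this abstract have simple definitions that a short sketch can make concrete: the RBF kernel, exp(-gamma * ||x - y||^2), and the mean absolute error. The value of `gamma`, the feature vectors, and the glycemia values below are invented for illustration; the abstract does not report the kernel hyperparameters or any individual measurements.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented feature vectors and gamma (assumptions, not from the paper).
same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
far_point = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)

# Invented reference and predicted glycemia values in mmol/l.
toy_mae = mean_absolute_error([5.0, 7.0], [6.0, 6.0])
```

The kernel equals 1 for identical feature vectors and decays towards 0 as they move apart, which is what lets an SVM draw a non-linear boundary between the low and high glycemia groups. The MAE is expressed in the same unit as the target, here mmol/l.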
The search for non-invasive, fast, and low-cost diagnostic tools has gained significant traction among many researchers worldwide. Dielectric properties calculated from microwave signals offer unique insights into biological tissue. Material properties, such as relative permittivity (εr) and conductivity (σ), can vary significantly between healthy and unhealthy tissue types at a given frequency. Understanding this difference in properties is key for identifying the disease state. The frequency-dependent nature of the dielectric measurements results in large datasets, which can be postprocessed using artificial intelligence (AI) methods. In this work, the dielectric properties of liver tissues in three mouse models of liver disease are characterized using dielectric spectroscopy. The measurements are grouped into four categories based on the diets or disease state of the mice, i.e., healthy mice, mice with non-alcoholic steatohepatitis (NASH) induced by choline-deficient high-fat diet, mice with NASH induced by western diet, and mice with liver fibrosis. Multi-class classification machine learning (ML) models are then explored to differentiate the liver tissue groups based on dielectric measurements. The results show that the support vector machine (SVM) model was able to differentiate the tissue groups with an accuracy up to 90%. This technology pipeline, thus, shows great potential for developing the next generation non-invasive diagnostic tools.
- MeSH
- Liver Cirrhosis MeSH
- Liver pathology MeSH
- Mice, Inbred C57BL MeSH
- Mice MeSH
- Non-alcoholic Fatty Liver Disease * diagnosis pathology MeSH
- Machine Learning MeSH
- Artificial Intelligence MeSH
- Animals MeSH
- Check Tag
- Mice MeSH
- Animals MeSH
- Publication Type
- Journal Article MeSH