support vector machines
Early detection of malignant thyroid nodules is crucial for effective treatment, but traditional diagnostic methods face challenges such as variability in expert opinions and limited integration of advanced imaging techniques. This prospective cohort study investigates a novel multimodal approach, integrating traditional methods with advanced machine learning techniques. We studied 181 patients who underwent fine-needle aspiration (FNA) biopsy, each contributing one nodule, resulting in a total of 181 nodules for our analysis. Data collection included sex, age, and ultrasound imaging, which incorporated elastography. Features extracted from these images included Thyroid Imaging Reporting and Data System (TIRADS) scores, elastography parameters, and radiomic features. The pathological results based on the FNA biopsy, provided by the pathologists, served as our gold standard for nodule classification. Our methodology, termed ELTIRADS, combines these features with interpretable machine learning techniques. Performance evaluation showed that a Support Vector Machine (SVM) classifier using TIRADS, elastography data, and radiomic features achieved high accuracy (0.92), with sensitivity (0.89), specificity (0.94), precision (0.89), and F1 score (0.89). To enhance interpretability, we used hierarchical clustering, Shapley additive explanations (SHAP), and partial dependence plots (PDP). This combined approach holds promise for enhancing the accuracy of thyroid nodule malignancy detection, thereby contributing to advancements in personalized and precision medicine in the field of thyroid cancer research.
- MeSH
- Adult MeSH
- Elasticity Imaging Techniques * methods MeSH
- Middle Aged MeSH
- Humans MeSH
- Thyroid Neoplasms diagnostic imaging classification pathology diagnosis MeSH
- Prospective Studies MeSH
- Radiomics MeSH
- Aged MeSH
- Thyroid Gland diagnostic imaging pathology MeSH
- Machine Learning * MeSH
- Support Vector Machine MeSH
- Biopsy, Fine-Needle MeSH
- Thyroid Nodule * diagnostic imaging pathology classification MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
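The accuracy, sensitivity, specificity, precision, and F1 score reported in the thyroid-nodule abstract above are all derived from a 2×2 confusion matrix. The following minimal Python sketch shows the standard definitions. The function name `binary_metrics` and the counts are hypothetical and chosen only for illustration; they are not the study's confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of malignant nodules that were detected
    specificity = tn / (tn + fp)   # share of benign nodules correctly cleared
    precision = tp / (tp + fp)     # share of "malignant" calls that were correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only (NOT taken from the study).
metrics = binary_metrics(tp=8, fp=1, tn=16, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting sensitivity and specificity alongside accuracy matters here because accuracy alone can look high when benign nodules dominate the sample.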
BACKGROUND: Advancements in artificial intelligence (AI) and machine learning (ML) have revolutionized the medical field and transformed translational medicine. These technologies enable more accurate disease trajectory models while enhancing patient-centered care. However, challenges such as heterogeneous datasets, class imbalance, and scalability remain barriers to achieving optimal predictive performance. METHODS: This study proposes a novel AI-based framework that integrates Gradient Boosting Machines (GBM) and Deep Neural Networks (DNN) to address these challenges. The framework was evaluated using two distinct datasets: MIMIC-IV, a critical care database containing clinical data of critically ill patients, and the UK Biobank, which comprises genetic, clinical, and lifestyle data from 500,000 participants. Key performance metrics, including Accuracy, Precision, Recall, F1-Score, and AUROC, were used to assess the framework against traditional and advanced ML models. RESULTS: The proposed framework demonstrated superior performance compared to classical models such as Logistic Regression, Random Forest, Support Vector Machines (SVM), and Neural Networks. For example, on the UK Biobank dataset, the model achieved an AUROC of 0.96, significantly outperforming Neural Networks (0.92). The framework was also efficient, requiring only 32.4 s for training on MIMIC-IV, with low prediction latency, making it suitable for real-time applications. CONCLUSIONS: The proposed AI-based framework effectively addresses critical challenges in translational medicine, offering superior predictive accuracy and efficiency. Its robust performance across diverse datasets highlights its potential for integration into real-time clinical decision support systems, facilitating personalized medicine and improving patient outcomes. Future research will focus on enhancing scalability and interpretability for broader clinical applications.
PURPOSE: Chronic obstructive pulmonary disease (COPD) is a prevalent and preventable condition that typically worsens over time. Acute exacerbations of COPD significantly impact disease progression, underscoring the importance of prevention efforts. This observational study aimed to achieve two main objectives: (1) identify patients at risk of exacerbations using an ensemble of clustering algorithms, and (2) classify patients into distinct clusters based on disease severity. METHODS: Data from portable medical devices were analyzed post hoc to detect flare-ups, using hyperparameter optimization with Self-Organizing Maps (SOM), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest, and Support Vector Machine (SVM) algorithms. Principal Component Analysis (PCA) followed by KMeans clustering was applied to categorize patients by severity. RESULTS: Twenty-five patients were included in the study population; data from 17 patients had the required reliability. Five patients were identified in the highest deterioration group, with one clinically confirmed exacerbation accurately detected by our ensemble algorithm. PCA and KMeans clustering then grouped patients into three clusters based on severity: Cluster 0 started with the least severe characteristics but experienced decline, Cluster 1 consistently showed the most severe characteristics, and Cluster 2 showed slight improvement. CONCLUSION: Our approach effectively identified patients at risk of exacerbations and classified them by disease severity. Although promising, the approach would need to be verified on a larger sample with a larger number of recorded, clinically verified exacerbations.
- Publication Type
- Journal Article MeSH
T-lineage acute lymphoblastic leukemia (T-ALL) accounts for about 15% of pediatric and about 25% of adult ALL cases. Minimal/measurable residual disease (MRD) assessed by flow cytometry (FCM) is an important prognostic indicator for risk stratification. To assess MRD, a limited number of antibodies directed against the most discriminative antigens must be selected. We propose a pipeline for evaluating the influence of different markers for cell population classification in FCM data. We use a linear support vector machine, fitted to each sample individually to avoid issues with patient and laboratory variations. The best separating hyperplane direction as well as the influence of omitting specific markers is considered. Ninety-one bone marrow samples of 43 pediatric T-ALL patients from five reference laboratories were analyzed by FCM regarding marker importance for blast cell identification using combinations of eight different markers. For all laboratories, CD48 and CD99 were among the top three markers with the strongest contribution to the optimal hyperplane, measured by median separating hyperplane coefficient size for all samples per center and time point (diagnosis, Day 15, Day 33). Based on the available limited set tested (CD3, CD4, CD5, CD7, CD8, CD45, CD48, CD99), our findings prove that CD48 and CD99 are useful markers for MRD monitoring in T-ALL. The proposed pipeline can be applied for evaluation of other marker combinations in the future.
- MeSH
- Precursor Cell Lymphoblastic Leukemia-Lymphoma * diagnosis MeSH
- Child MeSH
- Adult MeSH
- Humans MeSH
- Precursor T-Cell Lymphoblastic Leukemia-Lymphoma * diagnosis MeSH
- Flow Cytometry MeSH
- Neoplasm, Residual diagnosis MeSH
- T-Lymphocytes MeSH
- Check Tag
- Child MeSH
- Adult MeSH
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
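The marker-ranking idea in the T-ALL abstract above (fit a linear SVM to each sample separately, then summarize each marker by the median size of its hyperplane coefficient) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the small subgradient-descent solver stands in for whatever SVM implementation they used, the data are synthetic, the marker names `marker_A` to `marker_C` are hypothetical, and features are assumed to be on comparable scales.

```python
import random
from statistics import median

MARKERS = ["marker_A", "marker_B", "marker_C"]  # hypothetical marker names


def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by full-batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def make_sample(seed, n=100):
    """Synthetic 'sample': only the second marker separates the two cell classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([rng.gauss(0, 1), rng.gauss(2 * label, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


# One SVM per sample, then the median absolute coefficient per marker.
coefs = [fit_linear_svm(*make_sample(seed))[0] for seed in range(5)]
median_abs = {
    name: median(abs(w[j]) for w in coefs) for j, name in enumerate(MARKERS)
}
ranking = sorted(MARKERS, key=lambda m: median_abs[m], reverse=True)
print(ranking)
```

Fitting per sample and aggregating afterwards keeps between-sample shifts (patient, laboratory) out of any single hyperplane, which is the motivation the abstract gives for this design.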
Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed given disputed and reference speech material, adopting the recommended and scientifically accepted likelihood ratio framework for reporting evidential strength in court. In forensic practice, usually, auditory and acoustic analyses are performed to carry out such a verification task considering a diversity of features, such as language competence, pronunciation, or other linguistic features. Automated speaker comparison systems can also be used alongside those manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input. Additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer if, when and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three different approaches to derive a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant dataset FRIDA and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log likelihood ratio cost (Cllr) and equal error rate (EER). We show that fusion can be beneficial, especially in the case of intercepted phone calls with noise in the background.
- MeSH
- Speech Acoustics MeSH
- Algorithms MeSH
- Humans MeSH
- Linguistics MeSH
- Likelihood Functions MeSH
- Speech MeSH
- Forensic Sciences * methods MeSH
- Support Vector Machine MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
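The log likelihood ratio cost (Cllr) named in the speaker-comparison abstract above has a compact closed form, sketched below in Python. The function name `cllr` and the example likelihood ratios are hypothetical; the formula is the commonly used definition, which penalizes same-speaker trials with low LRs and different-speaker trials with high LRs.

```python
import math


def cllr(lrs_same, lrs_diff):
    """Log likelihood ratio cost.

    lrs_same: likelihood ratios from same-speaker comparisons
    lrs_diff: likelihood ratios from different-speaker comparisons
    """
    cost_same = sum(math.log2(1 + 1 / lr) for lr in lrs_same) / len(lrs_same)
    cost_diff = sum(math.log2(1 + lr) for lr in lrs_diff) / len(lrs_diff)
    return 0.5 * (cost_same + cost_diff)


# An uninformative system that always reports LR = 1 has a cost of exactly 1.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])

# Hypothetical well-behaved system: high LRs for same-speaker pairs, low otherwise.
informative = cllr([100.0, 50.0], [0.01, 0.02])
print(uninformative, informative)
```

Values below 1 indicate that the reported LRs carry useful information; values above 1 indicate that they are misleading on balance.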
Motor disability is a dominant and restricting symptom in multiple sclerosis, yet its neuroimaging correlates are not fully understood. We apply statistical and machine learning techniques to multimodal neuroimaging data to discriminate between multiple sclerosis patients and healthy controls and to predict motor disability scores in the patients. We examine the data of sixty-four multiple sclerosis patients and sixty-five controls who underwent an MRI examination and an evaluation on motor disability scales. The modalities used comprised regional fractional anisotropy, regional grey matter volumes, and functional connectivity. For analysis, we employ two approaches: high-dimensional support vector machines run on features selected by Fisher Score (aiming for maximal classification accuracy), and low-dimensional logistic regression on the principal components of data (aiming for increased interpretability). We apply analogous regression methods to predict symptom severity. While fractional anisotropy provided classification accuracies of 96.1% and 89.9% with the two approaches, respectively, including other modalities did not bring further improvement. Concerning the prediction of motor impairment, the low-dimensional approach performed more reliably. The first grey matter volume component was significantly correlated (R = 0.28-0.46, p < 0.05) with most clinical scales. In summary, we identified the relationship between both white and grey matter changes and motor impairment in multiple sclerosis. Furthermore, we were able to achieve the highest classification accuracy based on quantitative MRI measures of tissue integrity between patients and controls yet reported, while also providing a low-dimensional classification approach with comparable results, paving the way to interpretable machine learning models of brain changes in multiple sclerosis.
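Feature selection by Fisher Score, as mentioned in the multiple sclerosis abstract above, ranks each feature by the ratio of between-class to within-class variance. The sketch below uses one common definition; the abstract does not state which variant was used, so treat the formula choice, the function name `fisher_scores`, and the toy data as assumptions.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter / within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall = mean(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            between += len(values) * (mean(values) - overall) ** 2
            within += len(values) * pvariance(values)
        scores.append(between / within)  # assumes non-zero within-class variance
    return scores


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
y = [0, 0, 0, 1, 1, 1]
scores = fisher_scores(X, y)
top_features = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, top_features)
```

In a high-dimensional setting, the top-ranked features would then be passed to the classifier, with the number kept treated as a tuning choice.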
Imbalanced datasets are prominent in real-world problems. In such problems, the number of data samples in one class is significantly higher than in the other classes, even though the other classes might be more important. The standard classification algorithms may classify all the data into the majority class, and this is a significant drawback of most standard learning algorithms, so imbalanced datasets need to be handled carefully. One of the traditional algorithms, twin support vector machines (TSVM), performed well on balanced data classification but poorly on imbalanced dataset classification. In order to improve the TSVM algorithm's classification ability for imbalanced datasets, recently, driven by the universum twin support vector machine (UTSVM), a reduced universum twin support vector machine for class imbalance learning (RUTSVM) was proposed. The dual problem and finding classifiers involve matrix inverse computation, which is one of RUTSVM's key drawbacks. In this paper, we improve the RUTSVM and propose an improved reduced universum twin support vector machine for class imbalance learning (IRUTSVM). We offer alternative Lagrangian functions to tackle the primal problems of RUTSVM in the suggested IRUTSVM approach by inserting one of the terms in the objective function into the constraints. As a result, we obtain a new dual formulation for each optimization problem, so that we need not compute inverse matrices either in the training process or in finding the classifiers. Moreover, the smaller size of the rectangular kernel matrices is used to reduce the computational time. Extensive testing is carried out on a variety of synthetic and real-world imbalanced datasets, and the findings show that the IRUTSVM algorithm outperforms the TSVM, UTSVM, and RUTSVM algorithms in terms of generalization performance.
- MeSH
- Algorithms * MeSH
- Support Vector Machine * MeSH
- Publication Type
- Journal Article MeSH
BACKGROUND: Interpretable machine learning (ML) for early detection of cancer has the potential to improve risk assessment and early intervention. METHODS: Data from 261 proteins related to inflammation and/or tumor processes in 123 blood samples collected from healthy persons, but of whom a sub-group later developed squamous cell carcinoma of the oral tongue (SCCOT), were analyzed. Samples from people who developed SCCOT within less than 5 years were classified as tumor-to-be and all other samples as tumor-free. The optimal ML algorithm for feature selection was identified and feature importance computed by the SHapley Additive exPlanations (SHAP) method. Five popular ML algorithms (AdaBoost, Artificial neural networks [ANNs], Decision Tree [DT], eXtreme Gradient Boosting [XGBoost], and Support Vector Machine [SVM]) were applied to establish prediction models, and decisions of the optimal models were interpreted by SHAP. RESULTS: Using the 22 selected features, the SVM prediction model showed the best performance (sensitivity = 0.867, specificity = 0.859, balanced accuracy = 0.863, area under the receiver operating characteristic curve [ROC-AUC] = 0.924). SHAP analysis revealed that the 22 features rendered varying person-specific impacts on model decision and the top three contributors to prediction were Interleukin 10 (IL10), TNF Receptor Associated Factor 2 (TRAF2), and Kallikrein Related Peptidase 12 (KLK12). CONCLUSION: Using multidimensional plasma protein analysis and interpretable ML, we outline a systematic approach for early detection of SCCOT before the appearance of clinical signs.
- MeSH
- Tongue MeSH
- Blood Proteins MeSH
- Humans MeSH
- Tongue Neoplasms * diagnosis MeSH
- Carcinoma, Squamous Cell * diagnosis MeSH
- Machine Learning MeSH
- Ubiquitin-Protein Ligases MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
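Balanced accuracy, reported for the SVM model in the SCCOT abstract above, is the mean of sensitivity and specificity. A short Python check, using only the figures stated in the abstract, shows that the reported values are mutually consistent; the function name is hypothetical.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    return (sensitivity + specificity) / 2


# Sensitivity and specificity as stated in the abstract.
value = balanced_accuracy(0.867, 0.859)
print(round(value, 3))
```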
DNA methylation classifiers ("episignatures") help to determine the pathogenicity of variants of uncertain significance (VUS). However, their sensitivity is limited due to their training on unambiguous cases with strong-effect variants so that the classification of variants with reduced effect size or in mosaic state may fail. Moreover, episignature evaluation of mosaics as a function of their degree of mosaicism has not been developed so far. We improved episignatures with respect to three categories. Applying (i) minimum-redundancy-maximum-relevance feature selection we reduced their length by up to one order of magnitude without loss of accuracy. Performing (ii) repeated re-training of a support vector machine classifier by step-wise inclusion of cases in the training set that reached probability scores larger than 0.5, we increased the sensitivity of the episignature-classifiers by 30%. In the newly diagnosed patients we confirmed the association between DNA methylation aberration and age at onset of KMT2B-deficient dystonia. Moreover, we found evidence for allelic series, including KMT2B-variants with moderate effects and comparatively mild phenotypes such as late-onset focal dystonia. Retrained classifiers also can detect mosaics that previously remained below the 0.5-threshold, as we showed for KMT2D-associated Kabuki syndrome. Conversely, episignature-classifiers are able to revoke erroneous exome calls of mosaicism, as we demonstrated by (iii) comparing presumed mosaic cases with a distribution of artificial in silico-mosaics that represented all the possible variation in degree of mosaicism, variant read sampling and methylation analysis.
- MeSH
- Alleles MeSH
- Phenotype MeSH
- Humans MeSH
- DNA Methylation * MeSH
- Abnormalities, Multiple * genetics MeSH
- Mosaicism MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
BACKGROUND: Genomic conditions can be associated with developmental delay, intellectual disability, autism spectrum disorder, and physical and mental health symptoms. They are individually rare and highly variable in presentation, which limits the use of standard clinical guidelines for diagnosis and treatment. A simple screening tool to identify young people with genomic conditions associated with neurodevelopmental disorders (ND-GCs) who could benefit from further support would be of considerable value. We used machine learning approaches to address this question. METHOD: A total of 493 individuals were included: 389 with a ND-GC (mean age = 9.01, 66% male) and 104 siblings without known genomic conditions (controls, mean age = 10.23, 53% male). Primary carers completed assessments of behavioural, neurodevelopmental and psychiatric symptoms and physical health and development. Machine learning techniques (penalised logistic regression, random forests, support vector machines and artificial neural networks) were used to develop classifiers of ND-GC status and to identify limited sets of variables that gave the best classification performance. Exploratory graph analysis was used to understand associations within the final variable set. RESULTS: All machine learning methods identified variable sets giving high classification accuracy (AUROC between 0.883 and 0.915). We identified a subset of 30 variables best discriminating between individuals with ND-GCs and controls which formed 5 dimensions: conduct, separation anxiety, situational anxiety, communication and motor development. LIMITATIONS: This study used cross-sectional data from a cohort study which was imbalanced with respect to ND-GC status. Our model requires validation in independent datasets and with longitudinal follow-up data before clinical application. CONCLUSIONS: In this study, we developed models that identified a compact set of psychiatric and physical health measures that differentiate individuals with a ND-GC from controls and highlight higher-order structure within these measures. This work is a step towards developing a screening instrument to identify young people with ND-GCs who might benefit from further specialist assessment.
- MeSH
- Child MeSH
- Genomics MeSH
- Cohort Studies MeSH
- Humans MeSH
- Intellectual Disability * MeSH
- Adolescent MeSH
- Autism Spectrum Disorder * MeSH
- Cross-Sectional Studies MeSH
- Machine Learning MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Male MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
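The AUROC values quoted in the ND-GC abstract above can be read as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control, with ties counted as one half. The following Python sketch computes AUROC directly from that pairwise definition. The function name `auroc` and the scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (case, control) pairs ranked correctly."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: 1 = case, 0 = control.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
print(perfect, partial)
```

Because AUROC depends only on the ranking of scores, it is unaffected by the case-to-control ratio, which is relevant for a cohort that is imbalanced with respect to ND-GC status.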