Flow cytometry immunophenotyping is critical for the diagnostic classification of mature/peripheral B-cell neoplasms/B-cell chronic lymphoproliferative disorders (B-CLPD). Quantitatively driven classification approaches applied to multiparameter flow cytometry immunophenotypic data can be used to extract maximum information from a multidimensional space created by individual parameters (e.g., immunophenotypic markers), for highly accurate and automated classification of individual patient (sample) data. Here, we developed and compared five diagnostic classification algorithms, based on a large set of EuroFlow multicentric flow cytometry data files from a cohort of 659 B-CLPD patients. These included automatic population separators based on Principal Component Analysis (PCA), Canonical Variate Analysis (CVA), Neighbourhood Component Analysis (NCA), Support Vector Machine algorithms (SVM) and a variant of the CA (Canonical Analysis) algorithm, in which the number of SDs (Standard Deviations) varied for each of the comparisons of different pairs of diseases (CA-vSD). All five classification approaches are based on direct prospective interrogation of individual B-CLPD patients against the EuroFlow flow cytometry B-CLPD database composed of tumor B-cells of 659 individual patients stained in an identical way and classified a priori by the World Health Organization (WHO) criteria into nine diagnostic categories. Each classification approach was evaluated in parallel in terms of accuracy (% properly classified cases), precision (multiple or single diagnosis/case) and coverage (% cases with a proposed diagnosis). Overall, average rates of correct diagnosis (for the nine B-CLPD diagnostic entities) of between 58.9 % and 90.6 % were obtained with the five algorithms, with variable percentages of cases being either misclassified (4.1 %-14.0 %) or unclassifiable (0.3 %-37.0 %).
Automatic population separators based on CA, SVM and PCA showed a high average level of correctness (90.6 %, 86.8 %, and 86.0 %, respectively). Nevertheless, this came at the expense of proposing multiple diagnoses for a significant proportion of the test cases (54.5 %, 53.5 %, and 49.6 %, respectively). The CA-vSD algorithm generated the smallest average misclassification rate (4.1 %), but left 37.0 % of cases without a proposed diagnosis. In contrast, the NCA algorithm left only 2.7 % of cases without an associated diagnosis but misclassified 14.0 %. Among correctly classified cases (83.3 % of total), 91.2 % had a single proposed diagnosis, 8.6 % had two possible diagnoses, and 0.2 % had three. We demonstrate that the proposed AI algorithms provide an acceptable level of accuracy for the diagnostic classification of B-CLPD patients and, in general, surpass other algorithms reported in the literature.
- MeSH
- Algorithms MeSH
- B-Lymphocytes * pathology MeSH
- Immunophenotyping * methods MeSH
- Middle Aged MeSH
- Humans MeSH
- Lymphoproliferative Disorders * diagnosis classification MeSH
- Flow Cytometry * methods MeSH
- Aged MeSH
- Support Vector Machine MeSH
- Check Tag
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Comparative Study MeSH
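The three evaluation criteria named in the B-CLPD abstract above (accuracy, precision as single versus multiple diagnoses per case, and coverage) can be illustrated with a minimal Python sketch. The function name `evaluate`, the data layout (one set of proposed diagnoses per case) and the toy labels are assumptions made for illustration, not the EuroFlow implementation. A case is counted as correct here when the true diagnosis is among those proposed, which is only one possible reading of the abstract.

```python
def evaluate(cases):
    """cases: list of (true_diagnosis, set_of_proposed_diagnoses) pairs."""
    n = len(cases)
    classified = [(t, p) for t, p in cases if p]          # at least one proposal
    correct = [(t, p) for t, p in classified if t in p]   # truth among proposals
    single = [(t, p) for t, p in correct if len(p) == 1]  # correct and unambiguous
    return {
        "coverage": len(classified) / n,
        "correct": len(correct) / n,
        "misclassified": (len(classified) - len(correct)) / n,
        "unclassified": (n - len(classified)) / n,
        "single_among_correct": len(single) / len(correct) if correct else 0.0,
    }

# Hypothetical toy cases; the labels are illustrative only.
toy = [
    ("CLL", {"CLL"}),          # correct, single diagnosis
    ("MCL", {"MCL", "CLL"}),   # correct, but two diagnoses proposed
    ("FL", {"MCL"}),           # misclassified
    ("HCL", set()),            # no diagnosis proposed
]
result = evaluate(toy)
print(result)
```

Separating the share of correct cases from the share with a single proposal makes visible the trade-off the abstract describes: an algorithm can score well on correctness while proposing several diagnoses for many cases.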
Long QT syndrome (LQTS) represents a group of inheritable channelopathies with prolonged ventricular repolarization, leading to syncope, ventricular tachycardia, and sudden death. Differentiating LQTS genotypes is crucial for targeted management and treatment, yet conventional genetic testing remains costly and time-consuming. This study aims to improve the distinction between LQTS genotypes, particularly LQT3, through a novel electrocardiogram (ECG)-based approach. Patients with LQT3 are at elevated risk due to arrhythmia triggers associated with rest and sleep. Employing a database of genotyped long QT syndrome E-HOL-03-0480-013 ECG signals, we introduced two innovative parameterization techniques (area under the ECG curve and wave transformation into the unit circle) to classify LQT3 against LQT1 and LQT2 genotypes. Our methodology utilized single-lead ECG data with a 200 Hz sampling frequency. The support vector machine (SVM) model demonstrated the ability to discriminate LQT3 with a recall of 90% and a precision of 81%, achieving an F1-score of 0.85. This parameterization offers a potential substitute for genetic testing and is practical at low sampling frequencies. Such single-lead ECG data could enhance the functionality of smartwatches and similar cardiovascular monitoring applications. The results underscore the viability of ECG morphology-based genotype classification, promising a significant step towards streamlined diagnosis and improved patient care in LQTS.
- MeSH
- Adult MeSH
- Electrocardiography * methods MeSH
- Genotype MeSH
- Humans MeSH
- Machine Learning * MeSH
- Support Vector Machine MeSH
- Long QT Syndrome * genetics diagnosis physiopathology MeSH
- Check Tag
- Adult MeSH
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Comparative Study MeSH
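Two quantities from the LQTS abstract above lend themselves to a short worked example: the area under a sampled ECG segment and the F1-score implied by the reported recall and precision. The following is a minimal Python sketch. The function names and the toy waveform are hypothetical, and the trapezoidal rule is an assumed way of computing the area, since the abstract does not state how the area was obtained.

```python
def area_under_curve(samples, fs_hz=200.0):
    """Trapezoidal area under a uniformly sampled signal (amplitude x seconds)."""
    dt = 1.0 / fs_hz
    return sum((a + b) / 2.0 * dt for a, b in zip(samples, samples[1:]))

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical triangular "wave": 5 samples spanning 0.02 s at 200 Hz.
toy_wave = [0.0, 0.5, 1.0, 0.5, 0.0]
print(area_under_curve(toy_wave))

# Values reported in the abstract: precision 81 %, recall 90 %.
print(round(f1_score(0.81, 0.90), 2))
```

The computed F1-score of about 0.85 agrees with the value reported in the abstract.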
Early detection of malignant thyroid nodules is crucial for effective treatment, but traditional diagnostic methods face challenges such as variability in expert opinions and limited integration of advanced imaging techniques. This prospective cohort study investigates a novel multimodal approach, integrating traditional methods with advanced machine learning techniques. We studied 181 patients who underwent fine-needle aspiration (FNA) biopsy, each contributing one nodule, resulting in a total of 181 nodules for our analysis. Data collection included sex, age, and ultrasound imaging, which incorporated elastography. Features extracted from these images included Thyroid Imaging Reporting and Data System (TIRADS) scores, elastography parameters, and radiomic features. The pathological results based on the FNA biopsy, provided by the pathologists, served as our gold standard for nodule classification. Our methodology, termed ELTIRADS, combines these features with interpretable machine learning techniques. Performance evaluation showed that a Support Vector Machine (SVM) classifier using TIRADS, elastography data, and radiomic features achieved high accuracy (0.92), with sensitivity (0.89), specificity (0.94), precision (0.89), and F1 score (0.89). To enhance interpretability, we used hierarchical clustering, Shapley additive explanations (SHAP), and partial dependence plots (PDP). This combined approach holds promise for enhancing the accuracy of thyroid nodule malignancy detection, thereby contributing to advancements in personalized and precision medicine in the field of thyroid cancer research.
- MeSH
- Adult MeSH
- Elasticity Imaging Techniques * methods MeSH
- Middle Aged MeSH
- Humans MeSH
- Thyroid Neoplasms diagnostic imaging classification pathology diagnosis MeSH
- Prospective Studies MeSH
- Radiomics MeSH
- Aged MeSH
- Thyroid Gland diagnostic imaging pathology MeSH
- Machine Learning * MeSH
- Support Vector Machine MeSH
- Biopsy, Fine-Needle MeSH
- Thyroid Nodule * diagnostic imaging pathology classification MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
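The performance figures in the thyroid nodule abstract above (accuracy, sensitivity, specificity, precision, F1 score) all derive from a 2x2 confusion matrix. The Python sketch below shows the standard definitions. The function name `binary_metrics` and the counts passed to it are hypothetical and are not the study's confusion matrix, which the abstract does not report.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from a 2x2 confusion matrix (malignant = positive)."""
    sensitivity = tp / (tp + fn)            # recall on malignant nodules
    specificity = tn / (tn + fp)            # recall on benign nodules
    precision = tp / (tp + fp)              # share of positive calls that are right
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts, chosen only to illustrate the formulas.
m = binary_metrics(tp=40, fp=5, fn=5, tn=50)
print({name: round(value, 3) for name, value in m.items()})
```

When the numbers of false positives and false negatives are equal, as in these toy counts, precision, sensitivity and F1 coincide. The identical rounded values reported in the abstract (0.89 each) would be consistent with roughly balanced error counts, although the abstract does not confirm this.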
Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed given disputed and reference speech material, adopting the recommended and scientifically accepted likelihood ratio framework for reporting evidential strength in court. In forensic practice, usually, auditory and acoustic analyses are performed to carry out such a verification task considering a diversity of features, such as language competence, pronunciation, or other linguistic features. Automated speaker comparison systems can also be used alongside those manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input. Additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer if, when and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three different approaches to derive a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant dataset FRIDA and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log likelihood ratio cost (Cllr) and equal error rate (EER). We show that fusion can be beneficial, especially in the case of intercepted phone calls with noise in the background.
- MeSH
- Speech Acoustics MeSH
- Algorithms MeSH
- Humans MeSH
- Linguistics MeSH
- Likelihood Functions MeSH
- Speech MeSH
- Forensic Sciences * methods MeSH
- Support Vector Machine MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
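The speaker comparison abstract above evaluates systems with the log likelihood ratio cost (Cllr). The following Python sketch applies the commonly used definition of Cllr to two lists of likelihood ratios. The function name `cllr` and the example values are hypothetical; the abstract does not describe the authors' evaluation code.

```python
import math

def cllr(lrs_same_speaker, lrs_different_speaker):
    """Log likelihood ratio cost: lower is better; a system that always
    reports LR = 1 (no information) scores exactly 1.0."""
    penalty_same = sum(math.log2(1 + 1 / lr) for lr in lrs_same_speaker)
    penalty_diff = sum(math.log2(1 + lr) for lr in lrs_different_speaker)
    return 0.5 * (penalty_same / len(lrs_same_speaker)
                  + penalty_diff / len(lrs_different_speaker))

# Uninformative system: every likelihood ratio equals 1.
print(cllr([1.0, 1.0], [1.0, 1.0]))

# Hypothetical well-performing system: high LRs for same-speaker pairs,
# low LRs for different-speaker pairs.
print(cllr([100.0, 50.0], [0.01, 0.02]))
```

Unlike the equal error rate, Cllr penalises likelihood ratios that point strongly in the wrong direction, so it reflects calibration as well as discrimination.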
Imbalanced datasets are prominent in real-world problems. In such problems, the number of data samples in one class is significantly higher than in the other classes, even though the other classes might be more important. Standard classification algorithms may classify all the data into the majority class, which is a significant drawback of most standard learning algorithms, so imbalanced datasets need to be handled carefully. One of the traditional algorithms, the twin support vector machine (TSVM), performs well on balanced data classification but poorly on imbalanced dataset classification. In order to improve the TSVM algorithm's classification ability for imbalanced datasets, a reduced universum twin support vector machine for class imbalance learning (RUTSVM) was recently proposed, driven by the universum twin support vector machine (UTSVM). Solving the dual problem and finding the classifiers involve matrix inverse computation, which is one of RUTSVM's key drawbacks. In this paper, we improve the RUTSVM and propose an improved reduced universum twin support vector machine for class imbalance learning (IRUTSVM). In the suggested IRUTSVM approach, we offer alternative Lagrangian functions to tackle the primal problems of RUTSVM by inserting one of the terms in the objective function into the constraints. As a result, we obtain a new dual formulation for each optimization problem, so that we need not compute inverse matrices either in the training process or in finding the classifiers. Moreover, rectangular kernel matrices of smaller size are used to reduce the computational time. Extensive testing is carried out on a variety of synthetic and real-world imbalanced datasets, and the findings show that the IRUTSVM algorithm outperforms the TSVM, UTSVM, and RUTSVM algorithms in terms of generalization performance.
- MeSH
- Algorithms * MeSH
- Support Vector Machine * MeSH
- Publication type
- Journal Article MeSH
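The drawback described in the class imbalance abstract above, a classifier that assigns everything to the majority class, can be made concrete with a small Python sketch. The function name and the 95:5 toy dataset are hypothetical and are unrelated to the datasets used in the paper.

```python
def accuracy_and_minority_recall(y_true, y_pred, minority_label):
    """Return (overall accuracy, recall on the minority class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    minority_total = sum(t == minority_label for t in y_true)
    minority_hits = sum(t == p == minority_label for t, p in zip(y_true, y_pred))
    return correct / len(y_true), minority_hits / minority_total

# Hypothetical dataset: 95 majority samples (0) and 5 minority samples (1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

acc, minority_recall = accuracy_and_minority_recall(y_true, y_pred, minority_label=1)
print(acc, minority_recall)
```

The degenerate classifier reaches 95 % accuracy while detecting none of the minority samples, which is why accuracy alone is a poor yardstick for imbalanced problems.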
BACKGROUND: Atherosclerosis leads to coronary artery disease (CAD) and myocardial infarction (MI), major causes of morbidity and mortality worldwide. Computer-aided analysis of heart rate variability (HRV) derived from the electrocardiogram (ECG) can be a robust method for the prognosis of atherosclerotic events. METHODS: A total of 70 male subjects aged 55 ± 5 years participated in the study. The lead-II ECG was recorded and sampled at 200 Hz. The tachogram was obtained from the ECG signal and used to extract twenty-five HRV features. The one-way analysis of variance (ANOVA) test was performed to find significant differences between the CAD, MI, and control subjects. Features were used in the training and testing of a two-class artificial neural network (ANN) and support vector machine (SVM). RESULTS: The results revealed depressed HRV under atherosclerosis. An accuracy of 100% was obtained in classifying CAD and MI subjects from the controls using ANN, and 99.6% using SVM. In the classification of CAD from MI subjects, SVM and ANN achieved 99.3% and 99.0% accuracy, respectively. CONCLUSIONS: Depressed HRV has been suggested as a marker for the identification of atherosclerotic events. The good accuracy observed in classification between control, CAD, and MI subjects indicates that it is a non-invasive, cost-effective approach to the prognosis of atherosclerotic events.
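The HRV abstract above derives a tachogram from the ECG and extracts HRV features from it. The Python sketch below shows how a tachogram and two standard time-domain HRV measures (SDNN and RMSSD) can be computed from R-peak positions. The abstract does not name its twenty-five features, so including SDNN and RMSSD is an assumption, and the function names and R-peak indices are hypothetical.

```python
import math
import statistics

def rr_intervals_ms(r_peak_samples, fs_hz=200.0):
    """Tachogram: successive R-R intervals in milliseconds, computed from
    R-peak sample indices at the given sampling frequency."""
    return [(b - a) * 1000.0 / fs_hz
            for a, b in zip(r_peak_samples, r_peak_samples[1:])]

def sdnn(rr_ms):
    """Sample standard deviation of the R-R intervals."""
    return statistics.stdev(rr_ms)

def rmssd(rr_ms):
    """Root mean square of successive R-R interval differences."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical R-peak sample indices from a 200 Hz recording.
peaks = [0, 160, 330, 490, 660]
rr = rr_intervals_ms(peaks)
print(rr, round(sdnn(rr), 2), rmssd(rr))
```

Lower values of both measures correspond to the depressed HRV that the abstract reports under atherosclerosis.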
Random Forest is an ensemble of decision trees based on the bagging and random subspace concepts. As suggested by Breiman, the strength of unstable learners and the diversity among them are the ensemble models' core strength. In this paper, we propose two approaches known as oblique and rotation double random forests. In the first approach, we propose the rotation-based double random forest. In rotation-based double random forests, a transformation or rotation of the feature space is generated at each node. At each node, a different random feature subspace is chosen for evaluation; hence, the transformation at each node is different. Different transformations result in better diversity among the base learners and hence better generalization performance. With the double random forest as base learner, the data at each node are transformed via two different transformations, namely principal component analysis and linear discriminant analysis. In the second approach, we propose the oblique double random forest. Decision trees in random forest and double random forest are univariate, which results in axis-parallel splits that fail to capture the geometric structure of the data. Also, the standard random forest may not grow sufficiently large decision trees, resulting in suboptimal performance. To capture the geometric properties and to grow decision trees of sufficient depth, we propose the oblique double random forest. The oblique double random forest models are multivariate decision trees. At each non-leaf node, a multisurface proximal support vector machine generates the optimal plane for better generalization performance. Also, different regularization techniques (Tikhonov regularization, axis-parallel split regularization, null space regularization) are employed to tackle the small sample size problems in the decision trees of the oblique double random forest.
The proposed ensembles of decision trees produce larger trees than the standard ensembles of decision trees, as bagging is used at each non-leaf node, which results in improved performance. The evaluation of the baseline models and the proposed oblique and rotation double random forest models is performed on 121 benchmark UCI datasets and real-world fisheries datasets. Both statistical analysis and the experimental results demonstrate the efficacy of the proposed oblique and rotation double random forest models compared to the baseline models on the benchmark datasets.
- MeSH
- Algorithms * MeSH
- Principal Component Analysis MeSH
- Rotation MeSH
- Support Vector Machine * MeSH
- Publication type
- Journal Article MeSH
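The distinction drawn in the random forest abstract above between univariate (axis-parallel) and multivariate (oblique) node splits can be sketched in a few lines of Python. The function names, the hand-picked hyperplane and the toy points are hypothetical. In the paper the hyperplane is obtained from a multisurface proximal support vector machine, which is not reproduced here.

```python
def axis_parallel_split(x, feature_index, threshold):
    """Univariate test: True when one chosen feature is at or below the threshold."""
    return x[feature_index] <= threshold

def oblique_split(x, weights, bias):
    """Multivariate test: True when w . x + b <= 0, i.e. the point lies on
    or to the negative side of the hyperplane."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias <= 0

# Hypothetical 2-D points separated by the diagonal line x0 + x1 = 1.
class_a = [(0.2, 0.2), (0.9, 0.0), (0.0, 0.9)]
class_b = [(0.9, 0.9), (0.6, 0.7)]

# One oblique split with a hand-picked hyperplane separates the two classes.
oblique_a = [oblique_split(p, weights=(1.0, 1.0), bias=-1.0) for p in class_a]
oblique_b = [oblique_split(p, weights=(1.0, 1.0), bias=-1.0) for p in class_b]

# A single axis-parallel split on feature 0 sends class A to both sides.
axis_a = [axis_parallel_split(p, feature_index=0, threshold=0.5) for p in class_a]

print(oblique_a, oblique_b, axis_a)
```

An axis-parallel tree would need several nested splits to approximate this diagonal boundary, which is the geometric limitation the oblique variant is meant to address.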
Fragmented QRS (fQRS) is an electrocardiographic (ECG) marker of myocardial conduction abnormality, characterized by additional notches in the QRS complex. The presence of fQRS has been associated with an increased risk of all-cause mortality and arrhythmia in patients with cardiovascular disease. However, current binary visual analysis is prone to intra- and inter-observer variability, and differing definitions are problematic in clinical practice. Therefore, objective quantification of fQRS is needed and could further improve risk stratification of these patients. We present an automated method for fQRS detection and quantification. First, a novel robust QRS complex segmentation strategy is proposed, which combines multi-lead information and excludes abnormal heartbeats automatically. Afterwards, features extracted using variational mode decomposition (VMD), phase-rectified signal averaging (PRSA) and the number of baseline crossings of the ECG were used to train a machine learning classifier (Support Vector Machine) to discriminate fragmented from non-fragmented ECG traces, using multi-center data and combining different fQRS criteria used in clinical settings. The best model was trained on the combination of two independent, previously annotated datasets and, compared to these visual fQRS annotations, achieved Kappa scores of 0.68 and 0.44, respectively. We also show that the algorithm might be used in both regular sinus rhythm and irregular beats during atrial fibrillation. These results demonstrate that the proposed approach could be relevant for clinical practice by objectively assessing and quantifying fQRS. The study sets the path for further clinical application of the developed automated fQRS algorithm.
- MeSH
- Algorithms MeSH
- Electrocardiography * methods MeSH
- Atrial Fibrillation * diagnosis MeSH
- Humans MeSH
- Machine Learning MeSH
- Support Vector Machine MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
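The fQRS abstract above reports agreement between the algorithm and visual annotations as Kappa scores. Assuming these are Cohen's kappa values (the abstract does not say which variant was used), the statistic can be computed as in the Python sketch below. The function name and the label sequences are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters corrected for the
    agreement expected by chance from their label frequencies."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical annotations: 1 = fragmented, 0 = non-fragmented.
visual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
algorithm = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(visual, algorithm), 3))
```

In this toy case the raw agreement is 80 %, but kappa is lower because a share of that agreement would be expected by chance alone.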
The skull, along with the pelvic bone, serves as an important source of clues as to the sex of human skeletal remains. The frontal bone is one of the most significant sexually dimorphic structures employed in anthropological research, especially when studied by methods of virtual anthropology. For this reason, many new methods have been developed, but their utility for other populations remains to be verified. In the present study, we tested one such approach, the landmark-free method of Bulut et al. (2016) for quantifying sexually dimorphic differences in the shape of the frontal bone, which was developed using a sample of the Turkish population. Our study builds upon this methodology and tests its utility for the Czech population. We evaluated the shape of the male and female frontal bone using 3D morphometrics, comparing virtual models of frontal bones and corresponding software-generated spheres. To do so, we calculated the relative size of the frontal bone area deviating from the fitted sphere by less than 1 mm and used these data to estimate the sex of individuals. Using our sample of the Czech population, the method estimated the sex correctly in 72.8% of individuals. This success rate is about 5% lower than that achieved with the Turkish sample. This method is therefore not very suitable for estimating the sex of Czech individuals, especially considering the significantly greater success rates of other approaches.
- MeSH
- Frontal Bone anatomy & histology diagnostic imaging MeSH
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Young Adult MeSH
- Tomography, X-Ray Computed MeSH
- Computer Simulation * MeSH
- Image Processing, Computer-Assisted MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Forensic Anthropology MeSH
- Support Vector Machine MeSH
- Sex Determination by Skeleton methods MeSH
- Imaging, Three-Dimensional * MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Young Adult MeSH
- Male MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Publication type
- Journal Article MeSH
- Geographicals
- Czech Republic MeSH
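The measurement described in the frontal bone abstract above, the share of the surface deviating from a fitted sphere by less than 1 mm, can be approximated on a point sample as in the Python sketch below. This is an illustration under stated assumptions: the study measures relative surface area, whereas the sketch counts points. The sphere is taken as already fitted, and the function name, coordinates and radius are hypothetical.

```python
import math

def fraction_within_tolerance(points, center, radius, tol_mm=1.0):
    """Share of surface points whose distance from the sphere centre differs
    from the sphere radius by less than tol_mm."""
    close = sum(abs(math.dist(p, center) - radius) < tol_mm for p in points)
    return close / len(points)

# Hypothetical surface points (mm) and a hypothetical fitted sphere.
center = (0.0, 0.0, 0.0)
radius = 50.0
points = [
    (50.0, 0.0, 0.0),   # on the sphere
    (0.0, 50.5, 0.0),   # 0.5 mm outside
    (0.0, 0.0, 52.0),   # 2 mm outside
    (30.0, 40.0, 0.0),  # on the sphere
]
share = fraction_within_tolerance(points, center, radius)
print(share)
```

On a real mesh each point would be weighted by the surface area it represents, so that the result matches the area-based measure used in the study.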
Machine learning classifications of first-episode psychosis (FEP) using neuroimaging have predominantly analyzed brain volumes. Some studies examined cortical thickness, but most of them have used parcellation approaches with data from single sites, which limits claims of generalizability. To address these limitations, we conducted a large-scale, multi-site analysis of cortical thickness comparing parcellation-based and vertex-wise approaches. By leveraging the multi-site nature of the study, we further investigated how different demographic and site-dependent variables affected predictions. Finally, we assessed relationships between predictions and clinical variables. A total of 428 subjects (147 females, mean age 27.14) with FEP and 448 (230 females, mean age 27.06) healthy controls were enrolled in 8 centers by the ClassiFEP group. All subjects underwent structural MRI and were clinically assessed. Cortical thickness parcellations (68 areas) and full cortical maps (20,484 vertices) were extracted. A linear Support Vector Machine was used for classification within a repeated nested cross-validation framework. Vertex-wise thickness maps outperformed parcellation-based methods with a balanced accuracy of 66.2% and an Area Under the Curve of 72%. By stratifying our sample for MRI scanner, we increased generalizability across sites. Temporal brain areas emerged as the most influential in the classification. The predictive decision scores significantly correlated with age at onset, duration of treatment, and positive symptoms. In conclusion, although far from the threshold of clinical relevance, temporal cortical thickness proved able to discriminate between FEP subjects and healthy individuals. The assessment of site-dependent variables permitted an increase in the across-site generalizability, thus attempting to address an important machine learning limitation.
- MeSH
- Adult MeSH
- Humans MeSH
- Magnetic Resonance Imaging methods MeSH
- Brain MeSH
- Neuroimaging MeSH
- Psychotic Disorders * diagnostic imaging MeSH
- Support Vector Machine MeSH
- Check Tag
- Adult MeSH
- Humans MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Multicenter Study MeSH
- Research Support, Non-U.S. Gov't MeSH
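The FEP abstract above reports performance as balanced accuracy, which averages the recall obtained on each class so that unequal group sizes do not inflate the score. A minimal Python sketch follows. The function name and the toy labels are hypothetical and are not derived from the ClassiFEP data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(t == c for t in y_true)
        hits = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)

# Hypothetical labels: 1 = patient, 0 = healthy control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

In the toy example the recall is 0.75 for patients and 0.50 for controls, giving a balanced accuracy of 0.625 against a plain accuracy of 0.6.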