BACKGROUND: Neuromuscular diseases (NMDs) are rare disorders characterized by progressive muscle fibre loss, leading to replacement by fibrotic and fatty tissue, muscle weakness and disability. Early diagnosis is critical for therapeutic decisions, care planning and genetic counselling. Muscle magnetic resonance imaging (MRI) has emerged as a valuable diagnostic tool by identifying characteristic patterns of muscle involvement. However, the increasing complexity of these patterns complicates their interpretation, limiting their clinical utility. Additionally, multi-study data aggregation introduces heterogeneity challenges. This study presents a novel multi-study harmonization pipeline for muscle MRI and an AI-driven diagnostic tool to assist clinicians in identifying disease-specific muscle involvement patterns. METHODS: We developed a preprocessing pipeline to standardize MRI fat content across datasets, minimizing source bias. An ensemble of XGBoost models was trained to classify patients based on intramuscular fat replacement, age at MRI and sex. The SHapley Additive exPlanations (SHAP) framework was adapted to analyse model predictions and identify disease-specific muscle involvement patterns. To address class imbalance, training and evaluation were conducted using class-balanced metrics. The model's performance was compared against four expert clinicians using 14 previously unseen MRI scans. RESULTS: Using our harmonization approach, we curated a dataset of 2961 MRI samples from genetically confirmed cases of 20 paediatric and adult NMDs. The model achieved a balanced accuracy of 64.8% ± 3.4%, with a weighted top-3 accuracy of 84.7% ± 1.8% and top-5 accuracy of 90.2% ± 2.4%. It also identified key features relevant for differential diagnosis, aiding clinical decision-making. Compared to four expert clinicians, the model obtained the highest top-3 accuracy (75.0% ± 4.8%). 
The diagnostic tool has been implemented as a free web platform, providing global access to the medical community. CONCLUSIONS: The application of AI in muscle MRI for NMD diagnosis remains underexplored due to data scarcity. This study introduces a framework for dataset harmonization, enabling advanced computational techniques. Our findings demonstrate the potential of AI-based approaches to enhance differential diagnosis by identifying disease-specific muscle involvement patterns. The developed tool surpasses expert performance in diagnostic ranking and is accessible to clinicians worldwide via the Myo-Guide online platform.
- MeSH
- Adult MeSH
- Internet MeSH
- Middle Aged MeSH
- Humans MeSH
- Magnetic Resonance Imaging * methods MeSH
- Neuromuscular Diseases * diagnosis diagnostic imaging MeSH
- Machine Learning * MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
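The class-balanced metrics reported in the muscle MRI abstract above (balanced accuracy and top-k accuracy) can be illustrated with a minimal sketch. The function names and toy labels are illustrative assumptions, not the authors' code, and the abstract's "weighted" top-3 accuracy may apply a class weighting that the plain top-k shown here does not.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def top_k_accuracy(y_true, ranked_preds, k):
    """Fraction of samples whose true label is among the k highest-ranked guesses."""
    return sum(t in r[:k] for t, r in zip(y_true, ranked_preds)) / len(y_true)

# Toy data (hypothetical labels): class "A" is common, class "B" is rare.
y_true = ["A", "A", "A", "A", "B"]
y_pred = ["A", "A", "A", "A", "A"]
ranked = [["A", "B", "C"]] * 4 + [["A", "C", "B"]]

print(balanced_accuracy(y_true, y_pred))  # 0.5, although plain accuracy is 0.8
print(top_k_accuracy(y_true, ranked, 1))  # 0.8
print(top_k_accuracy(y_true, ranked, 3))  # 1.0
```

A classifier that always predicts the majority class scores well on plain accuracy but only 0.5 on balanced accuracy, which is why class-balanced metrics suit imbalanced diagnostic datasets.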
PURPOSE: Chronic obstructive pulmonary disease (COPD) is a prevalent and preventable condition that typically worsens over time. Acute exacerbations of COPD significantly impact disease progression, underscoring the importance of prevention efforts. This observational study had two main objectives: (1) to identify patients at risk of exacerbations using an ensemble of clustering algorithms, and (2) to classify patients into distinct clusters based on disease severity. METHODS: Data from portable medical devices were analyzed post hoc using hyperparameter optimization with Self-Organizing Maps (SOM), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest, and Support Vector Machine (SVM) algorithms to detect flare-ups. Principal Component Analysis (PCA) followed by KMeans clustering was applied to categorize patients by severity. RESULTS: Twenty-five patients were included in the study population, and data from 17 of them had the required reliability. Five patients were identified in the highest deterioration group, with one clinically confirmed exacerbation accurately detected by our ensemble algorithm. PCA and KMeans clustering then grouped patients into three clusters based on severity: Cluster 0 started with the least severe characteristics but experienced decline, Cluster 1 consistently showed the most severe characteristics, and Cluster 2 showed slight improvement. CONCLUSION: Our approach effectively identified patients at risk of exacerbations and classified them by disease severity. Although promising, the approach needs to be verified on a larger sample with a larger number of recorded, clinically verified exacerbations.
- Publication type
- Journal Article MeSH
PURPOSE: The aims of this work are (1) to explore deep learning (DL) architectures, spectroscopic input types, and learning designs toward optimal quantification in MR spectroscopy of simulated pathological spectra; and (2) to demonstrate accuracy and precision of DL predictions in view of inherent bias toward the training distribution. METHODS: Simulated 1D spectra and 2D spectrograms that mimic an extensive range of pathological in vivo conditions are used to train and test 24 different DL architectures. Active learning through altered training and testing data distributions is probed to optimize quantification performance. Ensembles of networks are explored to improve DL robustness and reduce the variance of estimates. A set of scores compares the performance of DL predictions and traditional model fitting (MF). RESULTS: Ensembles of heterogeneous networks that combine 1D frequency-domain and 2D time-frequency domain spectrograms as input perform best. Dataset augmentation with active learning can improve performance, but gains are limited. MF is more accurate, although DL appears to be more precise at low SNR. However, this overall improved precision originates from a strong bias for cases with high uncertainty toward the dataset the network has been trained with, tending toward its average value. CONCLUSION: MF mostly performs better than the faster DL approach. Potential intrinsic biases on training sets are dangerous in a clinical context that requires the algorithm to be unbiased to outliers (i.e., pathological data). Active learning and ensembles of networks are good strategies to improve prediction performance. However, data quality (sufficient SNR) has proven to be a bottleneck for adequate unbiased performance, as is also the case for MF.
- MeSH
- Algorithms MeSH
- Deep Learning * MeSH
- Bias MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
OBJECTIVE: The aim of this work was to assemble a large annotated dataset of bitewing radiographs and to use convolutional neural networks to automate the detection of dental caries in bitewing radiographs with human-level performance. MATERIALS AND METHODS: A dataset of 3989 bitewing radiographs was created, and 7257 carious lesions were annotated using minimal bounding boxes. The dataset was then divided into 3 parts for the training (70%), validation (15%), and testing (15%) of multiple object detection convolutional neural networks (CNNs). The tested CNN architectures included YOLOv5, Faster R-CNN, RetinaNet, and EfficientDet. To further improve the detection performance, model ensembling was used, and nested predictions were removed during post-processing. The models were compared in terms of the F1 score and average precision (AP) with various thresholds of the intersection over union (IoU). RESULTS: The twelve tested architectures had F1 scores of 0.72-0.76. Their performance was improved by ensembling, which increased the F1 score to 0.79-0.80. The best-performing ensemble detected caries with a precision of 0.83, recall of 0.77, F1 score of 0.80, and AP of 0.86 at IoU=0.5. Small carious lesions were predicted with slightly lower accuracy (AP 0.82) than medium or large lesions (AP 0.88). CONCLUSIONS: The trained ensemble of object detection CNNs detected caries with satisfactory accuracy and performed at least as well as experienced dentists (see companion paper, Part II). The performance on small lesions was likely limited by inconsistencies in the training dataset. CLINICAL SIGNIFICANCE: Caries can be automatically detected using convolutional neural networks. However, detecting incipient carious lesions remains challenging.
- MeSH
- Deep Learning * MeSH
- Humans MeSH
- Dental Caries Susceptibility MeSH
- Neural Networks, Computer MeSH
- Dental Caries * diagnostic imaging MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
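The evaluation quantities named in the caries-detection abstract above can be sketched as follows: the intersection over union (IoU) of two bounding boxes, the standard F1 score computed from precision and recall, and a simple filter for nested predictions. The box format `(x1, y1, x2, y2)`, the function names, and the rule "drop any box fully contained in another" are assumptions for illustration; the abstract does not specify how nested predictions were removed.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def drop_nested(boxes):
    """Remove boxes lying entirely inside another box (assumed rule).

    Of several identical boxes, only the first is kept.
    """
    def inside(inner, outer):
        return (outer[0] <= inner[0] and outer[1] <= inner[1]
                and inner[2] <= outer[2] and inner[3] <= outer[3])
    return [b for i, b in enumerate(boxes)
            if not any(i != j and inside(b, o) and (b != o or j < i)
                       for j, o in enumerate(boxes))]

# Hypothetical prediction and ground-truth boxes (pixel coordinates).
pred, truth = (10, 10, 50, 50), (20, 20, 60, 60)
print(round(iou(pred, truth), 3))        # 0.391, below an IoU threshold of 0.5
print(round(f1_score(0.83, 0.77), 2))    # 0.8, from the reported precision and recall
print(drop_nested([(0, 0, 10, 10), (2, 2, 5, 5), (20, 20, 30, 30)]))
```

At IoU=0.5, the hypothetical prediction above would not count as a match for the ground-truth box, which shows how the threshold controls what counts as a correct detection.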
BACKGROUND: Presentation of visual stimuli can induce changes in EEG signals that are typically detectable by averaging together data from multiple trials, both for individual participant analysis and for group or condition analysis across multiple participants. This study proposes a new method based on the discrete wavelet transform with Huffman coding and machine learning for single-trial analysis of event-related potentials (ERPs) and classification of different visual events in the visual object detection task. METHODS: EEG single trials are decomposed with the discrete wavelet transform (DWT) up to the [Formula: see text] level of decomposition using a biorthogonal B-spline wavelet. The coefficients of the DWT in each trial are thresholded to discard sparse wavelet coefficients, while the quality of the signal is well maintained. The remaining optimum coefficients in each trial are encoded into bitstreams using Huffman coding, and the codewords are represented as a feature of the ERP signal. The performance of this method is tested with real visual ERPs of sixty-eight subjects. RESULTS: The proposed method significantly discards the spontaneous EEG activity, extracts the single-trial visual ERPs, represents the ERP waveform as a compact bitstream feature, and achieves promising results in classifying the visual objects with classification performance metrics: accuracies 93.60[Formula: see text], sensitivities 93.55[Formula: see text], specificities 94.85[Formula: see text], precisions 92.50[Formula: see text], and area under the curve (AUC) 0.93[Formula: see text] using SVM and k-NN machine learning classifiers. CONCLUSION: The proposed method suggests that the joint use of the discrete wavelet transform (DWT) with Huffman coding has the potential to efficiently extract ERPs from background EEG for studying evoked responses in single-trial ERPs and classifying visual stimuli.
The proposed approach has O(N) time complexity and could be implemented in real-time systems, such as a brain-computer interface (BCI), where fast detection of mental events is needed to operate a machine smoothly with the mind.
- MeSH
- Algorithms MeSH
- Electroencephalography * methods MeSH
- Evoked Potentials physiology MeSH
- Humans MeSH
- Area Under Curve MeSH
- Signal Processing, Computer-Assisted MeSH
- Machine Learning MeSH
- Wavelet Analysis * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
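The feature-extraction chain described in the EEG abstract above (wavelet decomposition, coefficient thresholding, Huffman encoding) can be sketched in plain Python. This is a simplified illustration under stated assumptions: an unnormalized Haar wavelet is swapped in for the biorthogonal B-spline wavelet used in the study, and the decomposition level, threshold, integer quantization, and sample values are all hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def haar_dwt(signal, levels):
    """Multi-level Haar DWT (averages/differences, not orthonormal).

    Returns approximation coefficients followed by details, coarsest first.
    The signal length must be divisible by 2**levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details
        approx = [(a + b) / 2 for a, b in pairs]
    return approx + details

def threshold(coeffs, limit):
    """Zero out coefficients whose magnitude is below `limit`."""
    return [c if abs(c) >= limit else 0.0 for c in coeffs]

def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]

# Hypothetical single-trial samples (arbitrary units).
trial = [4, 6, 10, 12, 8, 8, 7, 9]
coeffs = haar_dwt(trial, levels=2)
kept = [round(c) for c in threshold(coeffs, limit=2)]  # quantize to integers
code = huffman_code(kept)
bits = "".join(code[s] for s in kept)
print(kept)                                             # [8, 8, -3, 0, 0, 0, 0, 0]
print(len(bits), "bits for", len(kept), "coefficients")  # 11 bits for 8 coefficients
```

Thresholding makes zero the most frequent symbol, so Huffman coding assigns it the shortest codeword; this is what makes the resulting bitstream compact.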
This work extends the multi-scale computational scheme for the quantum mechanics (QM) calculations of Nuclear Magnetic Resonance (NMR) chemical shifts (CSs) in proteins that lack a well-defined 3D structure. The scheme couples the sampling of an intrinsically disordered protein (IDP) by classical molecular dynamics (MD) with protein fragmentation using the adjustable density matrix assembler (ADMA) and density functional theory (DFT) calculations. In contrast to our earlier investigation on IDPs (Pavlíková Přecechtělová et al., J. Chem. Theory Comput., 2019, 15, 5642-5658) and the state-of-the-art NMR calculations for structured proteins, a partial re-optimization was implemented on the raw MD geometries in vibrational normal mode coordinates to enhance the accuracy of the MD/ADMA/DFT computational scheme. In addition, machine-learning-based cluster analysis was performed on the scheme to explore its potential in producing protein structure ensembles (CLUSTER ensembles) that yield accurate CSs at a reduced computational cost. The performance of the cluster-based calculations is validated against results obtained with conventional structural ensembles consisting of MD snapshots extracted from the MD trajectory at regular time intervals (REGULAR ensembles). CS calculations performed with the refined MD/ADMA/DFT framework employed the 6-311++G(d,p) basis set, which outperformed IGLO-III calculations with the same density functional approximation (B3LYP) and both explicit and implicit solvation. The partial geometry optimization did not universally improve the agreement of computed CSs with experiment but substantially decreased errors associated with the ensemble averaging. A CLUSTER ensemble with 50 structures yielded ensemble averages close to those obtained with a REGULAR ensemble consisting of 500 MD frames. The cluster-based calculations thus required only a fraction of the computational time.
People worldwide use social media extensively to share thoughts, societal issues, and personal concerns. Social media can be viewed as an intelligent platform that can be augmented with a capability to analyze and predict various issues such as business needs, environmental needs, election trends (polls), governmental needs, etc. This has motivated us to initiate a comprehensive search of COVID-19 pandemic-related views and opinions among the population on Twitter. The basic training data have been collected from Twitter posts. On this basis, we have applied ensemble deep learning techniques to predict the future evolution of views on Twitter more accurately than previous works with the same goal. First, feature extraction is performed through an N-gram stacked autoencoder supervised learning algorithm. The extracted features are then used for classification and prediction by an ensemble fusion scheme of selected machine learning techniques such as decision tree (DT), support vector machine (SVM), random forest (RF), and K-nearest neighbour (KNN). All individual results are combined (fused) for a better prediction by using both mean and mode techniques. Our proposed scheme of an N-gram stacked encoder integrated in an ensemble machine learning scheme outperforms all the other existing competing techniques, such as the unigram autoencoder and the bigram autoencoder. Our experimental results have been obtained from a comprehensive evaluation involving a dataset extracted from open-source data available from Twitter that were filtered by using the keywords "covid", "covid19", "coronavirus", "covid-19", "sarscov2", and "covid_19".
- MeSH
- COVID-19 * MeSH
- Humans MeSH
- Pandemics MeSH
- SARS-CoV-2 MeSH
- Social Media * MeSH
- Social Networking MeSH
- Machine Learning MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
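Two ingredients of the Twitter study above lend themselves to a short sketch: N-gram extraction from tokenized text and the fusion of base-model outputs by mode (majority vote) and by mean (averaged probabilities). The function names, the 0.5 decision threshold, and the example probabilities standing in for DT, SVM, RF, and KNN outputs are hypothetical; the abstract does not give the authors' exact fusion rules.

```python
from collections import Counter
from statistics import mean

def ngrams(tokens, n):
    """All contiguous n-token windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fuse_by_mode(labels):
    """Majority vote over the class labels predicted by the base models."""
    return Counter(labels).most_common(1)[0][0]

def fuse_by_mean(probabilities, threshold=0.5):
    """Average the positive-class probabilities, then threshold the mean."""
    return int(mean(probabilities) >= threshold)

tokens = "covid19 cases are rising".split()
print(ngrams(tokens, 2))
# [('covid19', 'cases'), ('cases', 'are'), ('are', 'rising')]

# Hypothetical positive-class probabilities from four base models.
probs = [0.55, 0.55, 0.55, 0.05]
labels = [int(p >= 0.5) for p in probs]   # [1, 1, 1, 0]
print(fuse_by_mode(labels))               # 1: three of four models vote positive
print(fuse_by_mean(probs))                # 0: the mean probability is 0.425
```

The example is chosen so that the two rules disagree: the mode ignores how confident each model is, while the mean lets one strongly dissenting model outweigh three weak votes.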
Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance. The multi-instance (MI) learning approach, which considers multiple conformations in model training, could be a reasonable solution to this problem. In this study, we implemented several multi-instance algorithms, both conventional and based on deep learning, and investigated their performance. We compared the performance of MI-QSAR models with those based on the classical single-instance QSAR (SI-QSAR) approach, in which each molecule is encoded by either 2D descriptors computed for the corresponding molecular graph or 3D descriptors computed for a single lowest-energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. It is demonstrated that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.
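The multi-instance idea in the QSAR abstract above can be illustrated with a minimal sketch: a molecule is a "bag" of conformations, each conformation is scored on its own, and the bag takes the score of its best instance. Max-pooling over instances is only one conventional MI strategy and is an assumption here, as are the linear scorer, the weights, and the descriptor values; the abstract does not say which MI algorithms were implemented.

```python
def score_instance(descriptors, weights):
    """Hypothetical linear instance scorer: dot product of descriptors and weights."""
    return sum(d * w for d, w in zip(descriptors, weights))

def predict_bag(conformers, weights):
    """Max-pooling MI prediction: the bag takes the score of its best instance.

    Returns (bag_score, index_of_best_conformer).
    """
    scores = [score_instance(c, weights) for c in conformers]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best

# One molecule as a bag of three conformations, each with two 3D descriptors
# (hypothetical integer values chosen for exact arithmetic).
weights = [2, -1]
molecule = [[1, 3], [4, 1], [2, 2]]

bag_score, best_index = predict_bag(molecule, weights)
print(bag_score)    # 7
print(best_index)   # 1
```

Under this scheme, the index of the top-scoring instance is a by-product of prediction, which is one way an MI model can point to a plausible bioactive conformation without being told which one it is.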
Glioma is the most pernicious cancer of the nervous system, with histological grade influencing the survival of patients. Despite many studies on the multimodal treatment approach, survival time remains brief. In this study, a novel two-stage ensemble of an ensemble-type machine learning-based predictive framework for glioma detection and its histograde classification is proposed. In the proposed framework, five characteristics belonging to 135 subjects were considered: human telomerase reverse transcriptase (hTERT), chitinase-like protein (YKL-40), interleukin 6 (IL-6), tissue inhibitor of metalloproteinase-1 (TIMP-1) and neutrophil/lymphocyte ratio (NLR). These characteristics were examined using distinctive ensemble-based machine learning classifiers and combination strategies to develop a computer-aided diagnostic system for the non-invasive prediction of glioma cases and their grade. In the first stage, the analysis was conducted to classify glioma cases and control subjects. In the second stage, machine learning approaches were applied to classify the recognised glioma cases into three grades, from grade II, which has a good prognosis, to grade IV, which is also known as glioblastoma. All experiments were evaluated with a five-fold cross-validation method, and the classification results were analysed using different statistical parameters. The proposed approach achieved higher accuracy and better values of the other statistical parameters than other state-of-the-art machine learning classifiers. Therefore, the proposed framework can be utilised for designing other intervention strategies for the prediction of glioma cases and their grades.
- MeSH
- Glioma * diagnosis MeSH
- Humans MeSH
- Magnetic Resonance Imaging MeSH
- Brain Neoplasms * diagnosis MeSH
- Machine Learning * MeSH
- Neoplasm Grading MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
Chronic lymphocytic leukemia (CLL) is the most common form of adult leukemia in the Western world and has a highly variable clinical course. Its striking genetic heterogeneity is not yet fully understood. Although the CLL genetic landscape has been well described, patient stratification based on mutation profiles remains elusive, mainly due to the heterogeneity of the data. Here we attempted to decrease the heterogeneity of somatic mutation data by mapping mutated genes to their respective biological processes. From the sequencing data gathered by the International Cancer Genome Consortium for 506 CLL patients, we generated pathway mutation scores, applied ensemble clustering to them, and extracted abnormal molecular pathways with a machine learning approach. We identified four clusters differing in pathway mutational profiles and time to first treatment. Interestingly, common CLL drivers such as ATM or TP53 were associated with particular subtypes, while others like NOTCH1 or SF3B1 were not. This study provides an important step in understanding mutational patterns in CLL.
- Publication type
- Journal Article MeSH
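The step of mapping mutated genes to biological processes in the CLL abstract above can be sketched as follows. The abstract does not define its pathway mutation score, so the definition used here (the fraction of a pathway's genes that are mutated in a patient) is an assumption, as are the function name, the three-pathway gene sets, and the example patient.

```python
def pathway_scores(mutated_genes, pathways):
    """Fraction of each pathway's genes mutated in one patient (assumed score)."""
    mutated = set(mutated_genes)
    return {name: len(mutated & genes) / len(genes)
            for name, genes in pathways.items()}

# Illustrative gene sets only; real pathway databases are far larger.
pathways = {
    "DNA damage response": {"ATM", "TP53"},
    "Notch signaling": {"NOTCH1"},
    "RNA splicing": {"SF3B1"},
}

# Hypothetical patient carrying mutations in two genes.
patient = ["TP53", "SF3B1"]
print(pathway_scores(patient, pathways))
# {'DNA damage response': 0.5, 'Notch signaling': 0.0, 'RNA splicing': 1.0}
```

Two patients with mutations in different genes of the same pathway receive the same score for that pathway, which is how pathway-level features can reduce the heterogeneity of gene-level mutation data before clustering.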