classifier learning
The scarcity of high-quality annotations in many application scenarios has recently led to increasing interest in learning techniques that combine unlabeled data with labeled data in a network. In this work, we focus on the label propagation problem in multilayer networks. Our approach is inspired by the heat diffusion model, which has proven useful in machine learning problems such as classification and dimensionality reduction. We propose a novel boundary-based heat diffusion algorithm that guarantees a closed-form solution and admits an efficient implementation. We experimentally validated our method on synthetic networks and on five real-world multilayer network datasets representing scientific coauthorship, the spread of drug adoption among physicians, two bibliographic networks, and a movie network. The results demonstrate the benefits of the proposed algorithm: our boundary-based heat diffusion outperforms state-of-the-art methods.
- MeSH
- Algorithms MeSH
- Supervised Machine Learning * MeSH
- Machine Learning MeSH
- Hot Temperature * MeSH
- Publication Type
- Journal Article MeSH
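The heat-diffusion abstract above does not spell out its algorithm. The minimal Python sketch below illustrates only the general idea it builds on: label propagation treated as diffusion, with the labeled nodes held fixed as a boundary. It implements classic harmonic label propagation and is not the authors' boundary-based method. Two simplifications are assumptions: the layers are merged by summing edge weights, and the code iterates to the fixed point instead of computing a closed-form solution. `propagate_labels` and its parameters are hypothetical names.

```python
from collections import defaultdict


def propagate_labels(layers, labels, n_nodes, n_iter=200):
    """Harmonic (heat-diffusion-style) label propagation.

    layers: list of edge lists, one per layer, each edge (u, v, weight).
    labels: dict node -> class for the labeled (boundary) nodes.
    Assumption: layers are merged into one graph by summing edge weights.
    """
    # Merge all layers into one symmetric weighted graph (hypothetical simplification).
    weight = defaultdict(float)
    for edges in layers:
        for u, v, w in edges:
            weight[(u, v)] += w
            weight[(v, u)] += w
    neighbors = defaultdict(list)
    for (u, v), w in weight.items():
        neighbors[u].append((v, w))

    classes = sorted(set(labels.values()))
    k = len(classes)
    # Boundary nodes hold a fixed one-hot "temperature" per class;
    # unlabeled nodes start from a uniform value.
    scores = {
        i: ([1.0 if labels[i] == c else 0.0 for c in classes]
            if i in labels else [1.0 / k] * k)
        for i in range(n_nodes)
    }
    for _ in range(n_iter):
        new_scores = {}
        for i in range(n_nodes):
            if i in labels or not neighbors[i]:
                new_scores[i] = scores[i]  # clamped boundary (or isolated) node
                continue
            total = sum(w for _, w in neighbors[i])
            # Each unlabeled node relaxes to the weighted mean of its neighbors.
            new_scores[i] = [
                sum(w * scores[j][c] for j, w in neighbors[i]) / total
                for c in range(k)
            ]
        scores = new_scores
    # Assign every node the class with the highest steady-state value.
    return {i: classes[max(range(k), key=lambda c: scores[i][c])]
            for i in range(n_nodes)}
```

Consider a four-node path split across two layers, with node 0 labeled "A" and node 3 labeled "B". Nodes 1 and 2 settle at 2/3 and 1/3 of class "A", so they are assigned "A" and "B" respectively.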
Natural products represent a rich reservoir of small-molecule drug candidates used as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed biosynthetic gene clusters (BGCs). The growing number of complete microbial genomes and similar resources has led to the development of BGC prediction algorithms, although their precision and their ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared with existing machine learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Applying DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biological activities. The improved accuracy and classification ability of DeepBGC represent a major addition to in silico BGC identification.
Researchers are currently exploring deep learning and machine learning (ML) in depth to identify patterns in large medical datasets and to reliably diagnose cardiovascular disease (CVD). Training on large datasets while producing highly accurate validation results is exceedingly difficult. Furthermore, the increasing global prevalence of CVD makes early and precise diagnosis necessary. However, the growing complexity of healthcare datasets makes it challenging to detect relationships among features and to produce precise predictions. To address these issues, the Intelligent Cardiovascular Disease Diagnosis based on Ant Colony Optimisation with Enhanced Deep Learning (ICVD-ACOEDL) model was developed. This model uses feature selection (FS) and hyperparameter optimization to diagnose CVD. The medical data are first normalized consistently with a min-max scaler. The key feature that sets ICVD-ACOEDL apart is the use of Ant Colony Optimisation (ACO) to select an optimal feature subset, which in turn improves the performance of the subsequent deep learning enhanced neural network (DLENN) classifier. The model tunes the hyperparameters of the DLENN for CVD classification using Bayesian optimization. Comprehensive evaluations on benchmark medical datasets show that ICVD-ACOEDL outperforms existing techniques, indicating that it could have a significant impact on CVD diagnosis. The model combines min-max scaling for data pre-processing, ACO for feature selection, and Bayesian optimization for hyperparameter tuning, and so offers a workable way to improve the efficiency and accuracy of CVD classification in real-world medical settings.
- MeSH
- Bayes Theorem MeSH
- Deep Learning * MeSH
- Diagnosis, Computer-Assisted methods MeSH
- Ants MeSH
- Cardiovascular Diseases * diagnosis MeSH
- Humans MeSH
- Neural Networks, Computer * MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
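The ICVD-ACOEDL abstract above names min-max scaling as its pre-processing step. This step maps each feature value x to (x - min) / (max - min). The Python sketch below shows it for a plain list-of-rows table. The function names are hypothetical, and the ACO feature selection and Bayesian optimization stages are not shown. As a design assumption, the sketch learns the minimum and maximum from training rows only and reuses them for new rows, which avoids leaking test data into the scaling.

```python
def fit_min_max(rows):
    """Learn per-column minimum and maximum from training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Map each value to (x - min) / (max - min); constant columns map to 0.0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]
```

Training rows land in [0, 1]. A new row outside the training range scales outside that interval (for example, 1.5), which a downstream classifier may need to tolerate.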
OBJECTIVE: This paper introduces a fully automated, subject-specific deep learning convolutional neural network (CNN) system for forecasting seizures from ambulatory intracranial EEG (iEEG). The system was tested on a hand-held device (Mayo Epilepsy Assist Device) in a pseudo-prospective mode using iEEG from four canines with naturally occurring epilepsy. APPROACH: The system was trained and tested on 75 seizures collected over 1608 days, using a genetic algorithm to optimize the forecasting hyperparameters (prediction horizon (PH), median filter window length, and probability threshold) of each subject-specific seizure forecasting model. The trained CNN models were deployed on a hand-held tablet computer and evaluated on the iEEG test datasets from the four canines. To evaluate seizure forecasting performance, the results on the iEEG test datasets were compared with Monte Carlo simulations of a Poisson random predictor with equal time in warning. MAIN RESULTS: The CNN models forecast seizures at rates significantly above chance in all four dogs (p < 0.01; mean sensitivity 0.79 with 18% time in warning). The deep learning method presented here surpassed previously reported methods that combined computationally expensive features with standard machine learning methods such as logistic regression and support vector machine classifiers. SIGNIFICANCE: Our findings support, in principle, the feasibility of deploying trained CNN models on a hand-held computational device (Mayo Epilepsy Assist Device) that analyzes streaming iEEG data for real-time seizure forecasting.
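The seizure-forecasting abstract above tests its forecasts against Monte Carlo simulations of a Poisson random predictor with equal time in warning. The Python sketch below shows one common way to build such a chance baseline. It simplifies heavily: seizures are treated as independent, and every warning lasts one fixed duration. It is not the paper's exact procedure, and the function name and counts are hypothetical. The key step is choosing the Poisson rate so that the random predictor spends the same fraction of time in warning (TiW) as the forecaster.

```python
import math
import random


def chance_pvalue(n_seizures, n_predicted, time_in_warning,
                  horizon=1.0, n_sims=2000, seed=0):
    """Monte Carlo p-value of a forecaster against a Poisson random predictor.

    The random predictor raises warnings at Poisson times with rate lam, and
    each warning covers `horizon` time units. With lam = -ln(1 - TiW) / horizon,
    the expected fraction of time in warning equals TiW.
    """
    rng = random.Random(seed)
    lam = -math.log(1.0 - time_in_warning) / horizon
    at_least_as_good = 0
    for _ in range(n_sims):
        # By memorylessness, the time from a seizure onset back to the most
        # recent random warning is Exp(lam). The seizure counts as forecast
        # if that warning started within `horizon`, i.e. is still active.
        hits = sum(1 for _ in range(n_seizures)
                   if rng.expovariate(lam) < horizon)
        if hits >= n_predicted:
            at_least_as_good += 1
    return at_least_as_good / n_sims
```

Under these assumptions, the random predictor's chance sensitivity equals its time in warning. Take a hypothetical forecaster that catches 16 of 20 seizures at 18% time in warning. Simulated random predictors almost never do as well, so the estimated p-value falls far below 0.01.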
Background: Tuberculosis (TB) is a major cause of illness and death in many countries, especially in Asia and Africa. Early detection of the disease requires repeated microscopic examinations. The diagnostic process therefore needs to be automated to improve the sensitivity and accuracy of the test. Objective: To automate a decision support system for digital tuberculosis images using histogram-based statistical features and evolutionary extreme learning machines. Materials and methods: Sputum smear positive and negative images, recorded under a standard image acquisition protocol, are subjected to histogram-based feature extraction. The most significant features are selected using Student's t-test. These significant features are then used as input to a differential evolution extreme learning machine classifier. Results: The significant histogram-based features differentiate TB-positive from TB-negative images with high specificity and accuracy. Conclusion: The methodology used in this work appears useful for the automated analysis of TB sputum smear images in mass screening for diseases such as pulmonary tuberculosis.
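The tuberculosis abstract above extracts histogram-based statistical features and then keeps the most significant ones using Student's t-test. The Python sketch below illustrates that feature-selection step. The abstract does not name its features, so the four used here (mean, variance, entropy, energy of an 8-bit grey-level histogram) are assumptions. The sketch ranks features by |t| rather than computing p-values, which would need a t distribution (for example, SciPy's `scipy.stats.ttest_ind`). It does not implement the differential evolution extreme learning machine. All function names are hypothetical.

```python
import math
import statistics


def histogram_features(pixels, levels=256):
    """First-order statistics of an 8-bit grey-level histogram."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    n = len(pixels)
    prob = [h / n for h in hist]
    mean = sum(i * p for i, p in enumerate(prob))
    variance = sum((i - mean) ** 2 * p for i, p in enumerate(prob))
    return {
        "mean": mean,
        "variance": variance,
        "entropy": -sum(p * math.log2(p) for p in prob if p > 0),
        "energy": sum(p * p for p in prob),
    }


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    diff = statistics.mean(a) - statistics.mean(b)
    if pooled == 0:
        # Both groups are constant: no difference, or a perfect separation.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb))


def rank_features(positive_images, negative_images):
    """Order feature names by |t|, most discriminative first."""
    pos = [histogram_features(img) for img in positive_images]
    neg = [histogram_features(img) for img in negative_images]
    score = {name: abs(student_t([f[name] for f in pos],
                                 [f[name] for f in neg]))
             for name in pos[0]}
    return sorted(score, key=score.get, reverse=True)
```

The top-ranked names would then form the classifier's input vector.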
Pancreatic ductal adenocarcinoma (PDAC), the deadliest solid malignancy, is typically detected late and at an inoperable stage. Early or incidental detection is associated with prolonged survival, but screening asymptomatic individuals for PDAC using a single test remains unfeasible because of the low prevalence and the potential harms of false positives. Non-contrast computed tomography (CT), routinely performed for clinical indications, offers the potential for large-scale screening. However, identifying PDAC on non-contrast CT has long been considered impossible. Here, we develop a deep learning approach, pancreatic cancer detection with artificial intelligence (PANDA), that can detect and classify pancreatic lesions with high accuracy on non-contrast CT. PANDA is trained on a dataset of 3,208 patients from a single center. In a multicenter validation involving 6,239 patients across 10 centers, PANDA achieves an area under the receiver operating characteristic curve (AUC) of 0.986-0.996 for lesion detection. For PDAC identification, it outperforms the mean radiologist performance by 34.1% in sensitivity and 6.3% in specificity. In a real-world multi-scenario validation of 20,530 consecutive patients, it achieves a sensitivity of 92.9% and a specificity of 99.9% for lesion detection. Notably, PANDA applied to non-contrast CT shows non-inferiority to radiology reports (based on contrast-enhanced CT) in differentiating common pancreatic lesion subtypes. PANDA could potentially serve as a new tool for large-scale pancreatic cancer screening.
- MeSH
- Deep Learning * MeSH
- Carcinoma, Pancreatic Ductal * diagnostic imaging pathology MeSH
- Humans MeSH
- Pancreatic Neoplasms * diagnostic imaging pathology MeSH
- Pancreas diagnostic imaging pathology MeSH
- Tomography, X-Ray Computed MeSH
- Retrospective Studies MeSH
- Artificial Intelligence MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Multicenter Study MeSH
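The PANDA abstract above reports AUC, sensitivity, and specificity. The small Python sketch below makes these metrics concrete. It computes AUC through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as half. The function names and the 0.5 decision threshold are illustrative assumptions, unrelated to the study's code. The pairwise loop is quadratic, which is fine for illustration but not for large cohorts.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if y == 1:
            tp += predicted
            fn += not predicted
        else:
            fp += predicted
            tn += not predicted
    return tp / (tp + fn), tn / (tn + fp)
```

For scores [0.9, 0.8, 0.3, 0.1] with labels [1, 0, 1, 0], three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.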
Fragmented QRS (fQRS) is an electrocardiographic (ECG) marker of myocardial conduction abnormality, characterized by additional notches in the QRS complex. The presence of fQRS has been associated with an increased risk of all-cause mortality and arrhythmia in patients with cardiovascular disease. However, current binary visual analysis is prone to intra- and inter-observer variability, and the differing definitions are problematic in clinical practice. Objective quantification of fQRS is therefore needed and could further improve risk stratification of these patients. We present an automated method for fQRS detection and quantification. First, we propose a novel, robust QRS complex segmentation strategy that combines multi-lead information and automatically excludes abnormal heartbeats. We then extracted features based on variational mode decomposition (VMD), phase-rectified signal averaging (PRSA), and the number of baseline crossings of the ECG. These features were used to train a machine learning classifier (support vector machine) to discriminate fragmented from non-fragmented ECG traces, using multi-center data and combining the different fQRS criteria used in clinical settings. The best model was trained on the combination of two independent, previously annotated datasets. Compared with these visual fQRS annotations, it achieved Kappa scores of 0.68 and 0.44, respectively. We also show that the algorithm might be used both in regular sinus rhythm and for irregular beats during atrial fibrillation. These results demonstrate that the proposed approach could be relevant for clinical practice by objectively assessing and quantifying fQRS. The study paves the way for further clinical application of the developed automated fQRS algorithm.
- MeSH
- Algorithms MeSH
- Electrocardiography * methods MeSH
- Atrial Fibrillation * diagnosis MeSH
- Humans MeSH
- Machine Learning MeSH
- Support Vector Machine MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Grant-Supported Research MeSH
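Two quantities from the fQRS abstract above are easy to show concretely. The first is the number of baseline crossings, one of the classifier's input features. The second is the Kappa score used to compare the classifier with visual annotations. The Python sketch below makes two assumptions: the baseline defaults to the segment median, and the score is Cohen's kappa. The abstract specifies neither. The VMD and PRSA features and the SVM itself are not shown, and the function names are hypothetical.

```python
import statistics


def baseline_crossings(signal, baseline=None):
    """Count sign changes of the signal around a baseline.

    Assumption: the baseline defaults to the segment median; samples lying
    exactly on the baseline are skipped rather than counted as crossings.
    """
    if baseline is None:
        baseline = statistics.median(signal)
    crossings = 0
    previous = None
    for x in signal:
        if x == baseline:
            continue
        above = x > baseline
        if previous is not None and above != previous:
            crossings += 1
        previous = above
    return crossings


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two label lists beyond chance.

    Undefined (division by zero) when chance agreement is exactly 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)
```

For example, two annotators who agree on 3 of 4 traces, with label frequencies giving a chance agreement of 0.5, have a kappa of (0.75 - 0.5) / (1 - 0.5) = 0.5.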
Cervical cancer is still one of the most prevalent cancers in women and a significant cause of mortality. Cytokine gene variants and socio-demographic characteristics have been reported as biomarkers of cervical cancer risk in the Indian population. This study was designed to apply a machine learning-based model using these risk factors for better prognosis and prediction of cervical cancer. The dataset includes cytokine gene variants and clinical and socio-demographic characteristics of healthy control subjects and cervical cancer cases. The risk factors, including demographic details and cytokine gene variants, were analysed using different machine learning approaches, and various statistical parameters were used to evaluate the proposed method. After multi-step data processing and random splitting of the dataset, the machine learning methods were evaluated with 5-fold cross-validation and also tested on unseen records of the collected dataset. The proposed approaches were verified by analysing various performance metrics. Logistic regression achieved the highest average accuracy (82.25%) and the highest average F1-score (82.58%) among all methods. The ridge classifier and the Gaussian Naïve Bayes classifier achieved the highest sensitivity, 85%. The ridge classifier surpasses most of the machine learning classifiers with 84.78% accuracy and 97.83% sensitivity. The risk factors analysed in this study can be taken as biomarkers in developing a cervical cancer diagnosis system. The outcomes demonstrate that machine learning-assisted analysis of cytokine gene variants and socio-demographic characteristics can be used effectively to predict the risk of developing cervical cancer.
- MeSH
- Bayes Theorem MeSH
- Cytokines genetics MeSH
- Demography MeSH
- Humans MeSH
- Uterine Cervical Neoplasms * epidemiology genetics MeSH
- Machine Learning MeSH
- Check Tag
- Humans MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
- Grant-Supported Research MeSH
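The cervical cancer abstract above scores classifiers with 5-fold cross-validation, reporting accuracy and F1. The Python sketch below shows that evaluation protocol on hypothetical toy data. It is not the study's pipeline. The logistic regression is a plain gradient-descent fit, and its learning rate and epoch count are arbitrary choices. As a design choice, the metrics are computed on the pooled out-of-fold predictions rather than averaged per fold, so F1 stays defined even when a small fold has no positive cases. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            error = sigmoid(z) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


def cross_validate(X, y, k=5, seed=0):
    """k-fold CV; returns accuracy and F1 over pooled out-of-fold predictions."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    predictions = [None] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        model = train_logistic_regression([X[i] for i in train],
                                          [y[i] for i in train])
        for i in fold:
            predictions[i] = predict(model, X[i])
    tp = sum(1 for p, t in zip(predictions, y) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(predictions, y) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predictions, y) if p == 0 and t == 1)
    accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1
```

Each record is predicted exactly once, by a model that never saw it during training. This is the property that makes cross-validated scores an estimate of performance on unseen data.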
Optimizing the topology, weights, and neuron transfer functions of a neural network for a given data set and problem is not an easy task. In this article, we focus primarily on building an optimal feed-forward neural network classifier for i.i.d. data sets. We apply meta-learning principles to the optimization of neural network structure and function. We show that diversity promotion, ensembling, self-organization, and induction are beneficial for this problem. We combine several different neuron types, trained by various optimization algorithms, to build a supervised feed-forward neural network called Group of Adaptive Models Evolution (GAME). The approach was tested on a large number of benchmark data sets. The experiments show that combining different optimization algorithms within the network is the best choice when performance is averaged over several real-world problems.
SIGNIFICANCE: Machine learning is increasingly being applied to the classification of microscopic data. Detecting some complex and dynamic cellular processes may require time-resolved live-cell imaging. Incorporating temporal information into the classification process may allow for better and more specific classification. AIM: We propose a methodology for cell classification based on time-lapse quantitative phase images (QPIs) acquired by digital holographic microscopy (DHM), with the goal of improving the classification of dynamic cellular processes. APPROACH: The methodology was demonstrated by studying the epithelial-mesenchymal transition (EMT), which entails major and distinct time-dependent morphological changes. Time-lapse QPIs of EMT were obtained over a 48-h period, and specific novel features representing dynamic cell behavior were extracted. The two distinct end-state phenotypes were classified by several supervised machine learning algorithms, and the results were compared with classification performed on single-time-point images. RESULTS: Compared with the single-time-point approach, our data suggest that incorporating temporal information into the classification of cell phenotypes during EMT improves accuracy by nearly 9%. The data further indicate the potential of DHM to monitor cellular morphological changes. CONCLUSIONS: The proposed approach based on time-lapse images acquired by DHM could improve automated monitoring of live-cell behavior and could be further developed into a tool for high-throughput automated analysis of unique cell behavior.
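The time-lapse QPI abstract above does not list its dynamic features. As an illustration of how a trajectory yields features that a single image cannot, the Python sketch below turns one cell's per-frame measurements into three summaries: a least-squares trend, a net change, and a frame-to-frame variability. It also keeps the last value, which is all a single-time-point classifier would see. The choice of features, the example measurement (cell area or dry mass per frame), and all names are assumptions.

```python
import statistics


def temporal_features(times_h, values):
    """Summarize one cell's feature trajectory across time-lapse frames.

    `values` is any per-frame morphological measurement (hypothetically,
    cell area or dry mass from a QPI); the feature names are illustrative.
    """
    t_mean = statistics.fmean(times_h)
    v_mean = statistics.fmean(values)
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times_h, values))
    steps = [b - a for a, b in zip(values, values[1:])]
    return {
        "slope_per_h": sxy / sxx,  # least-squares linear trend
        "net_change": values[-1] - values[0],
        # Frame-to-frame volatility of the measurement.
        "step_sd": statistics.stdev(steps) if len(steps) > 1 else 0.0,
        # The only information a single-time-point classifier would see.
        "last_value": values[-1],
    }
```

Feature vectors like these, one per cell, could then be passed to any supervised classifier in place of single-frame measurements.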