Query: overfitting
Deep learning has recently been applied with great success in many diverse application domains, such as visual and face recognition, natural language processing, speech recognition, and handwriting identification. Convolutional neural networks, which belong to the family of deep learning models, are a subtype of artificial neural networks inspired by the complex structure of the human brain and are often used for image classification tasks. One of the biggest challenges in all deep neural networks is overfitting, which occurs when the model performs well on the training data but fails to make accurate predictions on new data. Several regularization methods have been introduced to prevent overfitting. In the research presented in this manuscript, overfitting was tackled by using a swarm intelligence approach to select a proper value for the dropout regularization parameter. Although swarm algorithms have already been applied successfully in this domain, the available literature suggests that their potential has not yet been fully investigated. Finding the optimal dropout value manually is challenging and time-consuming. Therefore, this research proposes an automated framework based on a hybridized sine cosine algorithm for tackling this major deep learning issue. The first experiment was conducted on four benchmark datasets: MNIST, CIFAR10, Semeion, and UPS, while the second experiment was performed on a brain tumor magnetic resonance imaging classification task. The experimental results are compared with those generated by several similar approaches. Overall, the results indicate that the proposed method outperforms the other state-of-the-art methods included in the comparative analysis in terms of classification error and accuracy.
- MeSH
- Algorithms MeSH
- Humans MeSH
- Magnetic Resonance Imaging MeSH
- Brain Neoplasms * MeSH
- Neural Networks, Computer * MeSH
- Handwriting MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
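The dropout-tuning idea in the abstract above can be illustrated with a minimal sketch. It uses the basic sine cosine algorithm, not the hybridized variant the paper proposes (the abstract does not describe the hybridization). The objective `validation_error` is a hypothetical stand-in for training a CNN and measuring its validation error, and the bounds, agent count, and iteration count are illustrative assumptions.

```python
import math
import random


def validation_error(dropout: float) -> float:
    """Hypothetical stand-in for training a CNN and measuring validation error.

    A real objective would train the network with this dropout rate; this
    toy curve (minimum at 0.35) only exists so the sketch runs quickly.
    """
    return (dropout - 0.35) ** 2 + 0.05


def sine_cosine_search(objective, low=0.0, high=0.9, agents=10,
                       iterations=50, a=2.0, seed=0):
    """Basic sine cosine algorithm for one bounded parameter."""
    rng = random.Random(seed)
    positions = [rng.uniform(low, high) for _ in range(agents)]
    best = min(positions, key=objective)
    for t in range(iterations):
        # r1 shrinks linearly from a towards 0: wide moves first, fine moves later.
        r1 = a - t * a / iterations
        for i, x in enumerate(positions):
            r2 = rng.uniform(0.0, 2.0 * math.pi)
            r3 = rng.uniform(0.0, 2.0)
            r4 = rng.random()
            wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
            x_new = x + r1 * wave * abs(r3 * best - x)
            positions[i] = min(max(x_new, low), high)  # keep dropout in bounds
        candidate = min(positions, key=objective)
        if objective(candidate) < objective(best):
            best = candidate
    return best, objective(best)


best_dropout, best_error = sine_cosine_search(validation_error)
print(f"dropout={best_dropout:.3f}, surrogate error={best_error:.4f}")
```

In practice each call to the objective would be a full training run, so the number of agents and iterations directly sets the compute budget of such a search.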
- Keywords
- hazard function, FLIPI, Follicular Lymphoma International Prognostic Index, overfitting
- MeSH
- Algorithms MeSH
- Survival Analysis MeSH
- Bayes Theorem MeSH
- Confounding Factors, Epidemiologic MeSH
- Lymphoma, Follicular * MeSH
- Humans MeSH
- Logistic Models MeSH
- Decision Support Techniques MeSH
- Neural Networks, Computer MeSH
- Odds Ratio MeSH
- Probability MeSH
- Prognosis * MeSH
- Models, Statistical MeSH
- Statistics as Topic MeSH
- Support Vector Machine MeSH
- Check Tag
- Humans MeSH
Contemporary molecular biology deals with wide and heterogeneous sets of measurements to model and understand underlying biological processes, including complex diseases. Machine learning is frequently used to build such models. However, models built solely from measured data often suffer from overfitting, as the sample size is typically much smaller than the number of measured features. In this paper, we propose a random forest-based classifier that reduces this overfitting with the aid of prior knowledge in the form of a feature interaction network. We illustrate the proposed method on the task of disease classification based on measured mRNA and miRNA profiles, complemented by an interaction network composed of miRNA-mRNA target relations and mRNA-mRNA interactions corresponding to the interactions between their encoded proteins. We demonstrate that the proposed network-constrained forest employs prior knowledge to increase learning bias and consequently to improve the classification accuracy, stability, and comprehensibility of the resulting model. The experiments are carried out in the domain of myelodysplastic syndrome, which is our long-term research focus. We validate our approach in the public domain of ovarian carcinoma, with the same data form. We believe that the idea of a network-constrained forest can be straightforwardly generalized to arbitrary omics data with an available and non-trivial feature interaction network. The proposed method is publicly available as part of the miXGENE system (http://mixgene.felk.cvut.cz); the workflow that implements the myelodysplastic syndrome experiments is presented as a dedicated case study.
- MeSH
- Gene Regulatory Networks MeSH
- Humans MeSH
- RNA, Messenger genetics MeSH
- MicroRNAs genetics MeSH
- Artificial Intelligence MeSH
- Computational Biology methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
This study aims to review the current literature on methods of preoperative prediction of pituitary adenoma consistency. Pituitary adenoma consistency may be a limiting factor for successful surgical removal of tumors. Efforts have been made to investigate whether consistency can be assessed accurately before surgery to allow for safer and more effective surgical planning. We searched major scientific databases and systematically analyzed the results. A total of 54 relevant articles were identified and selected for inclusion. These studies evaluated methods based on MRI intensity, enhancement, radiomics, MR elastography, or CT evaluation. The results of these studies varied widely. Most studies used the average intensity of either T2WI or ADC maps. Firm tumors appeared hyperintense on T2WI, although only 55% of the studies reported statistically significant results. Reports on ADC values in firm tumors are mixed, with findings of increased values (28%), decreased values (22%), or no correlation (50%). Multiple contrast enhancement-based methods showed good results in distinguishing between soft and firm tumors. There were mixed reports on the utility of MR elastography. Attempts to develop radiomics and machine learning-based models have achieved high accuracy and AUC values; however, they are prone to overfitting and need further validation. Multiple methods of preoperative consistency assessment have been studied. None demonstrated sufficient accuracy and reliability in clinical use. Further efforts are needed to enable reliable surgical planning.
- MeSH
- Humans MeSH
- Pituitary Neoplasms * diagnostic imaging surgery MeSH
- Reproducibility of Results MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
- Systematic Review MeSH
BACKGROUND: Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets, typically defined on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will enable more accurate classifiers to be learned. METHODS: We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. RESULTS: The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. CONCLUSION: Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data.
The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.
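The conversion from gene-level to set-level features described in the abstract above can be sketched as follows. This is only an illustration: the abstract does not say how expression values are aggregated within a set, so simple averaging is assumed here, and the gene names, set names, and values are hypothetical.

```python
from statistics import mean

# Hypothetical expression values per sample (gene -> expression level).
samples = [
    {"geneA": 2.0, "geneB": 4.0, "geneC": 1.0, "geneD": 3.0},
    {"geneA": 0.0, "geneB": 2.0, "geneC": 5.0, "geneD": 7.0},
]

# Hypothetical gene sets, e.g. genes that share a regulator.
gene_sets = {
    "regulon_1": ["geneA", "geneB"],
    "regulon_2": ["geneC", "geneD"],
}


def to_set_level(sample, gene_sets):
    """Replace gene-level features with one averaged feature per gene set."""
    return {
        name: mean(sample[g] for g in genes if g in sample)
        for name, genes in gene_sets.items()
    }


set_level = [to_set_level(s, gene_sets) for s in samples]
print(set_level)
```

Each sample shrinks from four gene features to two set features, which is the dimensionality reduction the abstract refers to. Averaging is most informative when the genes in a set are correlated, which is consistent with the hypothesis the authors test.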
Determination of RNA structural-dynamic properties is challenging for experimental methods. Thus, atomistic molecular dynamics (MD) simulations represent a helpful technique complementary to experiments. However, contemporary MD methods still suffer from limitations of force fields (ffs), including imbalances in the nonbonded ff terms. We have recently demonstrated that some improvement of the state-of-the-art AMBER RNA ff can be achieved by adding a new term for H-bonding called gHBfix, which increases tuning flexibility and reduces the risk of side effects. Still, the first gHBfix version did not fully correct simulations of short RNA tetranucleotides (TNs). TNs are key benchmark systems owing to the availability of unique NMR data, although placing too much weight on improving TN simulations can easily lead to overfitting to A-form RNA. Here we combine the gHBfix version with another term called tHBfix, which separately treats H-bond interactions formed by terminal nucleotides. This allows simulations of RNA TNs to be refined without affecting simulations of other RNAs. The approach is in line with the strategy adopted by current RNA ffs, where the terminal nucleotides possess different parameters for terminal atoms than the internal nucleotides. Combining gHBfix with tHBfix significantly improves the behavior of RNA TNs during well-converged enhanced-sampling simulations using replica exchange with solute tempering. TNs mostly populate canonical A-form-like states, while spurious intercalated structures are largely suppressed. Still, simulations of r(AAAA) and r(UUUU) TNs show some residual discrepancies with primary NMR data, which suggests that future tuning of some other ff terms might be useful. Nevertheless, tHBfix has clear potential to improve modeling of key biochemical processes in which interactions of RNA single-stranded ends are involved.
- MeSH
- Nucleic Acid Conformation MeSH
- Humans MeSH
- Nucleotides chemistry MeSH
- RNA chemistry MeSH
- Molecular Dynamics Simulation standards MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
Despite being standard tools for decision-making, the European Organisation for Research and Treatment of Cancer (EORTC), European Association of Urology (EAU), and Club Urologico Espanol de Tratamiento Oncologico (CUETO) risk groups offer only moderate performance in predicting recurrence-free survival (RFS) and progression-free survival (PFS) in non-muscle-invasive bladder cancer (NMIBC). In this retrospective combined-cohort data-mining study, the training group consisted of 3570 patients with de novo diagnosed NMIBC. Predictors included gender, age, T stage, histopathological grading, tumor burden and diameter, EORTC and CUETO scores, and type of intravesical treatment. The models developed were externally validated using an independent cohort of 322 patients. Models were trained using Cox proportional-hazards deep neural networks (deep learning; DeepSurv) with a proprietary grid search of hyperparameters. For patients treated with surgery and those treated with bacillus Calmette-Guérin, the models achieved a c index of 0.650 (95% confidence interval [CI] 0.649-0.650) for RFS and 0.878 (95% CI 0.873-0.874) for PFS in the training group. In the validation group, the c index was 0.651 (95% CI 0.648-0.654) for RFS and 0.881 (95% CI 0.878-0.885) for PFS. After inclusion of patients treated with mitomycin C, the c index for RFS models was 0.6415 (95% CI 0.6412-0.6417) for the training group and 0.660 (95% CI 0.657-0.664) for the validation group. Models for PFS achieved a c index of 0.885 (95% CI 0.885-0.885) for the training set and 0.876 (95% CI 0.873-0.880) for the validation set. Our tool outperformed standard-of-care risk stratification tools and showed no evidence of overfitting. The application is open source and available at https://biostat.umed.pl/deepNMIBC/. PATIENT SUMMARY: We created and validated a new tool to predict recurrence and progression of early-stage bladder cancer.
The application uses advanced artificial intelligence to combine state-of-the-art scales, outperforms these scales for prediction, and is freely available online.
- MeSH
- Deep Learning * MeSH
- Risk Assessment MeSH
- Neoplasm Invasiveness MeSH
- Humans MeSH
- Neoplasm Recurrence, Local pathology MeSH
- Urinary Bladder Neoplasms * pathology MeSH
- Prognosis MeSH
- Disease Progression MeSH
- Retrospective Studies MeSH
- Artificial Intelligence MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
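The c index reported in the abstract above is a concordance measure, and comparing its training and validation values is one common way to check for overfitting. The following is a minimal sketch of Harrell's c index for right-censored data. It is not the study's implementation, and the toy follow-up times, event flags, and risk scores are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's c index for right-censored data.

    times  -- observed follow-up time per patient
    events -- True if the event (e.g. recurrence) was observed, False if censored
    risks  -- predicted risk score; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:  # patient i had the event first
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
train = ([2, 4, 6, 8], [True, True, False, True], [0.9, 0.7, 0.4, 0.1])
valid = ([3, 5, 7, 9], [True, False, True, True], [0.8, 0.3, 0.2, 0.6])

c_train = concordance_index(*train)
c_valid = concordance_index(*valid)
print(f"train c index={c_train:.3f}, validation c index={c_valid:.3f}, "
      f"gap={c_train - c_valid:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. A large drop from training to validation, as in this toy data, would suggest overfitting, whereas the near-identical values in the abstract are what the authors cite as no evidence of it.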
Final report on a grant project of the Czech Health Research Council (Agentura pro zdravotnický výzkum MZ ČR)
unpaged
Recent technological advances have enabled biomedical research to explore the underlying biological processes of living organisms at various resolutions and from different perspectives. While the amount of data produced has grown dramatically over the years, the pace at which our knowledge has grown has lagged behind, an indication of the inability of current computational tools to extract knowledge from the large pool of noisy data. NEUROMINER will provide a framework for machine learning and data mining, with a special emphasis on neuroscience research. The project has three main axes of research, each corresponding to a currently unmet need: (1) extraction and selection of features with strong discrimination properties, (2) systems able to learn from high-dimensional data without suffering from overfitting, and (3) a rigorous statistical model assessment procedure. The applicants are experts in medical image processing and analysis, biostatistics, and machine learning.
Recent technological advances in biomedical research have made it possible to study the fundamental biological processes in living organisms at various resolutions and from different perspectives. While the amount of data produced has grown dramatically over the years, the pace at which we gain knowledge has lagged behind, which points to the inability of current computational tools to extract knowledge from large volumes of noisy data. NEUROMINER will provide a framework for machine learning and mining of image data, with a special emphasis on neuroscience research. The three main axes of the project correspond to problems for which no solution is currently known: (1) extraction and selection of features with strong discriminative power from high-dimensional data, (2) systems that learn from high-dimensional data without overfitting, and (3) a rigorous procedure for statistical validation of models. The project applicants are experts in medical image processing and analysis, biostatistics, and machine learning.
- MeSH
- Biostatistics MeSH
- Data Mining MeSH
- Brain diagnostic imaging MeSH
- Neural Networks, Computer MeSH
- Neuroimaging MeSH
- Image Processing, Computer-Assisted MeSH
- Reproducibility of Results MeSH
- Schizophrenia diagnostic imaging MeSH
- Machine Learning MeSH
- Conspectus
- Pathology. Clinical medicine
- NLK Subject fields
- neurology
- radiology, nuclear medicine and imaging methods
- medical informatics
- NLK Publication type
- final reports on AZV MZ ČR grant projects
Keras, TensorFlow libraries -- 4.3.2 Feature engineering 104 -- 4.4 Overfitting
1st electronic edition, 1 online resource (328 pages)
In recent years, machine learning has made remarkable progress, from nearly unusable speech and image recognition to superhuman accuracy. From programs that could not beat even a slightly experienced Go player, we have arrived at one that defeated the world champion. Behind this progress in the development of learning programs is so-called deep learning.