Metalearning, an important part of artificial intelligence, represents a promising approach to the automatic selection of appropriate methods or algorithms. This paper is concerned with recommending a suitable estimator for nonlinear regression modeling, namely either the standard nonlinear least squares estimator or one of the available alternative estimators that are highly robust to the presence of outliers in the data. The authors hold the opinion that theoretical considerations will never be able to yield such recommendations for the nonlinear regression context. Instead, metalearning is explored here as an original approach suitable for this task. Four different approaches to automatic method selection for nonlinear regression are proposed, and computations are performed over a training database of 643 real, publicly available datasets. In particular, while the metalearning results may be harmed by imbalanced group sizes, an effective approach yields much improved results by performing a novel combination of supervised feature selection by random forests and oversampling by the synthetic minority oversampling technique (SMOTE). As a by-product, the computations bring arguments in favor of the very recent nonlinear least weighted squares estimator, which turns out to outperform other (and much more renowned) estimators on a quite large percentage of the datasets.
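The SMOTE oversampling step mentioned above can be sketched as follows. This is a minimal illustration of the idea only, not the authors' actual pipeline; the function name `smote` and its parameters are hypothetical. Each synthetic minority sample is an interpolation between a randomly chosen minority observation and one of its k nearest minority neighbours:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples (a minimal SMOTE sketch).

    For each synthetic point, pick a random minority sample, choose one of
    its k nearest minority neighbours, and interpolate between the two.
    """
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]    # k nearest neighbours per sample
    new = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(n)                       # random minority sample
        m = nbrs[j, rng.integers(min(k, n - 1))]  # one of its neighbours
        lam = rng.random()                        # interpolation factor in [0, 1]
        new[i] = X_min[j] + lam * (X_min[m] - X_min[j])
    return new
```

Because every synthetic point is a convex combination of two minority observations, the oversampled class stays inside the region spanned by the original minority data.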
- MeSH
- Algorithms * MeSH
- Least-Squares Analysis MeSH
- Artificial Intelligence * MeSH
- Publication Type
- Journal Article MeSH
- MeSH
- Biomedical Research MeSH
- Biostatistics * methods MeSH
- Multivariate Analysis MeSH
- Publication Type
- Research Support, Non-U.S. Gov't MeSH
The aim of this paper is to overview challenges and principles of Big Data analysis in biomedicine. Recent multivariate statistical approaches to complexity reduction represent a useful (and often irreplaceable) methodology allowing a reliable Big Data analysis to be performed. Attention is paid to principal component analysis, partial least squares, and variable selection based on maximizing conditional entropy. Some important problems, as well as ideas of complexity reduction, are illustrated with examples from biomedical research tasks. These include high-dimensional data in the form of facial images or gene expression measurements from a cardiovascular genetic study.
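The complexity reduction by principal component analysis mentioned above can be sketched as follows; this is a generic textbook illustration via the singular value decomposition of the centred data matrix, not the specific procedure used in the paper:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its first n_components principal components
    (complexity reduction via SVD of the centred data matrix)."""
    Xc = X - X.mean(axis=0)                 # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T       # low-dimensional representation
    explained = s[:n_components] ** 2 / np.sum(s ** 2)  # variance ratios
    return scores, explained
```

Keeping only a few components turns a high-dimensional dataset (e.g. facial images or gene expressions) into a small set of derived variables that retain most of the variance.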
- MeSH
- Data Analysis MeSH
- Principal Component Analysis methods MeSH
- Big Data * MeSH
- Biostatistics * methods MeSH
- Cardiovascular Diseases genetics prevention & control MeSH
- Humans MeSH
- Least-Squares Analysis MeSH
- Risk MeSH
- Facial Recognition MeSH
- Clinical Decision Support Systems MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Research Support, Non-U.S. Gov't MeSH
Clinical decision support systems represent important telemedicine tools with the ability to help physicians in the decision process leading to determining the diagnosis, therapy, or prognosis of patients. We proposed and implemented a prototype of a clinical decision support system, which has the form of an internet classification service. A specific property of this system is a sophisticated statistical component, which makes it possible to handle even a large number of symptoms and signs. In particular, it optimizes the selection of those symptoms and signs that are the most relevant for determining the diagnosis. The performance of the prototype was verified on an analysis of gene expression data from a cardiovascular genetic study. The paper discusses principles of multivariate statistical thinking and reveals challenges of analyzing high-dimensional data in which the number of observed variables (symptoms and signs) largely exceeds the number of observations (patients).
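A common baseline for selecting the most diagnosis-relevant variables in such high-dimensional two-group data is ranking by a two-sample t-statistic. The sketch below illustrates this generic idea only; the prototype's actual statistical component is not described in enough detail here to reproduce, and the function name is hypothetical:

```python
import numpy as np

def t_statistic_ranking(X, y, top=10):
    """Rank variables (e.g. gene expressions) by a two-sample Welch
    t-statistic between groups y == 0 and y == 1; return indices of the
    top variables with the largest absolute statistic."""
    g0, g1 = X[y == 0], X[y == 1]
    m0, m1 = g0.mean(axis=0), g1.mean(axis=0)
    v0, v1 = g0.var(axis=0, ddof=1), g1.var(axis=0, ddof=1)
    t = (m1 - m0) / np.sqrt(v0 / len(g0) + v1 / len(g1))
    return np.argsort(-np.abs(t))[:top]
```

Variables whose group means differ strongly relative to their variability come out on top, while uninformative variables are discarded before the classifier is learned.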
Decision support systems represent very complicated systems offering assistance with the decision-making process. Learning the classification rule of a decision support system requires solving a complex statistical task, most commonly by means of classification analysis. However, the regression methodology may be useful in this context as well. The aim of this paper is to overview various regression methods, discuss their properties, and show examples within clinical decision making.
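One regression method that directly yields a classification rule is logistic regression. A minimal sketch of learning it by gradient descent on the log-loss is shown below; this is a generic illustration, not a method prescribed by the paper:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Learn a logistic-regression classification rule by gradient descent.
    Returns weights w and intercept b for P(y=1|x) = sigmoid(x @ w + b)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)          # gradient of the log-loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Classifying a new patient then amounts to thresholding the predicted probability at 0.5, which makes the rule easy to interpret compared to black-box classifiers.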
- MeSH
- Data Interpretation, Statistical MeSH
- Clinical Decision-Making methods MeSH
- Linear Models MeSH
- Logistic Models MeSH
- Least-Squares Analysis MeSH
- Neural Networks, Computer MeSH
- Regression Analysis * MeSH
- Models, Statistical * MeSH
- Statistics as Topic MeSH
- Support Vector Machine MeSH
- Clinical Decision Support Systems MeSH
The amount of available data relevant for clinical decision support is rising not only rapidly but also much faster than our ability to analyze and interpret it. Thus, the potential of the data to contribute to determining the diagnosis, therapy, and prognosis of an individual patient is not appropriately exploited. The hope of obtaining benefit from the data for an individual patient must be accompanied by a reliable and diligent biostatistical analysis, which faces serious challenges not always clear to non-statisticians. The aim of this paper is to discuss principles of statistical analysis of big data in research and routine applications in clinical medicine, focusing on particular aspects of psychiatry. The paper brings arguments in favor of the idea that biostatistical analysis of data in a specialty field requires different approaches and different experience compared to other clinical fields; this is illustrated by a description of common complications of the analysis of psychiatric data. Challenges of the analysis of big data in both psychiatric research and routine practice are explained; such analysis is far from a routine service activity exploiting standard methods of multivariate statistics and/or machine learning. Research questions that are important in current psychiatric research are presented and discussed from the biostatistical point of view.
Gregor Mendel is generally acknowledged not only as the founder of genetics but also as the author of the first mathematical result in biology. Although his education was questioned for a long time, he was profoundly educated in botany as well as physics and in those parts of mathematics (combinatorics, probability theory) applied in his later pea plant experiments. Nevertheless, debates remain in the statistical literature about why Mendel's results are in such suspiciously good accordance with the expected values [22, 28]. The main aim of this paper is to propose new two-stage statistical models that are in better accordance with Mendel's data than the classical model, where the latter considers a fixed sample size. If Mendel performed his experiments following such a two-stage algorithm, which, however, cannot be proven, the results would purify Mendel's legacy and remove the suspicion that he modified his results. Mendel's experiments are described from a statistical point of view, and his data are shown to be close to data randomly generated from the novel models. The model found to be the most suitable is remarkably simpler than the model of [28], while yielding only slightly weaker results. The paper also discusses Mendel's legacy from the point of view of biostatistics.
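The general idea that a two-stage sampling rule pulls observed ratios toward the expectation can be illustrated with a toy simulation. The rule below (grow the sample with a second batch whenever the first batch deviates from the expected 3:1 ratio by more than a tolerance) is a hypothetical sketch for illustration, not the authors' actual model:

```python
import numpy as np

def classical(n, rng):
    """One experiment with fixed sample size: dominant count ~ Binomial(n, 3/4)."""
    return rng.binomial(n, 0.75) / n

def two_stage(n, tol, rng):
    """Hypothetical two-stage rule: if the first batch deviates from the 3:1
    expectation by more than tol, add a second batch of equal size and pool."""
    k = rng.binomial(n, 0.75)
    if abs(k / n - 0.75) > tol:
        k += rng.binomial(n, 0.75)
        n *= 2
    return k / n

rng = np.random.default_rng(1)
reps = 20000
dev_classical = np.mean([abs(classical(100, rng) - 0.75) for _ in range(reps)])
dev_two_stage = np.mean([abs(two_stage(100, 0.02, rng) - 0.75) for _ in range(reps)])
```

Averaged over many replications, the two-stage rule produces phenotype ratios systematically closer to 3:1 than the fixed-sample-size model, without any manipulation of individual counts.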
The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, suitable for high-dimensional data observed in two or more different groups. Various available versions of the MRMR approach have been designed to search for variables with the largest relevance for a classification task while controlling for redundancy of the selected set of variables. However, the usual relevance and redundancy criteria have the disadvantage of being too sensitive to the presence of outlying measurements and/or being inefficient. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. Particularly, redundancy is measured by a new regularized version of the coefficient of multiple correlation, and relevance is measured by a highly robust correlation coefficient based on the least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise or outliers on the data, we also perform the computations for data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers.
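The rank-based down-weighting behind least weighted squares can be sketched as follows. This linear, iteratively reweighted sketch only illustrates the idea of assigning decreasing weights to ranks of squared residuals; the estimator in the paper (with its data-adaptive weights) is more involved, and the function name is hypothetical:

```python
import numpy as np

def lws_fit(X, y, n_iter=20):
    """Sketch of least weighted squares regression: iterate weighted least
    squares with linearly decreasing weights assigned to the ranks of the
    squared residuals, so observations with large residuals (potential
    outliers) are strongly down-weighted."""
    n = len(y)
    Xi = np.column_stack([np.ones(n), X])         # add intercept column
    beta = np.linalg.lstsq(Xi, y, rcond=None)[0]  # ordinary LS start
    for _ in range(n_iter):
        r2 = (y - Xi @ beta) ** 2
        ranks = np.argsort(np.argsort(r2))        # rank 0 = smallest residual
        w = 1.0 - ranks / n                       # linearly decreasing weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(Xi * sw[:, None], y * sw, rcond=None)[0]
    return beta
```

On data contaminated by a few gross outliers, the fitted coefficients stay close to the trend of the clean majority, while an ordinary least squares fit is pulled far away.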