High-dimensional data
The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, which is suitable for high-dimensional data observed in two or more different groups. Various available versions of the MRMR approach have been designed to search for variables with the largest relevance for a classification task while controlling for redundancy of the selected set of variables. However, usual relevance and redundancy criteria have the disadvantages of being too sensitive to the presence of outlying measurements and/or being inefficient. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. Particularly, redundancy is measured by a new regularized version of the coefficient of multiple correlation and relevance is measured by a highly robust correlation coefficient based on the least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise or outliers on the data, we perform the computations also for data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers.
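The generic MRMR idea described above (not the proposed MRRMRR variant) can be sketched as a greedy loop. This is a minimal illustration that assumes absolute Pearson correlation as a stand-in for both the relevance and the redundancy measure; the abstract's regularized multiple correlation and robust least-weighted-squares correlation are not reproduced here. The names `pearson` and `mrmr_select` and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mrmr_select(features, labels, n_select):
    """Greedy MRMR: at each step pick the variable that maximizes
    relevance minus mean redundancy with the variables already chosen."""
    relevance = [abs(pearson(f, labels)) for f in features]
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < n_select:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(abs(pearson(features[j], features[s]))
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (illustrative only): two groups coded 0/1.
# Variables 0 and 1 are perfectly correlated with each other.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [1, 2, 1, 2, 5, 6, 5, 6],
    [2, 4, 2, 4, 10, 12, 10, 12],
    [0, 0, 1, 1, 1, 1, 2, 2],
]
selected = mrmr_select(features, labels, n_select=2)
```

On this toy data the second pick is variable 2 rather than the remaining copy of the first pick, because the copy is penalized by its redundancy. A robust variant would swap `pearson` for outlier-resistant measures, which is the direction the abstract takes.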
We address the problem of entropy estimation for high-dimensional finite-accuracy data. Our main application is evaluating high-order mutual information image similarity criteria for multimodal image registration. Our method builds on an estimator that uses k-th nearest neighbor (NN) distances, modified so that only distances greater than some constant R are evaluated. This modification requires a correction, which is found numerically in a preprocessing step using quadratic programming. We experimentally compare our new method with k-NN and histogram estimators, both on synthetic data and for the evaluation of mutual information for image similarity.
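For orientation, the unmodified k-NN baseline mentioned above can be sketched as follows. This is the standard Kozachenko-Leonenko estimator, given here as an assumption about what "k-NN estimator" refers to. It does not include the distance threshold R or the quadratic-programming correction. The function names are hypothetical, and the brute-force neighbor search is chosen for brevity, not speed.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{i<n} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))

def knn_entropy(points, k=3):
    """Kozachenko-Leonenko entropy estimate (in nats) from k-th NN distances.

    points: list of equal-length tuples; Euclidean distance is assumed.
    """
    n = len(points)
    d = len(points[0])
    # Log-volume of the d-dimensional unit Euclidean ball.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_dist_sum = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Duplicate points give distance 0 and make the logarithm fail;
        # this is the finite-accuracy problem the abstract addresses.
        log_dist_sum += math.log(dists[k - 1])
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_dist_sum / n

random.seed(0)
sample = [(random.random(),) for _ in range(400)]
h_uniform = knn_entropy(sample, k=3)  # true entropy of Uniform(0, 1) is 0 nats
```

With quantized data many neighbor distances collapse to zero, so the plain estimator breaks down. This is presumably why only distances above a constant R are evaluated in the modified method.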
- MeSH
- algorithms MeSH
- entropy MeSH
- financing, organized MeSH
- image interpretation, computer-assisted methods MeSH
- magnetic resonance imaging methods MeSH
- brain anatomy and histology MeSH
- reproducibility of results MeSH
- pattern recognition, automated methods MeSH
- sensitivity and specificity MeSH
- artificial intelligence MeSH
- image enhancement methods MeSH
BACKGROUND: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible. RESULTS: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets.
- MeSH
- algorithms MeSH
- databases, factual MeSH
- humans MeSH
- pancreatic neoplasms diagnosis genetics MeSH
- computer simulation MeSH
- proteomics methods MeSH
- reproducibility of results MeSH
- spectrometry, mass, matrix-assisted laser desorption-ionization methods MeSH
- machine learning MeSH
- case-control studies MeSH
- models, theoretical MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
A new method for two-dimensional deconvolution of medical ultrasonic images is presented. The spatial resolution of the deconvolved images is much higher than that of the common fundamental and second-harmonic images. The deconvolution also results in a more distinct speckle pattern. Unlike most published deconvolution algorithms for ultrasonic images, the presented technique can be implemented on currently available hardware for real-time imaging, at a rate of up to 50 frames per second. This makes it attractive for use in current ultrasound scanners. The algorithm is based on two-dimensional homomorphic deconvolution with simplified assumptions about the point spread function. Broadband radio frequency image data are deconvolved instead of the common fundamental harmonic data. Thus, information from both the first and second harmonics is used. The method was validated on image data recorded from a tissue-mimicking phantom and on clinical image data.
- MeSH
- algorithms MeSH
- phantoms, imaging MeSH
- financing, organized MeSH
- image interpretation, computer-assisted methods MeSH
- humans MeSH
- signal processing, computer-assisted MeSH
- reproducibility of results MeSH
- sensitivity and specificity MeSH
- information storage and retrieval methods MeSH
- ultrasonography methods instrumentation MeSH
- image enhancement methods MeSH
- Check Tag
- humans MeSH
- Publication type
- evaluation study MeSH
The first high-precision 3D in vivo hindlimb kinematic data recorded in normal dogs of four different breeds (Beagle, French bulldog, Malinois, Whippet), obtained using biplanar, high-frequency fluoroscopy combined with a 3D optoelectric system and followed by a markerless XROMM analysis (Scientific Rotoscoping, SR, or 3D-2D registration process), reveal a) 3D hindlimb kinematics to an unprecedented degree of precision and b) substantial limitations to the use of skin-marker-based data. We expected hindlimb kinematics to differ in relation to body shape. However, a comparison of the four breeds sets the French bulldog apart from the others in terms of trajectories in the frontal plane (abduction/adduction) and long-axis rotation of the femur. French bulldogs translate extensive femoral long-axis rotation (>30°) into a strong lateral displacement and rotations about the craniocaudal (roll) and the distal-proximal (yaw) axes of the pelvis in order to compensate for a highly abducted hindlimb position from the beginning of stance. We assume that breeds which exhibit unusual kinematics, especially high femoral abduction, might be susceptible to a higher long-term loading of the cruciate ligaments.
- MeSH
- data analysis MeSH
- biomechanical phenomena * physiology MeSH
- extremities * MeSH
- dogs classification MeSH
- animals MeSH
- Check Tag
- dogs classification MeSH
- animals MeSH
- Publication type
- abstracts MeSH
The aim of this paper is to give an overview of the challenges and principles of Big Data analysis in biomedicine. Recent multivariate statistical approaches to complexity reduction represent a useful (and often irreplaceable) methodology that allows reliable Big Data analysis. Attention is paid to principal component analysis, partial least squares, and variable selection based on maximizing conditional entropy. Some important problems as well as ideas of complexity reduction are illustrated with examples from biomedical research tasks. These include high-dimensional data in the form of facial images or gene expression measurements from a cardiovascular genetic study.
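As a small illustration of complexity reduction by principal component analysis, the sketch below finds the first principal direction of a data set by power iteration on the sample covariance matrix. It is not taken from the paper; the function name and the toy data are hypothetical, and the fixed start vector is assumed not to be orthogonal to the leading direction.

```python
import math

def first_principal_component(rows, iterations=100):
    """Leading eigenvector of the sample covariance matrix via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Assumed start vector; it must not be orthogonal to the leading eigenvector.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data lying close to the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
direction = first_principal_component(rows)
# Project each centered observation onto the first component (one score per row).
means = [sum(r[j] for r in rows) / len(rows) for j in range(2)]
scores = [sum((r[j] - means[j]) * direction[j] for j in range(2)) for r in rows]
```

Each two-dimensional observation is replaced by a single score, which is the sense in which the method reduces dimensionality.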
- MeSH
- data analysis MeSH
- principal component analysis methods MeSH
- big data * MeSH
- biostatistics * methods MeSH
- cardiovascular diseases genetics prevention and control MeSH
- humans MeSH
- least-squares analysis MeSH
- risk MeSH
- facial recognition MeSH
- decision support systems, clinical MeSH
- Check Tag
- humans MeSH
- Publication type
- grant-supported research MeSH
Background: Microarray technologies are used to measure the simultaneous expression of a set of thousands of genes based on ribonucleic acid (RNA) obtained from a biological sample. We are interested in several statistical analyses, such as 1) finding differentially expressed genes between or among several experimental groups, 2) finding a small number of genes that allow a sample to be classified correctly into a certain group, and 3) finding relations among genes. Objectives: Gene expression data are high-dimensional, which complicates their analysis because only a few samples (e.g. peripheral blood from a limited number of patients) are available for a set of thousands of genes. The main purpose of this paper is to present the shrinkage estimator and show its application in different statistical analyses. Methods: The shrinkage approach shifts the value of a classic estimator towards the value of a specified target estimator. More precisely, the shrinkage estimator is the weighted average of the classic estimator and the target estimator. Results: The benefit of the shrinkage estimator is that it improves the mean squared error (MSE) compared to a classic estimator. The MSE combines a measure of an estimator's bias away from the true unknown value with a measure of the estimator's variability. The shrinkage estimator is a biased estimator but has a lower variability. Conclusions: The shrinkage estimator can be considered a promising estimator for analyzing high-dimensional gene expression data.
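The weighted-average definition above can be written down directly. The sketch below is illustrative only: the weight `lam`, the choice of the grand mean as the target, and the toy per-gene means are assumptions, and the abstract does not say how the weight is chosen.

```python
from statistics import mean

def shrink(classic, target, lam):
    """Weighted average of a classic estimate and a target estimate.

    lam = 0 returns the classic estimate; lam = 1 returns the target.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * t + (1.0 - lam) * c for c, t in zip(classic, target)]

# Hypothetical per-gene sample means estimated from a handful of samples.
gene_means = [2.0, 4.0, 9.0]
# Illustrative target: every gene is shrunk toward the grand mean.
grand_mean = mean(gene_means)
target = [grand_mean] * len(gene_means)

shrunk = shrink(gene_means, target, lam=0.5)  # [3.5, 4.5, 7.0]
```

Pulling each estimate toward a common target introduces bias but reduces variability, which is the trade-off the abstract describes in terms of the MSE.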
- MeSH
- gene expression * genetics MeSH
- humans MeSH
- microarray analysis * methods statistics and numerical data MeSH
- RNA * genetics MeSH
- models, statistical MeSH
- Check Tag
- humans MeSH
- Publication type
- grant-supported research MeSH
Untargeted metabolomic approaches offer new opportunities for a deeper understanding of the molecular events related to toxic exposure. This study proposes a metabolomic investigation of biochemical alterations occurring in urine as a result of dioxin toxicity. Urine samples were collected from Czech chemical workers subjected to severe occupational dioxin exposure in a herbicide production plant in the late 1960s. Experiments were carried out with ultra-high pressure liquid chromatography (UHPLC) coupled to high-resolution quadrupole time-of-flight (QTOF) mass spectrometry. A chemistry-driven feature selection was applied to focus on steroid-related metabolites. Supervised multivariate data analysis allowed biomarkers, mainly related to bile acids, to be highlighted. These results supported the hypothesis of liver damage and oxidative stress for long-term dioxin toxicity. As a second step of data analysis, the information gained from the urine analysis of Victor Yushchenko after his poisoning was examined. A subset of relevant urinary markers of acute dioxin toxicity from this extreme phenotype, including glucuro- and sulfo-conjugated endogenous steroid metabolites and bile acids, was assessed for its ability to detect long-term effects of exposure. The metabolomic strategy presented in this work allowed the determination of metabolic patterns related to dioxin effects in humans and the discovery of highly predictive subsets of biologically meaningful and clinically relevant compounds. These results are expected to provide valuable information for a deeper understanding of the molecular events related to dioxin toxicity. Furthermore, this work presents an original methodology of data dimensionality reduction that uses an extreme phenotype as a guide to select relevant features prior to data modeling (biologically driven data reduction).
- MeSH
- biomarkers urine MeSH
- data mining MeSH
- liver drug effects metabolism MeSH
- humans MeSH
- metabolomics methods MeSH
- environmental monitoring methods MeSH
- oxidative stress drug effects MeSH
- polychlorinated dibenzodioxins toxicity MeSH
- occupational exposure analysis MeSH
- chromatography, high pressure liquid MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
Final report on a grant of the Agency for Health Research of the Ministry of Health of the Czech Republic (Agentura pro zdravotnický výzkum MZ ČR)
unpaged
Recent technological advances have enabled biomedical research to explore the underlying biological processes of living organisms at various resolutions and from different perspectives. While the amount of data produced has grown dramatically over the years, the pace at which our knowledge grows has lagged behind, an indication of the inability of current computational tools to extract knowledge from the large pool of noisy data. NEUROMINER will provide a framework for machine learning and data mining, with a special emphasis on neuroscience research. The project has three main axes of research, each corresponding to a currently unmet need: (1) extraction and selection of features with strong discrimination properties, (2) systems able to learn from high-dimensional data without suffering from overfitting problems, and (3) a rigorous statistical model assessment procedure. The applicants are experts in medical image processing and analysis, biostatistics and machine learning.
- MeSH
- biostatistics MeSH
- data mining MeSH
- brain diagnostic imaging MeSH
- neural networks MeSH
- neuroimaging MeSH
- image processing, computer-assisted MeSH
- reproducibility of results MeSH
- schizophrenia diagnostic imaging MeSH
- machine learning MeSH
- Conspectus
- Pathology. Clinical medicine
- NLK subject areas
- neurology
- radiology, nuclear medicine and imaging methods
- medical informatics
- NLK publication type
- final reports on AZV MZ ČR grants