The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection is a successful methodology for dimensionality reduction, suitable for high-dimensional data observed in two or more different groups. The available versions of the MRMR approach search for the variables with the largest relevance for a classification task while controlling the redundancy of the selected set of variables. However, the usual relevance and redundancy criteria are too sensitive to outlying measurements and/or are inefficient. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. In particular, redundancy is measured by a new regularized version of the coefficient of multiple correlation, and relevance is measured by a highly robust correlation coefficient based on least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise or outliers, we also perform the computations on data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers.
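The greedy selection behind the plain MRMR idea can be sketched as follows. This is a minimal illustration, not the MRRMRR method of the abstract: it assumes absolute Pearson correlation as both the relevance and the redundancy criterion (precisely the kind of non-robust criterion the abstract criticizes), and the names `pearson` and `mrmr_select` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mrmr_select(columns, labels, n_select):
    """Greedy MRMR sketch: `columns` is a list of feature columns, `labels` a 0/1 list.

    Assumption: relevance = |corr(feature, labels)|,
    redundancy = mean |corr(feature, already selected feature)|.
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    # Start with the single most relevant variable.
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(n_select, len(columns)):
        best_j, best_score = None, -math.inf
        for j in range(len(columns)):
            if j in selected:
                continue
            redundancy = sum(abs(pearson(columns[j], columns[s]))
                             for s in selected) / len(selected)
            score = relevance[j] - redundancy  # difference form of the criterion
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Toy data: f1 is almost a copy of f0, f2 is less relevant but not redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [-1, -3, -1, -3, 3, 1, 3, 1]
f1 = [-0.9, -2.9, -0.9, -2.9, 2.9, 0.9, 2.9, 0.9]
f2 = [1, 1, -3, -3, 3, 3, -1, -1]
chosen = mrmr_select([f0, f1, f2], labels, 2)
```

On this toy data the near-duplicate `f1` is passed over in favour of `f2`, which is the behaviour that the redundancy penalty is meant to produce.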
We address the problem of entropy estimation for high-dimensional finite-accuracy data. Our main application is evaluating high-order mutual information image similarity criteria for multimodal image registration. Our method builds on an estimator that uses k-th nearest neighbor (NN) distances, modified so that only distances greater than some constant R are evaluated. This modification requires a correction, which is found numerically in a preprocessing step using quadratic programming. We experimentally compare our new method with k-NN and histogram estimators on synthetic data as well as on the evaluation of mutual information for image similarity.
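For orientation, the unmodified k-NN baseline can be sketched in a few lines. The code below assumes the standard Kozachenko-Leonenko form of the estimator, restricted to one-dimensional samples; it does not include the distance threshold R or the quadratic-programming correction described in the abstract, and the function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))

def knn_entropy_1d(samples, k=3):
    """Kozachenko-Leonenko entropy estimate (in nats) for 1-D samples.

    H ~ psi(N) - psi(k) + log(c_d) + (d / N) * sum_i log(rho_i),
    with d = 1, c_1 = 2 (length of the unit ball [-1, 1]) and rho_i the
    distance from sample i to its k-th nearest neighbour.
    Requires len(samples) > k and no repeated values (log(0) otherwise).
    """
    n = len(samples)
    xs = sorted(samples)
    log_dist_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted order the k nearest neighbours lie within k positions of i.
        lo, hi = max(0, i - k), min(n, i + k + 1)
        dists = sorted(abs(x - xs[j]) for j in range(lo, hi) if j != i)
        log_dist_sum += math.log(dists[k - 1])
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + log_dist_sum / n

random.seed(0)
uniform = [random.random() for _ in range(2000)]
estimate = knn_entropy_1d(uniform)  # the true entropy of U(0, 1) is 0 nats
```

Repeated values make a nearest-neighbour distance zero and the logarithm undefined, which is one way to see why finite-accuracy data call for a modified estimator.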
- MeSH
- Algorithms MeSH
- Entropy MeSH
- Financing, Organized MeSH
- Image Interpretation, Computer-Assisted methods MeSH
- Magnetic Resonance Imaging methods MeSH
- Brain anatomy & histology MeSH
- Reproducibility of Results MeSH
- Pattern Recognition, Automated methods MeSH
- Sensitivity and Specificity MeSH
- Artificial Intelligence MeSH
- Image Enhancement methods MeSH
BACKGROUND: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible. RESULTS: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets.
- MeSH
- Algorithms MeSH
- Databases, Factual MeSH
- Humans MeSH
- Pancreatic Neoplasms diagnosis genetics MeSH
- Computer Simulation MeSH
- Proteomics methods MeSH
- Reproducibility of Results MeSH
- Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization methods MeSH
- Machine Learning MeSH
- Case-Control Studies MeSH
- Models, Theoretical MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
A new method for two-dimensional deconvolution of medical ultrasonic images is presented. The spatial resolution of the deconvolved images is much higher than that of the common fundamental and second-harmonic images. The deconvolution also results in a more distinct speckle pattern. Unlike most published deconvolution algorithms for ultrasonic images, the presented technique can be implemented on currently available hardware for real-time imaging, at a rate of up to 50 frames per second. This makes it attractive for use in current ultrasound scanners. The algorithm is based on two-dimensional homomorphic deconvolution with simplified assumptions about the point spread function. Broadband radio frequency image data are deconvolved instead of the common fundamental harmonic data. Thus, information from both the first and second harmonics is used. The method was validated on image data recorded from a tissue-mimicking phantom and on clinical image data.
- MeSH
- Algorithms MeSH
- Phantoms, Imaging MeSH
- Financing, Organized MeSH
- Image Interpretation, Computer-Assisted methods MeSH
- Humans MeSH
- Signal Processing, Computer-Assisted MeSH
- Reproducibility of Results MeSH
- Sensitivity and Specificity MeSH
- Information Storage and Retrieval methods MeSH
- Ultrasonography methods instrumentation MeSH
- Image Enhancement methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Evaluation Study MeSH
The first high-precision 3D in vivo hindlimb kinematic data recorded in normal dogs of four different breeds (Beagle, French bulldog, Malinois, Whippet) using biplanar, high-frequency fluoroscopy combined with a 3D optoelectric system, followed by a markerless XROMM analysis (Scientific Rotoscoping, SR, or 3D-2D registration process), reveal a) 3D hindlimb kinematics to an unprecedented degree of precision and b) substantial limitations to the use of skin-marker-based data. We expected hindlimb kinematics to differ in relation to body shape. However, a comparison of the four breeds sets the French bulldog apart from the others in terms of trajectories in the frontal plane (abduction/adduction) and long-axis rotation of the femur. French bulldogs translate extensive femoral long-axis rotation (>30°) into a strong lateral displacement and rotations about the craniocaudal (roll) and the distal-proximal (yaw) axes of the pelvis in order to compensate for a highly abducted hindlimb position from the beginning of stance. We assume that breeds which exhibit unusual kinematics, especially high femoral abduction, might be susceptible to a higher long-term loading of the cruciate ligaments.
- MeSH
- Data Analysis MeSH
- Biomechanical Phenomena * physiology MeSH
- Extremities * MeSH
- Dogs classification MeSH
- Animals MeSH
- Check Tag
- Dogs classification MeSH
- Animals MeSH
- Publication type
- Abstracts MeSH
The aim of this paper is to give an overview of the challenges and principles of Big Data analysis in biomedicine. Recent multivariate statistical approaches to complexity reduction represent a useful (and often irreplaceable) methodology that makes reliable Big Data analysis possible. Attention is paid to principal component analysis, partial least squares, and variable selection based on maximizing conditional entropy. Some important problems as well as ideas of complexity reduction are illustrated with examples from biomedical research tasks. These include high-dimensional data in the form of facial images or gene expression measurements from a cardiovascular genetic study.
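As a small illustration of complexity reduction by principal component analysis, the sketch below finds the leading principal direction of a data matrix by power iteration on the sample covariance matrix. It is a minimal example under stated assumptions, not the procedure used in the paper: the function name is hypothetical, and the iteration assumes the starting vector is not orthogonal to the leading direction.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Leading principal direction of `rows` (list of equal-length lists).

    Returns (unit vector, variance explained along it), using power
    iteration on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p), denominator n - 1.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # assumed not orthogonal to the leading direction
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                   for a in range(p))
    return v, variance

# Points lying exactly on the line y = 2x: one direction carries all variance.
rows = [[x, 2.0 * x] for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
direction, variance = first_principal_component(rows)
```

Projecting each observation onto `direction` replaces the two original variables by a single score, which is the sense in which PCA reduces dimensionality.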
- MeSH
- Data Analysis MeSH
- Principal Component Analysis methods MeSH
- Big Data * MeSH
- Biostatistics * methods MeSH
- Cardiovascular Diseases genetics prevention & control MeSH
- Humans MeSH
- Least-Squares Analysis MeSH
- Risk MeSH
- Facial Recognition MeSH
- Decision Support Systems, Clinical MeSH
- Check Tag
- Humans MeSH
- Publication type
- Research Support, Non-U.S. Gov't MeSH
Background: Microarray technologies are used to measure the simultaneous expression of thousands of genes based on ribonucleic acid (RNA) obtained from a biological sample. We are interested in several statistical analyses, such as 1) finding genes that are differentially expressed between or among several experimental groups, 2) finding a small number of genes that allow a sample to be correctly classified into a certain group, and 3) finding relations among genes. Objectives: Gene expression data are high dimensional, which complicates their analysis, because only a few samples (e.g. the peripheral blood from a limited number of patients) are available for a set of thousands of genes. The main purpose of this paper is to present the shrinkage estimator and show its application in different statistical analyses. Methods: The shrinkage approach shifts the value of a classic estimator towards the value of a specified target estimator. More precisely, the shrinkage estimator is the weighted average of the classic estimator and the target estimator. Results: The benefit of the shrinkage estimator is that it improves the mean squared error (MSE) compared to the classic estimator. The MSE combines a measure of the estimator's bias away from the true unknown value and a measure of the estimator's variability. The shrinkage estimator is biased but has lower variability. Conclusions: The shrinkage estimator can be considered a promising estimator for analyzing high-dimensional gene expression data.
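The weighted average described under Methods can be written out directly. The sketch below is an illustration under stated assumptions rather than the paper's procedure: the classic estimator is taken to be the sample covariance matrix, the target is assumed to be its diagonal, and the weight `lam` is supplied by hand, whereas in practice it is usually chosen from the data. All names are hypothetical.

```python
def sample_covariance(rows):
    """Classic estimator: sample covariance matrix of `rows` (n x p)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def shrink_covariance(rows, lam):
    """Shrinkage estimator lam * T + (1 - lam) * S.

    S is the sample covariance matrix and T the assumed target, here the
    diagonal matrix holding the variances of S. `lam` in [0, 1] is the
    shrinkage weight (0 = classic estimator, 1 = target).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    s = sample_covariance(rows)
    p = len(s)
    target = [[s[a][b] if a == b else 0.0 for b in range(p)] for a in range(p)]
    return [[lam * target[a][b] + (1.0 - lam) * s[a][b] for b in range(p)]
            for a in range(p)]

# Four samples of two variables; far fewer samples than genes is the typical case.
rows = [[1.0, 2.0], [2.0, 4.5], [3.0, 5.5], [4.0, 8.0]]
classic = sample_covariance(rows)
shrunk = shrink_covariance(rows, 0.5)
```

With this target the variances stay unchanged while the covariances are pulled towards zero, which trades a small bias for lower variability, as the abstract describes.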
- MeSH
- Gene Expression * genetics MeSH
- Humans MeSH
- Microarray Analysis * methods statistics & numerical data MeSH
- RNA * genetics MeSH
- Models, Statistical MeSH
- Check Tag
- Humans MeSH
- Publication type
- Research Support, Non-U.S. Gov't MeSH
Untargeted metabolomic approaches offer new opportunities for a deeper understanding of the molecular events related to toxic exposure. This study proposes a metabolomic investigation of biochemical alterations occurring in urine as a result of dioxin toxicity. Urine samples were collected from Czech chemical workers subjected to severe occupational dioxin exposure in a herbicide production plant in the late 1960s. Experiments were carried out with ultra-high pressure liquid chromatography (UHPLC) coupled to high-resolution quadrupole time-of-flight (QTOF) mass spectrometry. A chemistry-driven feature selection was applied to focus on steroid-related metabolites. Supervised multivariate data analysis highlighted biomarkers mainly related to bile acids. These results supported the hypothesis of liver damage and oxidative stress in long-term dioxin toxicity. As a second step of data analysis, the information gained from the urine analysis of Victor Yushchenko after his poisoning was examined. A subset of relevant urinary markers of acute dioxin toxicity from this extreme phenotype, including glucuro- and sulfo-conjugated endogenous steroid metabolites and bile acids, was assessed for its ability to detect long-term effects of exposure. The metabolomic strategy presented in this work allowed the determination of metabolic patterns related to dioxin effects in humans and the discovery of highly predictive subsets of biologically meaningful and clinically relevant compounds. These results are expected to provide valuable information for a deeper understanding of the molecular events related to dioxin toxicity. Furthermore, this work presents an original methodology for data dimensionality reduction that uses an extreme phenotype as a guide to select relevant features prior to data modeling (biologically driven data reduction).
- MeSH
- Biomarkers urine MeSH
- Data Mining MeSH
- Liver drug effects metabolism MeSH
- Humans MeSH
- Metabolomics methods MeSH
- Environmental Monitoring methods MeSH
- Oxidative Stress drug effects MeSH
- Polychlorinated Dibenzodioxins toxicity MeSH
- Occupational Exposure analysis MeSH
- Chromatography, High Pressure Liquid MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Final report on a grant of the Agency for Health Research of the Ministry of Health of the Czech Republic (Agentura pro zdravotnický výzkum MZ ČR)
unpaginated
Recent technological advances have enabled biomedical research to explore the underlying biological processes of living organisms at various resolutions and from different perspectives. While the amount of data produced has grown dramatically over the years, the pace at which our knowledge advances has lagged behind, an indication that current computational tools are unable to extract knowledge from the large pool of noisy data. NEUROMINER will provide a framework for machine learning and data mining, with a special emphasis on neuroscience research. The project has three main axes of research, each corresponding to a currently unmet need: (1) extraction and selection of features with strong discrimination properties from high-dimensional data, (2) systems able to learn from high-dimensional data without suffering from overfitting, and (3) a rigorous procedure for statistical model assessment. The applicants are experts in medical image processing and analysis, biostatistics, and machine learning.
- MeSH
- Biostatistics MeSH
- Data Mining MeSH
- Brain diagnostic imaging MeSH
- Neural Networks, Computer MeSH
- Neuroimaging MeSH
- Image Processing, Computer-Assisted MeSH
- Reproducibility of Results MeSH
- Schizophrenia diagnostic imaging MeSH
- Machine Learning MeSH
- Conspectus
- Pathology. Clinical medicine
- NML Fields
- neurology
- radiology, nuclear medicine and imaging methods
- medical informatics
- NML Publication type
- final reports on AZV MZ ČR grants