Missing data
OBJECTIVE: To compare several methods of missing data imputation for function (Health Assessment Questionnaire) and for disease activity (Disease Activity Score-28 and Clinical Disease Activity Index) in rheumatoid arthritis (RA) patients. METHODS: One thousand RA patients from observational cohort studies with complete data for function and disease activity at baseline, 6, 12 and 24 months were selected to conduct a simulation study. Values were deleted at random or following a predicted attrition bias. Three types of imputation were performed: (1) methods imputing forward in time (last observation carried forward; linear forward extrapolation); (2) methods considering data both forward and backward in time (nearest available observation-NAO; linear extrapolation; polynomial extrapolation); and (3) methods using multi-individual models (linear mixed effects cubic regression-LME3; multiple imputation by chained equation-MICE). The performance of each estimation method was assessed using the difference between the mean outcome value, the remission and low disease activity rates after imputation of the missing values and the true value. RESULTS: When imputing missing baseline values, all methods underestimated equally the true value, but LME3 and MICE correctly estimated remission and low disease activity rates. When imputing missing follow-up values at 6, 12, or 24 months, NAO provided the least biassed estimate of the mean disease activity and corresponding remission rate. These results were not affected by the presence of attrition bias. CONCLUSION: When imputing function and disease activity in large registers of active RA patients, researchers can consider the use of a simple method such as NAO for missing follow-up data, and the use of mixed-effects regression or multiple imputation for baseline data.
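The difference between a forward-only method and a method that looks both forward and backward in time can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names, the visit times in months, the example DAS28 values, and the rule that ties go to the earlier visit are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def locf(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: fill a gap with the latest earlier value.

    A missing baseline stays missing because nothing precedes it.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def nearest_available_observation(
    times: Sequence[float], values: Sequence[Optional[float]]
) -> List[Optional[float]]:
    """Fill a gap with the observed value closest in time, earlier or later.

    Assumption for this sketch: when two observations are equally close,
    the earlier one is used.
    """
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    filled: List[Optional[float]] = []
    for t, value in zip(times, values):
        if value is not None or not observed:
            filled.append(value)
            continue
        _, nearest_value = min(observed, key=lambda tv: (abs(tv[0] - t), tv[0]))
        filled.append(nearest_value)
    return filled


# Hypothetical patient: visits at 0, 6, 12 and 24 months, two values missing.
months = [0, 6, 12, 24]
das28 = [None, 4.0, None, 2.6]
print(locf(das28))                                   # [None, 4.0, 4.0, 2.6]
print(nearest_available_observation(months, das28))  # [4.0, 4.0, 4.0, 2.6]
```

In this example only the backward-looking method can fill the missing baseline value, which is one reason the two families of methods are evaluated separately for baseline and follow-up data.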
- Keywords
- DAS28, disease activity, epidemiology, outcomes research, rheumatoid arthritis
- MeSH
- Algorithms MeSH
- Remission Induction MeSH
- Data Interpretation, Statistical * MeSH
- Cohort Studies MeSH
- Humans MeSH
- Linear Models MeSH
- Follow-Up Studies MeSH
- Computer Simulation MeSH
- Arthritis, Rheumatoid epidemiology MeSH
- Severity of Illness Index MeSH
- Research Design statistics & numerical data MeSH
- Bias (Epidemiology) MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Observational Study MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
BACKGROUND: Magnetic resonance spectroscopy provides metabolic information about living tissues in a non-invasive way. However, there are only a few multi-centre clinical studies, mostly performed on a single scanner model or data format, as there is no flexible way of documenting and exchanging processed magnetic resonance spectroscopy data in digital format. This is because the DICOM standard for spectroscopy deals with unprocessed data. This paper proposes a plugin tool developed for jMRUI, namely jMRUI2XML, to tackle the latter limitation. jMRUI is a software tool for magnetic resonance spectroscopy data processing that is widely used in the magnetic resonance spectroscopy community and has evolved into a plugin platform allowing for implementation of novel features. RESULTS: jMRUI2XML is a Java solution that facilitates common preprocessing of magnetic resonance spectroscopy data across multiple scanners. Its main characteristics are: 1) it automates magnetic resonance spectroscopy preprocessing, and 2) it can be a platform for outputting exchangeable magnetic resonance spectroscopy data. The plugin works with any kind of data that can be opened by jMRUI and outputs in extensible markup language format. Data processing templates can be generated and saved for later use. The output format opens the way for easy data sharing, owing to the documentation of the preprocessing parameters and the intrinsic anonymization, for example for performing pattern recognition analysis on multicentre/multi-manufacturer magnetic resonance spectroscopy data. CONCLUSIONS: jMRUI2XML provides a self-contained and self-descriptive format accounting for the most relevant information needed for exchanging magnetic resonance spectroscopy data in digital form, as well as for automating its processing. This allows for tracking the procedures the data has undergone, which makes the proposed tool especially useful when performing pattern recognition analysis. Moreover, this work constitutes a first proposal for a minimum amount of information that should accompany any magnetic resonance processed spectrum, towards the goal of achieving better transferability of magnetic resonance spectroscopy studies.
- MeSH
- Algorithms * MeSH
- Electronic Data Processing statistics & numerical data MeSH
- Humans MeSH
- Magnetic Resonance Spectroscopy methods MeSH
- Magnetic Resonance Imaging methods MeSH
- Image Processing, Computer-Assisted methods MeSH
- Software * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
OBJECTIVES: We aimed to compare various methods for imputing disease activity in longitudinally collected observational data of patients with axial spondyloarthritis (axSpA). METHODS: We conducted a simulation study on data from 8583 axSpA patients from ten European registries. Disease activity was assessed by the Axial Spondyloarthritis Disease Activity Score (ASDAS) and the corresponding low disease activity (LDA; ASDAS<2.1) state at baseline, 6 and 12 months. We focused on cross-sectional methods which impute missing values of an individual at a particular time point based on the available information from other individuals at that time point. We applied nine single and five multiple imputation methods, covering mean, regression and hot deck methods. The performance of each imputation method was evaluated via relative bias and coverage of 95% confidence intervals for the mean ASDAS and the derived proportion of patients in LDA. RESULTS: Hot deck imputation methods outperformed mean and regression methods, particularly when assessing LDA. Multiple imputation procedures provided better coverage than the corresponding single imputation ones. However, none of the evaluated methods produced unbiased estimates with adequate coverage across all time points, with performance for missing baseline data being worse than for missing follow-up data. Predictive mean and weighted predictive mean hot deck imputation procedures consistently provided results with low bias. CONCLUSIONS: This study contributes to the available methods for imputing disease activity in observational research. Hot deck imputation using predictive mean matching exhibited the highest robustness and is thus our suggested approach.
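The idea behind predictive mean matching, the hot deck variant recommended above, can be sketched in a few lines. This is a simplified single-imputation illustration and not the procedure used in the study: the single predictor, the ordinary least-squares fit, the donor pool size `k`, and all names and numbers are assumptions. A full multiple-imputation version would also draw the regression parameters anew for each imputed dataset.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(
    x: Sequence[float], y: Sequence[Optional[float]], k: int = 3, seed: int = 0
) -> List[float]:
    """Hot deck imputation by predictive mean matching (simplified).

    For each patient with missing y, find the k donors whose predicted
    value is closest to the patient's predicted value, pick one donor
    at random, and copy that donor's observed y.
    """
    rng = random.Random(seed)
    donors = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope = fit_line([d[0] for d in donors], [d[1] for d in donors])

    def predict(xi: float) -> float:
        return intercept + slope * xi

    imputed: List[float] = []
    for xi, yi in zip(x, y):
        if yi is not None:
            imputed.append(yi)
            continue
        target = predict(xi)
        pool = sorted(donors, key=lambda d: abs(predict(d[0]) - target))[:k]
        imputed.append(rng.choice(pool)[1])
    return imputed


# Hypothetical covariate (x) and ASDAS-like outcome (y) at one time point.
covariate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
outcome = [1.2, 2.1, None, 3.9, 5.2, None]
print(pmm_impute(covariate, outcome, k=2))
```

Because every imputed value is copied from a real patient, the imputed data keep the shape of the observed distribution. That property matters when a threshold such as ASDAS<2.1 is applied afterwards, and it is a plausible reason for the advantage of hot deck methods when assessing LDA.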
- Keywords
- Axial Spondyloarthritis, Epidemiology, Interleukin-17, Tumour Necrosis Factor Inhibitors
- MeSH
- Axial Spondyloarthritis * diagnosis epidemiology MeSH
- Adult MeSH
- Data Interpretation, Statistical MeSH
- Humans MeSH
- Longitudinal Studies MeSH
- Observational Studies as Topic * MeSH
- Cross-Sectional Studies MeSH
- Registries MeSH
- Spondylarthritis diagnosis MeSH
- Severity of Illness Index * MeSH
- Check Tag
- Adult MeSH
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Geographic names
- Europe epidemiology MeSH
BACKGROUND: Observational data on composite scores often comes with missing component information. When a complete-case (CC) analysis of composite scores is unbiased, preferable approaches of dealing with missing component information should also be unbiased and provide a more precise estimate. We assessed the performance of several methods compared to CC analysis in estimating the means of common composite scores used in axial spondyloarthritis research. METHODS: Individual mean imputation (IMI), the modified formula method (MF), overall mean imputation (OMI), and multiple imputation of missing component values (MI) were assessed either analytically or by means of simulations from available data collected across Europe. Their performance in estimating the means of the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), the Bath Ankylosing Spondylitis Functional Index (BASFI), and the Ankylosing Spondylitis Disease Activity Score based on C-reactive protein (ASDAS-CRP) in cases where component information was set missing completely at random was compared to the CC approach based on bias, variance, and coverage. RESULTS: Like the MF method, IMI uses a modified formula for observations with missing components resulting in modified composite scores. In the case of an unbiased CC approach, these two methods yielded representative samples of the distribution arising from a mixture of the original and modified composite scores, which, however, could not be considered the same as the distribution of the original score. The IMI and MF method are, thus, intrinsically biased. OMI provided an unbiased mean but displayed a complex dependence structure among observations that, if not accounted for, resulted in severe coverage issues. MI improved precision compared to CC and gave unbiased means and proper coverage as long as the extent of missingness was not too large. 
CONCLUSIONS: MI of missing component values was the only method found successful in retaining CC's unbiasedness and in providing increased precision for estimating the means of BASDAI, BASFI, and ASDAS-CRP. However, since MI is susceptible to incorrect implementation and its performance may become questionable with increasing missingness, we consider the implementation of an error-free CC approach a valid and valuable option. TRIAL REGISTRATION: Not applicable as study uses data from patient registries.
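A small worked example may help to show why filling in a missing component changes the composite score itself. The sketch uses the standard BASDAI formula (six items scored 0 to 10, with the two morning-stiffness items averaged). The way individual mean imputation is implemented here, replacing a missing item with the mean of the same patient's observed items, is an assumption for illustration and may differ from the definition used in the study; the patient records are invented.

```python
from typing import List, Optional, Sequence


def basdai(items: Sequence[float]) -> float:
    """BASDAI from six items: (Q1 + Q2 + Q3 + Q4 + mean(Q5, Q6)) / 5."""
    q1, q2, q3, q4, q5, q6 = items
    return (q1 + q2 + q3 + q4 + (q5 + q6) / 2) / 5


def complete_case_mean(records: Sequence[Sequence[Optional[float]]]) -> float:
    """Mean BASDAI over patients with all six items observed."""
    complete = [r for r in records if None not in r]
    return sum(basdai(r) for r in complete) / len(complete)


def individual_mean_imputation(items: Sequence[Optional[float]]) -> List[float]:
    """Assumed IMI: replace missing items by the mean of the patient's own
    observed items (requires at least one observed item)."""
    observed = [q for q in items if q is not None]
    fill = sum(observed) / len(observed)
    return [fill if q is None else q for q in items]


# Hypothetical records; the third patient has no answer to question 2.
records = [
    [4, 6, 2, 2, 8, 6],
    [2, 2, 2, 2, 4, 4],
    [6, None, 4, 4, 2, 2],
]
print(complete_case_mean(records))                      # uses patients 1 and 2 only
print(basdai(individual_mean_imputation(records[2])))   # a modified composite score
```

The score computed for the third patient is a function of five answers rather than six, so it follows a different formula from the original BASDAI. This is the sense in which such methods yield a mixture of original and modified composite scores.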
- Keywords
- Axial spondyloarthritis, Complete-case analysis, Composite score, Missing components, Multiple imputation
- MeSH
- Spondylitis, Ankylosing MeSH
- Axial Spondyloarthritis * MeSH
- C-Reactive Protein analysis MeSH
- Data Interpretation, Statistical MeSH
- Humans MeSH
- Severity of Illness Index MeSH
- Research Design MeSH
- Bias (Epidemiology) MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Geographic names
- Europe MeSH
- Substance names
- C-Reactive Protein MeSH
Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package "traitor" to facilitate assessments of missing trait data.
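One way to compare plot rankings between the full and the reduced datasets is a rank correlation. The abstract does not state which statistic was used, so Spearman's coefficient is an assumption here, and the FD values below are invented. The sketch is in Python rather than R and is unrelated to the "traitor" package.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average
        i = j + 1
    return result


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a = sum(ra) / n
    mean_b = sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical FD values for four plots: full data vs. data with traits removed.
fd_full = [0.8, 0.5, 0.9, 0.3]
fd_reduced = [0.7, 0.6, 0.4, 0.2]
print(spearman(fd_full, fd_reduced))
```

A coefficient near 1 would mean the incomplete dataset still orders the plots as the full dataset does. Lower values indicate that missing trait data have changed which plots appear most functionally diverse.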
There is limited information on the association between participants' clinical status or trajectories and missing data in electronic monitoring studies of bipolar disorder (BD). We collected self-rating scales and sensor data in 145 adults with BD. Using a new metric, the Missing Data Ratio (MDR), we assessed missing self-rating data and missing sensor data on activity and sleep. Missing data were lowest for participants in the midst of a depressive episode, intermediate for participants with subsyndromal symptoms, and highest for participants who were euthymic. Over a mean ± SD follow-up of 246 ± 181 days, missing data remained unchanged for participants whose clinical status did not change throughout the study (i.e., those who entered the study in a depressive episode and did not improve, or those who entered the study euthymic and remained euthymic). Conversely, when participants' clinical status changed during the study (e.g., those who entered the study euthymic and experienced the occurrence of a depressive episode), missing data for self-rating scales increased, but not for sensor data. Overall, missing data were associated with participants' clinical status and its changes, suggesting that these data are not missing at random.
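The abstract does not define the Missing Data Ratio. The sketch below assumes the simplest reading, the share of expected observations that were not recorded, and shows how it could be summarised by clinical status. The function names, the record layout, and the counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def missing_data_ratio(expected: int, observed: int) -> float:
    """Assumed definition: (expected - observed) / expected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (expected - observed) / expected


def mdr_by_status(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Pool expected and observed counts per clinical status, then compute MDR.

    Each record is (clinical_status, expected_count, observed_count).
    """
    totals = defaultdict(lambda: [0, 0])
    for status, expected, observed in records:
        totals[status][0] += expected
        totals[status][1] += observed
    return {s: missing_data_ratio(e, o) for s, (e, o) in totals.items()}


# Hypothetical counts of daily self-ratings expected and received.
records = [
    ("depressive episode", 10, 9),
    ("euthymic", 10, 6),
    ("euthymic", 10, 4),
]
print(mdr_by_status(records))
```

If the ratio differs systematically between clinical states, as reported above, missingness depends on the quantity being measured, and analyses that assume data are missing at random may be biased.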
- MeSH
- Bipolar Disorder * epidemiology MeSH
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Longitudinal Studies MeSH
- Young Adult MeSH
- Self Report MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Young Adult MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
The limited specimen tilting range that is typically available in electron tomography gives rise to a region in the Fourier space of the reconstructed object where experimental data are unavailable - the missing wedge. Since this region is sharply delimited from the area of available data, the reconstructed signal is typically hampered by convolution with its impulse response, which gives rise to the well-known missing wedge artefacts in 3D reconstructions. Despite the recent progress in the field of reconstruction and regularization techniques, the missing wedge artefacts remain untreated in most current reconstruction workflows in structural biology. Therefore we have designed a simple Fourier angular filter that effectively suppresses the ray artefacts in the single-axis tilting projection acquisition scheme, making single-axis tomographic reconstructions easier to interpret in particular at low signal-to-noise ratio in acquired projections. The proposed filter can be easily incorporated into current electron tomographic reconstruction schemes.
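The geometry of the missing wedge follows from the central slice theorem: a projection recorded at tilt angle α supplies the Fourier coefficients on the line through the origin at angle α. The sketch below only illustrates which coefficients are sampled for a given tilt range; it is not the angular filter proposed in the paper. The coordinate convention (tilt axis along y, `kz` along the beam direction at zero tilt) and the function names are assumptions.

```python
import math


def is_sampled(kx: float, kz: float, max_tilt_deg: float) -> bool:
    """True if the Fourier coefficient at (kx, kz) lies on a measured slice.

    With tilts from -max_tilt_deg to +max_tilt_deg, the measured slices
    cover all directions within max_tilt_deg of the kx axis.
    """
    angle = math.degrees(math.atan2(abs(kz), abs(kx)))
    return angle <= max_tilt_deg


def missing_fraction(max_tilt_deg: float) -> float:
    """Fraction of the (kx, kz) plane that falls inside the missing wedge."""
    return (90.0 - max_tilt_deg) / 90.0


# A typical tilt range of +/-60 degrees leaves one third of the plane unsampled.
print(missing_fraction(60.0))
print(is_sampled(1.0, 1.0, 60.0))  # 45 degrees from the kx axis: measured
print(is_sampled(0.0, 1.0, 60.0))  # on the kz axis: inside the missing wedge
```

The sharp boundary of this mask is what the abstract refers to when it describes the sampled region as sharply delimited from the missing one.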
- Keywords
- Electron tomography, Missing wedge, Missing wedge artefacts, Single-axis tilting
- MeSH
- Artifacts MeSH
- Fourier Analysis MeSH
- Rats MeSH
- Corylus ultrastructure MeSH
- Cerebellum ultrastructure MeSH
- Image Processing, Computer-Assisted * MeSH
- Signal-To-Noise Ratio MeSH
- Pollen ultrastructure MeSH
- Electron Microscope Tomography methods MeSH
- Trypanosoma brucei brucei ultrastructure MeSH
- Animals MeSH
- Check Tag
- Rats MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, N.I.H., Extramural MeSH
INTRODUCTION: Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC III database of intensive cardiac care unit patients with acute coronary syndrome. The ability to identify high-risk patients could possibly allow taking pre-emptive measures and thus prevent the development of CS. METHODS: We mainly focus on techniques for the imputation of missing data by generating a pipeline for imputation and comparing the performance of various multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations. After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of the gradient-boosted framework that uses a tree-based classifier for cardiogenic shock prediction. RESULTS: We achieved good classification performance thanks to data cleaning and imputation (cross-validated mean area under the curve 0.805) without hyperparameter optimization. CONCLUSION: We believe our pre-processing pipeline would prove helpful also for other classification and regression experiments.
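Of the imputation algorithms listed, k-nearest neighbours is the easiest to illustrate. The sketch below is a plain-Python simplification and not the pipeline described in the paper: distances are computed on the features that both patients have observed, and a missing value is replaced by the mean of that feature among the `k` closest patients. The function name, the distance scaling, and the toy data are assumptions.

```python
import math
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def knn_impute(rows: Sequence[Row], k: int = 2) -> List[List[Optional[float]]]:
    """Fill each missing entry with the mean of that feature among the
    k nearest rows that have the feature observed."""

    def distance(a: Row, b: Row) -> float:
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common features: the rows cannot be compared
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [
                r for m, r in enumerate(rows)
                if m != i and r[j] is not None and distance(row, r) < math.inf
            ]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:  # leave the gap if no comparable donor exists
                completed[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return completed


# Hypothetical two-feature dataset; the second patient lacks feature 2.
data = [
    [1.0, 2.0],
    [1.1, None],
    [5.0, 9.0],
    [0.9, 2.2],
]
print(knn_impute(data, k=2))
```

In practice the features would be standardised first so that no single variable dominates the distance, and the imputed table would then be passed to the classifier.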
- Keywords
- cardiogenic shock, classification, machine learning, missing data imputation, prediction model, processing pipeline
- Publication type
- Journal Article MeSH
BACKGROUND AND OBJECTIVES: The pharmacokinetics of polyethylene glycol-conjugated asparaginase (PEG-ASNase) are characterized by an increase in elimination over time, a marked increase in ASNase activity levels from induction to reinduction, and high inter- and intraindividual variability. A population pharmacokinetic (PopPK) model is required to estimate individual dose intensity, despite gaps in monitoring compliance. METHODS: In the AIEOP-BFM ALL 2009 trial, two PEG-ASNase administrations (2500 U/m2 intravenously) during induction (14-day interval) and one administration during reinduction were administered in children with acute lymphoblastic leukemia. ASNase activity levels were monitored weekly. A PopPK model was used for covariate modeling and external validation. The predictivity of the model in case of missing data was tested for observations, as well as for the derived parameters of the area under the concentration time curve (AUC0-∞) and time above different thresholds. RESULTS: Compared to the first administration in induction (1374 patients, 6069 samples), the initial clearance and volume of distribution decreased by 11.0% and 15.9%, respectively, during the second administration during induction and by 41.2% and 28.4% during reinduction. Furthermore, the initial clearance linearly increased for children aged > 8 years and was 7.1% lower for females. The model was successfully externally validated (1253 patients, 5523 samples). In case of missing data, > 52% of the predictions for observations and > 82% for derived parameters were within ± 20% of the nominal value. CONCLUSION: A PopPK model that describes the complex pharmacokinetics of PEG-ASNase was successfully externally validated. AUC0-∞ or time above different thresholds, which are parameters describing dose intensity, can be estimated with high predictivity, despite missing data. ( www.clinicaltrials.gov , NCT01117441, first submitted date: May 3, 2010).
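The predictivity criterion quoted in the results, the share of predictions within ± 20% of the nominal value, can be computed as below. This is a minimal sketch of the metric only and has nothing to do with the PopPK model itself; the function name and the example numbers are hypothetical.

```python
from typing import Sequence


def share_within_tolerance(
    predicted: Sequence[float], nominal: Sequence[float], tolerance: float = 0.20
) -> float:
    """Fraction of predictions whose relative deviation from the nominal
    value is at most `tolerance` (0.20 corresponds to +/-20%)."""
    if len(predicted) != len(nominal) or not nominal:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for p, n in zip(predicted, nominal)
        if abs(p - n) <= tolerance * abs(n)
    )
    return hits / len(nominal)


# Hypothetical AUC predictions made with part of the samples withheld,
# compared with values obtained from the full data.
predicted_auc = [100.0, 130.0, 85.0]
nominal_auc = [100.0, 100.0, 100.0]
print(share_within_tolerance(predicted_auc, nominal_auc))  # two of three within 20%
```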
- MeSH
- Precursor Cell Lymphoblastic Leukemia-Lymphoma drug therapy MeSH
- Asparaginase administration & dosage pharmacokinetics MeSH
- Models, Biological * MeSH
- Child MeSH
- Infant MeSH
- Humans MeSH
- Adolescent MeSH
- Area Under Curve MeSH
- Polyethylene Glycols administration & dosage pharmacokinetics MeSH
- Child, Preschool MeSH
- Antineoplastic Agents administration & dosage pharmacokinetics MeSH
- Tissue Distribution MeSH
- Check Tag
- Child MeSH
- Infant MeSH
- Humans MeSH
- Adolescent MeSH
- Male MeSH
- Child, Preschool MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Clinical Trial MeSH
- Multicenter Study MeSH
- Validation Study MeSH
- Substance names
- Asparaginase MeSH
- pegaspargase MeSH
- Polyethylene Glycols MeSH
- Antineoplastic Agents MeSH