Query: Data pre-processing

218 hits in PubMed

Article

Comprehensive assessment of the role of spectral data pre-processing in spectroscopy-based liquid biopsy

... Nevertheless, most issues may be addressed through appropriate data pre-processing. ...

Vrtělka, Ondřej (Ondrej.Vrtelka@vscht.cz)
Králová, Kateřina
Fousková, Markéta
Setnička, Vladimír (Vladimir.Setnicka@vscht.cz)
Affiliation (all authors): Department of Analytical Chemistry, Faculty of Chemical Engineering, University of Chemistry and Technology, Prague, Technická 5, 166 28 Prague 6, Czech Republic

Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy. 2025 Oct 15 ; 339 () : 126261. [epub] 20250419

Spectrochim Acta A Mol Biomol Spectrosc
ISSN 1873-3557 | 1386-1425

Spectroscopic data often contain artifacts or noise related to the sample characteristics, instrumental variations, or experimental design flaws. Therefore, classifying the raw data is not recommended and might lead to biased results. Nevertheless, most issues may be addressed through appropriate data pre-processing. Effective pre-processing is particularly crucial in critical applications like liquid biopsy for disease detection, where even minor performance improvements may impact patient outcomes. Unfortunately, there is no consensus regarding optimal pre-processing, complicating cross-study comparisons. This study presents a comprehensive evaluation of various pre-processing methods and their combinations to assess their influence on classification results. The goal was to identify whether some pre-processing methods are associated with higher classification outcomes and find an optimal strategy for the given data. Data from Raman optical activity and infrared and Raman spectroscopy were processed, applying tens of thousands of possible pre-processing pipelines. The resulting data were classified using three algorithms to distinguish between subjects with liver cirrhosis and those who had developed hepatocellular carcinoma. Results highlighted that some specific pre-processing methods often ranked among the best classification results, such as the Rolling Ball for correcting the baseline of Raman spectra or the Doubly Reweighted Penalized Least Squares and Mixture model in the case of Raman optical activity. On the other hand, the selection of filtering and/or normalization approach usually did not have a significant impact. Nonetheless, the pre-processing of top-scoring pipelines also depended on the classifier utilized. The best pipelines yielded an AUROC of 0.775-0.823, varying with the evaluated spectroscopic data and classifier.

Keywords
Chiroptical spectroscopy, Classification, Data pre-processing, Diagnostics, Liquid biopsy, Machine learning, Vibrational spectroscopy
MeSH
Algorithms MeSH
Carcinoma, Hepatocellular * diagnosis pathology MeSH
Liver Cirrhosis diagnosis pathology MeSH
Humans MeSH
Least-Squares Analysis MeSH
Liver Neoplasms * diagnosis pathology MeSH
Spectrum Analysis, Raman * methods MeSH
Spectrophotometry, Infrared methods MeSH
Liquid Biopsy methods MeSH
Check Tag
Humans MeSH
Publication type
Journal Article MeSH

Article

The impact of spectral data pre-processing on the assessment of red wine vintage through spectroscopic methods

... achieved for the majority of the developed multivariate models and the impact of the applied spectral pre-processing ...

Journal of the science of food and agriculture. 2025 Aug 30 ; 105 (11) : 5986-5998. [epub] 20250512

J Sci Food Agric
ISSN 1097-0010 | 0022-5142

BACKGROUND: Red wine is a common target of fraudulent acts considering its high market value and popularity. Although there has been much effort to assess the geographical and varietal origin of wine, this is not the case for wine vintage. Vintage is a crucial parameter for the market price, especially in the case of reputable wines. Considering the season-to-season variations affecting wine quality and the ever-occurring unstable climatological conditions due to climate change, developing analytical strategies to accurately assess wine vintage is topical and of high interest. RESULTS: In this study, we successfully employed ultraviolet-visible spectroscopy, fluorescence spectroscopy and mid-infrared spectroscopy to identify the vintage of a protected designation of origin red wine produced during four different vintages (n = 36). Class-based clustering and great discriminatory performance was achieved for the majority of the developed multivariate models and the impact of the applied spectral pre-processing was significant. Importantly, the tested scatter correction methods resulted in the best cross-validation parameters (goodness of fit, R2Y > 0.9 and goodness of prediction, Q2Y > 0.8) with calculated recognition and prediction abilities in the range 77-100% and 65-96%, respectively, when using partial least squares discriminant analysis. In addition, in the case of fluorescence spectroscopy, a batch effect was revealed, which was compensated by the spectral pre-processing methods. Spectral feature selection was performed in all cases to use only the analytically important spectral signals and omit model overfitting. CONCLUSIONS: The developed method is simple, cost-efficient and non-destructive, indicating its high potential for industrial applications as a rapid screening tool. © 2025 The Author(s). Journal of the Science of Food and Agriculture published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.

Keywords
absorption spectroscopy, attenuated total reflectance Fourier transform infrared spectroscopy, chemometrics, spectral pre-processing, wine authenticity
MeSH
Discriminant Analysis MeSH
Spectrometry, Fluorescence methods MeSH
Seasons MeSH
Spectrum Analysis * methods MeSH
Wine * analysis MeSH
Vitis * chemistry growth & development MeSH
Publication type
Journal Article MeSH
Evaluation Study MeSH

Article

Poisson pre-processing of nonstationary photonic signals: Signals with equality between mean and variance

... To alleviate this issue we developed a suitable pre-processing method for the signals that originate ...

PloS one. 2017 ; 12 (12) : e0188622. [epub] 20171207

PLoS One
ISSN 1932-6203

Photonic signals are broadly exploited in communication and sensing and they typically exhibit Poisson-like statistics. In a common scenario where the intensity of the photonic signals is low and one needs to remove a nonstationary trend of the signals for any further analysis, one faces an obstacle: due to the dependence between the mean and variance typical for a Poisson-like process, information about the trend remains in the variance even after the trend has been subtracted, possibly yielding artifactual results in further analyses. Commonly available detrending or normalizing methods cannot cope with this issue. To alleviate this issue we developed a suitable pre-processing method for the signals that originate from a Poisson-like process. In this paper, a Poisson pre-processing method for nonstationary time series with Poisson distribution is developed and tested on computer-generated model data and experimental data of chemiluminescence from human neutrophils and mung seeds. The presented method transforms a nonstationary Poisson signal into a stationary signal with a Poisson distribution while preserving the type of photocount distribution and phase-space structure of the signal. The importance of the suggested pre-processing method is shown in Fano factor and Hurst exponent analysis of both computer-generated model signals and experimental photonic signals. It is demonstrated that our pre-processing method is superior to standard detrending-based methods whenever further signal analysis is sensitive to variance of the signal.
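
The obstacle described in this abstract can be illustrated with a short simulation. The sketch below is not the authors' pre-processing method; it only shows, on hypothetical computer-generated counts, that subtracting the trend from a nonstationary Poisson signal leaves the trend visible in the variance. The linear rate, the sample size, and the helper name `fano_factor` are assumptions chosen for illustration.

```python
import numpy as np


def fano_factor(x):
    """Fano factor of a count series: variance divided by mean."""
    x = np.asarray(x, dtype=float)
    return x.var() / x.mean()


rng = np.random.default_rng(0)
n = 20_000

# Stationary Poisson counts: the Fano factor is close to 1.
stationary = rng.poisson(20.0, size=n)
fano_stationary = fano_factor(stationary)

# Hypothetical nonstationary signal: the Poisson rate rises linearly.
trend = np.linspace(5.0, 50.0, n)
counts = rng.poisson(trend)

# Naive detrending: subtract the (here exactly known) trend.
residual = counts - trend

# The residual mean is roughly zero, but its variance still follows the trend.
half = n // 2
var_early = residual[:half].var()
var_late = residual[half:].var()

print(f"Fano factor, stationary counts: {fano_stationary:.3f}")
print(f"Residual variance, first half:  {var_early:.2f}")
print(f"Residual variance, second half: {var_late:.2f}")
```

Because the variance of a Poisson variable equals its mean, the second half of the residual is noticeably noisier than the first, which is why variance-sensitive analyses such as the Fano factor can be misled by plain detrending.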

Article

Structural MRI-Based Schizophrenia Classification Using Autoencoders and 3D Convolutional Neural Networks in Combination with Various Pre-Processing Techniques

... can help improve the accuracy of deep learning-based classifiers compared to minimally preprocessed data ...

Brain sciences. 2022 May 09 ; 12 (5) : . [epub] 20220509

Brain Sci
ISSN 2076-3425

Schizophrenia is a severe neuropsychiatric disease whose diagnosis, unfortunately, lacks an objective diagnostic tool supporting a thorough psychiatric examination of the patient. We took advantage of today's computational abilities, structural magnetic resonance imaging, and modern machine learning methods, such as stacked autoencoders (SAE) and 3D convolutional neural networks (3D CNN), to teach them to classify 52 patients with schizophrenia and 52 healthy controls. The main aim of this study was to explore whether complex feature extraction methods can help improve the accuracy of deep learning-based classifiers compared to minimally preprocessed data. Our experiments employed three commonly used preprocessing steps to extract three different feature types. They included voxel-based morphometry, deformation-based morphometry, and simple spatial normalization of brain tissue. In addition to classifier models, features and their combination, other model parameters such as network depth, number of neurons, number of convolutional filters, and input data size were also investigated. Autoencoders were trained on feature pools of 1000 and 5000 voxels selected by Mann-Whitney tests, and 3D CNNs were trained on whole images. The most successful model architecture (autoencoders) achieved the highest average accuracy of 69.62% (sensitivity 68.85%, specificity 70.38%). The results of all experiments were statistically compared (the Mann-Whitney test). In conclusion, SAE outperformed 3D CNN, while preprocessing using VBM helped SAE improve the results.

Keywords
3D CNN, autoencoders, classification, deep learning, deformation-based morphometry, schizophrenia, voxel-based morphometry
Publication type
Journal Article MeSH

Article

Data processing pipeline for cardiogenic shock prediction using machine learning

... INTRODUCTION: Recent advances in machine learning provide new possibilities to process and analyse observational ...

Frontiers in cardiovascular medicine. 2023 ; 10 () : 1132680. [epub] 20230323

Front Cardiovasc Med
ISSN 2297-055X

INTRODUCTION: Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC III database of intensive cardiac care unit patients with acute coronary syndrome. The ability to identify high-risk patients could possibly allow taking pre-emptive measures and thus prevent the development of CS. METHODS: We mainly focus on techniques for the imputation of missing data by generating a pipeline for imputation and comparing the performance of various multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations. After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of the gradient-boosted framework that uses a tree-based classifier for cardiogenic shock prediction. RESULTS: We achieved good classification performance thanks to data cleaning and imputation (cross-validated mean area under the curve 0.805) without hyperparameter optimization. CONCLUSION: We believe our pre-processing pipeline would prove helpful also for other classification and regression experiments.

Keywords
cardiogenic shock, classification, machine learning, missing data imputation, prediction model, processing pipeline
Publication type
Journal Article MeSH

Article

Bayesian multiple hypotheses testing in compositional analysis of untargeted metabolomic data

... metabolic statuses of patient and control groups with the intention of understanding pathobiochemical processes ...

Analytica chimica acta. 2020 Feb 08 ; 1097 () : 49-61. [epub] 20191116

Anal Chim Acta
ISSN 1873-4324 | 0003-2670

Clinical metabolomics aims at finding statistically significant differences in metabolic statuses of patient and control groups with the intention of understanding pathobiochemical processes and identification of clinically useful biomarkers of particular diseases. After the raw measurements are integrated and pre-processed as intensities of chromatographic peaks, the differences between controls and patients are evaluated by both univariate and multivariate statistical methods. The traditional univariate approach relies on t-tests (or their nonparametric alternatives) and the results from multiple testing are misleadingly compared merely by p-values using the so-called volcano plot. This paper proposes a Bayesian counterpart to the widespread univariate analysis, taking into account the compositional character of a metabolome. Since each metabolome is a collection of some small-molecule metabolites in a biological material, the relative structure of metabolomic data, which is inherently contained in ratios between metabolites, is of the main interest. Therefore, a proper choice of logratio coordinates is an essential step for any statistical analysis of such data. In addition, a concept of b-values is introduced together with a Bayesian version of the volcano plot incorporating distance levels of the posterior highest density intervals from zero. The theoretical background of the contribution is illustrated using two data sets containing samples of patients suffering from 3-hydroxy-3-methylglutaryl-CoA lyase deficiency and medium-chain acyl-CoA dehydrogenase deficiency. To evaluate the stability of the proposed method as well as the benefits of the compositional approach, two simulations designed to mimic a loss of samples and a systematical measurement error, respectively, are added.

Keywords
Bayesian inference, Compositional data, High-dimensional data, Multiple hypotheses testing, Untargeted metabolomics, Volcano plot
MeSH
Acetyl-CoA C-Acetyltransferase deficiency metabolism MeSH
Acyl-CoA Dehydrogenase deficiency metabolism MeSH
Bayes Theorem * MeSH
Datasets as Topic MeSH
Humans MeSH
Metabolomics * MeSH
Amino Acid Metabolism, Inborn Errors metabolism MeSH
Lipid Metabolism, Inborn Errors metabolism MeSH
Check Tag
Humans MeSH
Publication type
Journal Article MeSH
Names of Substances
Acetyl-CoA C-Acetyltransferase MeSH
Acyl-CoA Dehydrogenase MeSH
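
The abstract of this record stresses that the information in metabolomic data lies in ratios between metabolites and that logratio coordinates are therefore needed. As a minimal sketch of that idea, the code below computes the centered logratio (clr) transform, which is one common choice of logratio representation; the abstract does not say which coordinates the authors use, so clr is an assumption here, as are the function name `clr` and the example intensities.

```python
import numpy as np


def clr(x):
    """Centered logratio transform applied along the last axis.

    Each part is divided by the geometric mean of its composition and
    then log-transformed, so only ratios between parts matter.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("clr requires strictly positive parts")
    log_x = np.log(x)
    return log_x - log_x.mean(axis=-1, keepdims=True)


# Hypothetical peak intensities: two samples, three metabolites.
# The second sample is the first one scaled by a factor of 10.
intensities = np.array([[10.0, 20.0, 70.0],
                        [100.0, 200.0, 700.0]])

coords = clr(intensities)
print(coords)
```

Both rows give the same coordinates, since the transform depends only on the ratios between parts and not on the overall signal level, and each row of coordinates sums to zero.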

Article

Tools, techniques, methods, and processes for the detection and mitigation of fraudulent or erroneous data in evidence synthesis: a scoping review protocol

... aims to identify, catalogue, and characterize previously reported tools, techniques, methods, and processes ...

JBI evidence synthesis. 2025 Mar 01 ; 23 (3) : 536-545. [epub] 20240910

JBI Evid Synth
ISSN 2689-8381

OBJECTIVE: This scoping review aims to identify, catalogue, and characterize previously reported tools, techniques, methods, and processes that have been recommended or used by evidence synthesizers to detect fraudulent or erroneous data and mitigate its impact. INTRODUCTION: Decision-making for policy and practice should always be underpinned by the best available evidence-typically peer-reviewed scientific literature. Evidence synthesis literature should be collated and organized using the appropriate evidence synthesis methodology, best exemplified by the role systematic reviews play in evidence-based health care. However, with the rise of "predatory journals," fraudulent or erroneous data may be invading this literature, which may negatively affect evidence syntheses that use this data. This, in turn, may compromise decision-making processes. INCLUSION CRITERIA: This review will include peer-reviewed articles, commentaries, books, and editorials that describe at least 1 tool, technique, method, or process with the explicit purpose of identifying or mitigating the impact of fraudulent or erroneous data for any evidence synthesis, in any topic area. Manuals, handbooks, and guidance from major organizations, universities, and libraries will also be considered. METHODS: This review will be conducted using the JBI methodology for scoping reviews and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR). Databases and relevant organizational websites will be searched for eligible studies. Title and abstract, and, subsequently, full-text screening will be conducted in duplicate. Data from identified full texts will be extracted using a pre-determined checklist, while the findings will be summarized descriptively and presented in tables. REVIEW REGISTRATION: Open Science Framework https://osf.io/u8yrn.

Article

Uprava schémy chrupu pre pocítacové spracovanie
[Adjustment of the chart of a set of teeth for computer processing]

Prakticke zubni lekarstvi. 1987 Nov ; 35 (9) : 279-84.

Prakt Zubn Lek
ISSN 0032-6720

Article

Adaptive data filtering of inertial sensors with variable bandwidth

... All of these constitute conditions require treatment through data processing. ...

Sensors (Basel, Switzerland). 2015 Feb 02 ; 15 (2) : 3282-98. [epub] 20150202

Sensors (Basel)
ISSN 1424-8220

MEMS (micro-electro-mechanical system)-based inertial sensors, i.e., accelerometers and angular rate sensors, are commonly used as a cost-effective solution for the purposes of navigation in a broad spectrum of terrestrial and aerospace applications. These tri-axial inertial sensors form an inertial measurement unit (IMU), which is a core unit of navigation systems. Even if MEMS sensors have an advantage in their size, cost, weight and power consumption, they suffer from bias instability, noisy output and insufficient resolution. Furthermore, the sensor's behavior can be significantly affected by strong vibration when it operates in harsh environments. All of these constitute conditions require treatment through data processing. As long as the navigation solution is primarily based on using only inertial data, this paper proposes a novel concept in adaptive data pre-processing by using a variable bandwidth filtering. This approach utilizes sinusoidal estimation to continuously adapt the filtering bandwidth of the accelerometer's data in order to reduce the effects of vibration and sensor noise before attitude estimation is processed. Low frequency vibration generally limits the conditions under which the accelerometers can be used to aid the attitude estimation process, which is primarily based on angular rate data and, thus, decreases its accuracy. In contrast, the proposed pre-processing technique enables using accelerometers as an aiding source by effective data smoothing, even when they are affected by low frequency vibration. Verification of the proposed concept is performed on simulation and real-flight data obtained on an ultra-light aircraft. The results of both types of experiments confirm the suitability of the concept for inertial data pre-processing.

MeSH
Equipment Design MeSH
Geographic Information Systems MeSH
Aircraft standards MeSH
Aerospace Medicine instrumentation MeSH
Humans MeSH
Micro-Electrical-Mechanical Systems instrumentation MeSH
Software MeSH
Remote Sensing Technology * MeSH
Check Tag
Humans MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH

Article

Fully Automated DCNN-Based Thermal Images Annotation Using Neural Network Pretrained on RGB Data

... One of the biggest challenges of training deep neural network is the need for massive data annotation ...

Sensors (Basel, Switzerland). 2021 Feb 23 ; 21 (4) : . [epub] 20210223

Sensors (Basel)
ISSN 1424-8220

One of the biggest challenges of training deep neural network is the need for massive data annotation. To train the neural network for object detection, millions of annotated training images are required. However, currently, there are no large-scale thermal image datasets that could be used to train the state of the art neural networks, while voluminous RGB image datasets are available. This paper presents a method that allows to create hundreds of thousands of annotated thermal images using the RGB pre-trained object detector. A dataset created in this way can be used to train object detectors with improved performance. The main gain of this work is the novel method for fully automatic thermal image labeling. The proposed system uses the RGB camera, thermal camera, 3D LiDAR, and the pre-trained neural network that detects objects in the RGB domain. Using this setup, it is possible to run the fully automated process that annotates the thermal images and creates the automatically annotated thermal training dataset. As the result, we created a dataset containing hundreds of thousands of annotated objects. This approach allows to train deep learning models with similar performance as the common human-annotation-based methods do. This paper also proposes several improvements to fine-tune the results with minimal human intervention. Finally, the evaluation of the proposed solution shows that the method gives significantly better results than training the neural network with standard small-scale hand-annotated thermal image datasets.

Keywords
IR, RGB, YOLO, data annotation, deep convolutional neural networks, object detector, thermal, transfer learning
Publication type
Journal Article MeSH
