Query: data preprocessing

67 records in PubMed

Article

Overview of data preprocessing for machine learning applications in human microbiome research

Authors
Ibrahimi, Eliana: Department of Biology, Faculty of Natural Sciences, University of Tirana, Tirana, Albania
Lopes, Marta B: Department of Mathematics, Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal; UNIDEMI, Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
Dhamo, Xhilda: Department of Applied Mathematics, Faculty of Natural Sciences, University of Tirana, Tirana, Albania
Simeon, Andrea: BioSense Institute, University of Novi Sad, Novi Sad, Serbia
Shigdel, Rajesh: Department of Clinical Science, University of Bergen, Bergen, Norway
Hron, Karel: Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University Olomouc, Olomouc, Czechia
Stres, Blaž: Department of Catalysis and Chemical Reaction Engineering, National Institute of Chemistry, Ljubljana, Slovenia; Faculty of Civil and Geodetic Engineering, Institute of Sanitary Engineering, Ljubljana, Slovenia; Department of Automation, Biocybernetics and Robotics, Jožef Stefan Institute, Ljubljana, Slovenia; Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
D'Elia, Domenica: Department of Biomedical Sciences, National Research Council, Institute for Biomedical Technologies, Bari, Italy
Berland, Magali: INRAE, MetaGenoPolis, Université Paris-Saclay, Jouy-en-Josas, France
Marcos-Zambrano, Laura Judith: Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain

Frontiers in microbiology. 2023;14:1250909. Epub 2023 Oct 5.

Front Microbiol
ISSN 1664-302X

Although metagenomic sequencing is now the preferred technique for studying microbiome-host interactions, analyzing and interpreting microbiome sequencing data is challenging, primarily because of the statistical properties of the data (e.g., sparsity, overdispersion, compositionality, and dependency between variables). This mini review explores the preprocessing and transformation methods applied in recent human microbiome studies to address these challenges. Our results indicate limited adoption of transformation methods that target the statistical characteristics of microbiome sequencing data. Instead, relative and normalization-based transformations that do not account for the specific attributes of microbiome data are prevalent. In many publications, information on the preprocessing and transformations applied before analysis was incomplete or missing, which raises concerns about reproducibility and comparability and makes some results questionable. We hope this mini review will give researchers and newcomers to human microbiome research an up-to-date point of reference for data transformation tools and help them choose the most suitable transformation method for their research questions, objectives, and data characteristics.
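The abstract contrasts relative (normalization-based) transformations with transformations aimed at compositional data. The sketch below compares total-sum scaling to relative abundances with the centered log-ratio (CLR) transform, a standard compositional-data transform. It is a minimal illustration under stated assumptions. The function names are hypothetical, and the pseudocount of 0.5 used to handle zero counts is an arbitrary choice for this example, not a recommendation from the review.

```python
import math
from typing import List

def relative_abundance(counts: List[float]) -> List[float]:
    """Total-sum scaling: divide each taxon count by the sample total."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample must have a positive total count")
    return [c / total for c in counts]

def clr(counts: List[float], pseudocount: float = 0.5) -> List[float]:
    """Centered log-ratio: log of each part minus the mean log of all parts.

    Zeros are handled by adding a pseudocount (an assumption; other
    zero-replacement strategies exist).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]

sample = [120, 30, 0, 850]  # hypothetical taxon counts for one sample
print([round(x, 3) for x in relative_abundance(sample)])  # → [0.12, 0.03, 0.0, 0.85]
print([round(x, 3) for x in clr(sample)])
```

CLR values within a sample sum to zero and depend only on ratios between counts. Rescaling a sample, for example to a different sequencing depth, therefore leaves them unchanged. The log-ratio step also moves the data off the simplex into ordinary real space, where standard statistical methods apply.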

Keywords
compositionality, data preprocessing, human microbiome, machine learning, metagenomics data, normalization
Publication type
Journal Article
Review

Article

Using Entropy in Web Usage Data Preprocessing

Entropy (Basel, Switzerland). 2018 Jan 22;20(1). Epub 2018 Jan 22.

Entropy (Basel)
ISSN 1099-4300

The paper examines the use of entropy in web usage mining. Entropy offers an alternative way of determining the ratio of auxiliary pages when identifying sessions with the Reference Length method. The experiment was conducted on two different web portals: the first log file was obtained from the web portal of a virtual learning environment course, and the second from a web portal with anonymous access. A comparison of the entropy-based and sitemap-based estimates of the ratio of auxiliary pages showed that, in the case of sitemap abundance, entropy could fully substitute for the estimate of the ratio of auxiliary pages.
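The abstract does not describe the Reference Length method itself. In its commonly described form, page-view durations (reference lengths) are assumed to follow an exponential distribution. A cut-off time C = -ln(1 - r) / λ is then derived from the assumed ratio r of auxiliary pages, with λ the reciprocal of the mean reference length. A page viewed for longer than C counts as a content page and closes a session. The sketch below assumes that form. It takes r as an input because the abstract does not say how the entropy-based estimate of r is computed. The function names and the sample log are hypothetical.

```python
import math
from typing import List, Tuple

def reference_length_cutoff(durations: List[float], aux_ratio: float) -> float:
    """Cut-off time C = -ln(1 - r) / lambda, with lambda = 1 / mean duration.

    Assumes reference lengths are exponentially distributed; aux_ratio is
    the assumed share r of auxiliary pages, 0 <= r < 1.
    """
    if not 0 <= aux_ratio < 1:
        raise ValueError("aux_ratio must be in [0, 1)")
    mean = sum(durations) / len(durations)
    lam = 1.0 / mean
    return -math.log(1.0 - aux_ratio) / lam

def split_sessions(visits: List[Tuple[str, float]], cutoff: float) -> List[List[str]]:
    """Close a session after each content page (duration > cutoff)."""
    sessions: List[List[str]] = []
    current: List[str] = []
    for page, duration in visits:
        current.append(page)
        if duration > cutoff:  # content page: it ends the current session
            sessions.append(current)
            current = []
    if current:  # trailing auxiliary pages are kept as a final, incomplete session
        sessions.append(current)
    return sessions

# Hypothetical click stream of one user: (page, seconds spent on it)
visits = [("/", 5.0), ("/menu", 3.0), ("/article1", 120.0), ("/menu", 4.0), ("/article2", 90.0)]
cutoff = reference_length_cutoff([d for _, d in visits], aux_ratio=0.6)
print(round(cutoff, 2))                # → 40.68
print(split_sessions(visits, cutoff))  # → [['/', '/menu', '/article1'], ['/menu', '/article2']]
```

A larger assumed ratio r raises the cut-off, so fewer pages count as content pages and sessions become longer. A better estimate of r, whether from a sitemap or from entropy, therefore matters directly for the resulting sessions.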

Keywords
Reference Length, data preprocessing, information entropy, session identification, web usage mining
Publication type
Journal Article

Article

Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset

Data in brief. 2025 Jun;60:111599. Epub 2025 Apr 28.

Data Brief
ISSN 2352-3409

The Vehicular Reference Misbehavior Dataset (VeReMi) is a vital resource for advancing Intelligent Transportation Systems (ITS) and the Internet of Vehicles (IoV). However, its large size (∼7 GB) and inherent class imbalance pose significant challenges for machine learning model development. This paper presents a preprocessing framework to enhance VeReMi's usability and relevance. Through 10% down-sampling, the dataset was reduced to ∼724 MB, making it computationally manageable. Biases were addressed by balancing benign and malicious samples through synthesis and by identifying benign instances using predefined criteria. A refined feature set, including key attributes such as rcvTime, pos_0, pos_1, and attack_type (renamed attacker_type), was selected to improve machine learning compatibility. This preprocessing pipeline maintains data integrity and preserves the representativeness of malicious patterns. The optimized dataset is well suited for ITS and IoV applications such as anomaly detection and network security, underscoring the crucial role of preprocessing in overcoming real-world constraints and enhancing model performance.
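The three steps named in this abstract are 10% down-sampling, class balancing, and feature selection with the attack_type to attacker_type rename. The sketch below chains them using only the Python standard library, and it rests on several assumptions. Records are taken to be dicts keyed by the listed field names, and attack_type == 0 is taken to mark a benign record. The balancing step is simple random oversampling of the smaller class, used as a stand-in: the abstract says only that balancing was done "through synthesis", so the authors' actual method may differ. All function names are hypothetical.

```python
import random
from typing import Any, Dict, List

Record = Dict[str, Any]
KEEP = ["rcvTime", "pos_0", "pos_1", "attack_type"]  # feature set named in the abstract

def downsample(records: List[Record], fraction: float, seed: int = 0) -> List[Record]:
    """Uniform random sample of the given fraction of records."""
    rng = random.Random(seed)
    return rng.sample(records, int(len(records) * fraction))

def balance(records: List[Record], seed: int = 0) -> List[Record]:
    """Randomly oversample the smaller class until both classes have equal size.

    Assumes attack_type == 0 marks a benign record (hypothetical convention).
    """
    rng = random.Random(seed)
    benign = [r for r in records if r["attack_type"] == 0]
    malicious = [r for r in records if r["attack_type"] != 0]
    small, large = sorted([benign, malicious], key=len)
    if not small:
        raise ValueError("cannot balance: one class is empty")
    extra = [rng.choice(small) for _ in range(len(large) - len(small))]
    return large + small + extra

def select_features(records: List[Record]) -> List[Record]:
    """Keep only the selected columns and rename attack_type to attacker_type."""
    out = []
    for r in records:
        row = {k: r[k] for k in KEEP}
        row["attacker_type"] = row.pop("attack_type")
        out.append(row)
    return out

def preprocess(records: List[Record], fraction: float = 0.1, seed: int = 0) -> List[Record]:
    """Down-sample, then balance, then select features (order is an assumption)."""
    return select_features(balance(downsample(records, fraction, seed), seed))
```

Balancing after down-sampling keeps the oversampling cost proportional to the reduced dataset. The trade-off is that oversampling repeats existing malicious records rather than creating new ones.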

Keywords
Anomaly detection, Cybersecurity, Data preprocessing, Dataset optimization, Intelligent transportation systems (ITS), Internet of vehicles (IoV), Intrusion detection systems (IDS), Machine learning (ML), Network security, Vehicular reference misbehavior dataset (VeReMi)
Publication type
Journal Article

Article

A hybrid modular set for on-line preprocessing of data in electrophysiology

Physiologia Bohemoslovaca. 1976;25(4):371-4.

Physiol Bohemoslov
ISSN 0369-9463

An introductory review of the hardware aspects of on-line experimental data processing shows that a specialized (hard-wired) preprocessing unit coupled with a programmable laboratory computer is an optimal setup for an electrophysiological laboratory. The paper describes a proposed modular system that makes it possible to assemble a large number of different preprocessing units. In conclusion, some practical applications of the preprocessing units coupled with a LINC (D.E.C.) computer are presented.

Article

Batch alignment via retention orders for preprocessing large-scale multi-batch LC-MS experiments

Bioinformatics (Oxford, England). 2022 Aug 2;38(15):3759-3767.

Bioinformatics
ISSN 1367-4811 | 1367-4803

MOTIVATION: Meticulous selection of chromatographic peak detection parameters and algorithms is a crucial step in preprocessing liquid chromatography-mass spectrometry (LC-MS) data. However, because mass-to-charge ratio and retention time shifts are larger between batches than within batches, it is challenging to find suitable parameters for all samples of a large-scale multi-batch experiment while minimizing information loss. Preprocessing each batch individually can mitigate these problems but requires a method for aligning and combining the batches for downstream analysis. RESULTS: We present two methods for aligning and combining individually preprocessed batches in multi-batch LC-MS experiments. The methods were tested on six simulated and six real datasets. Furthermore, by estimating the probabilities of peak insertion, deletion and swap between batches in authentic datasets, we demonstrate that retention order swaps are not rare in untargeted LC-MS data. AVAILABILITY AND IMPLEMENTATION: The kmersAlignment and rtcorrectedAlignment algorithms are available as an R package, together with raw data, at https://metabocombiner.img.cas.cz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
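The abstract reports that retention order swaps between batches are not rare. One simple way to quantify such swaps starts from features already matched across two batches. It then counts discordant pairs, meaning pairs whose retention-time order in one batch is reversed in the other (a Kendall-tau-style count). The sketch below is an illustration only. It is not the kmersAlignment or rtcorrectedAlignment algorithm, and the feature identifiers and retention times are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def retention_order_swaps(rt_a: Dict[str, float], rt_b: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return pairs of shared features whose retention order differs between two batches."""
    shared = sorted(set(rt_a) & set(rt_b))
    swaps = []
    for f, g in combinations(shared, 2):
        # A negative product means the order of f and g flips between batch A and batch B;
        # ties (product == 0) are not counted as swaps.
        if (rt_a[f] - rt_a[g]) * (rt_b[f] - rt_b[g]) < 0:
            swaps.append((f, g))
    return swaps

batch_a = {"m1": 1.20, "m2": 2.05, "m3": 2.10, "m4": 3.40}  # retention times in minutes
batch_b = {"m1": 1.35, "m2": 2.30, "m3": 2.22, "m4": 3.55}
print(retention_order_swaps(batch_a, batch_b))  # → [('m2', 'm3')]
```

In this toy example, every feature elutes later in batch B, but a monotonic retention-time correction cannot fix the m2/m3 pair because their order is inverted. That kind of swap is the case an order-preserving alignment would mishandle.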

MeSH
Algorithms
Chromatography, Liquid / methods
Metabolomics
Proteomics* / methods
Tandem Mass Spectrometry* / methods
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Article

Typicality of functional connectivity robustly captures motion artifacts in rs-fMRI across datasets, atlases, and preprocessing pipelines

Human brain mapping. 2020 Dec 15;41(18):5325-5340. Epub 2020 Sep 2.

Hum Brain Mapp
ISSN 1097-0193 | 1065-9471

Functional connectivity analysis of resting-state fMRI data has recently become one of the most common approaches to characterizing individual brain function. It has been widely suggested that the functional connectivity matrix is a useful approximate representation of the brain's connectivity, potentially providing behaviorally or clinically relevant markers. However, functional connectivity estimates are known to be adversely affected by various artifacts, including those due to in-scanner head motion. Moreover, because individual functional connections generally covary only very weakly with head motion estimates, the influence of motion is difficult to quantify robustly and is prone to being neglected in practice. Although individual head motion estimates and group-level correlations between motion and functional connectivity have been suggested for this purpose, a sufficiently sensitive measure of individual functional connectivity quality has not yet been established. We propose a new, intuitive summary index, Typicality of Functional Connectivity, to capture deviations from standard brain functional connectivity patterns. In a resting-state fMRI dataset of 245 healthy subjects, this measure was significantly correlated with individual head motion metrics. The results were robustly reproduced across atlas granularities, preprocessing options, and other datasets, including 1,081 subjects from the Human Connectome Project. In principle, Typicality of Functional Connectivity should also be sensitive to other types of artifacts, processing errors, and possibly brain pathology, which would allow extensive use in data quality screening and quantification in functional connectivity studies as well as in methodological investigations.
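The abstract defines Typicality of Functional Connectivity only as an index of deviation from standard connectivity patterns. The sketch below assumes one plausible form of such an index, written with NumPy. Each subject's functional connectivity matrix is computed as Pearson correlations between regional time series. The subject is then scored by the correlation between the upper triangle of that matrix and the upper triangle of the group-average matrix. The exact definition used in the paper, any transformation such as Fisher z, and whether the subject is left out of the reference average are all assumptions here. The simulated data are hypothetical.

```python
import numpy as np

def fc_matrix(ts: np.ndarray) -> np.ndarray:
    """Functional connectivity as Pearson correlation between regions.

    ts has shape (timepoints, regions); np.corrcoef treats rows as
    variables, so the array is transposed.
    """
    return np.corrcoef(ts.T)

def typicality(fc_subjects: np.ndarray) -> np.ndarray:
    """Correlation of each subject's FC edges with the group-mean FC edges.

    fc_subjects has shape (subjects, regions, regions). Hypothetical index:
    higher values mean a more typical connectivity pattern.
    """
    n_regions = fc_subjects.shape[1]
    iu = np.triu_indices(n_regions, k=1)  # off-diagonal upper-triangle edges
    edges = fc_subjects[:, iu[0], iu[1]]  # shape (subjects, edges)
    reference = edges.mean(axis=0)        # group-average connectivity pattern
    return np.array([np.corrcoef(e, reference)[0, 1] for e in edges])

rng = np.random.default_rng(0)
mixing = rng.standard_normal((3, 10))  # shared region loadings give a common FC pattern
subjects = [rng.standard_normal((200, 3)) @ mixing + 0.5 * rng.standard_normal((200, 10))
            for _ in range(5)]
subjects.append(rng.standard_normal((200, 10)))  # atypical subject: unstructured noise
fcs = np.stack([fc_matrix(ts) for ts in subjects])
print(np.round(typicality(fcs), 2))
```

The noise-only subject lacks the shared correlation structure, so its score falls well below those of the structured subjects. This is the kind of deviation a motion-corrupted scan could produce.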

Keywords
atlas, functional connectivity, motion, quality, rs-fMRI
MeSH
Artifacts
Atlases as Topic*
Datasets as Topic*
Adult
Head Movements
Connectome* / methods / standards
Humans
Magnetic Resonance Imaging* / methods / standards
Young Adult
Brain / diagnostic imaging / physiology
Image Processing, Computer-Assisted* / methods / standards
Check Tag
Adult
Humans
Young Adult
Male
Female
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Article

From raw data to data-analysis for magnetic resonance spectroscopy--the missing link: jMRUI2XML

BMC bioinformatics. 2015 Nov 9;16:378. Epub 2015 Nov 9.

BMC Bioinformatics
ISSN 1471-2105

BACKGROUND: Magnetic resonance spectroscopy provides metabolic information about living tissues non-invasively. However, there are only a few multi-centre clinical studies, mostly performed on a single scanner model or data format, since there is no flexible way of documenting and exchanging processed magnetic resonance spectroscopy data in digital format. This is because the DICOM standard for spectroscopy deals with unprocessed data. This paper proposes a plugin tool developed for jMRUI, namely jMRUI2XML, to address this limitation. jMRUI is a software tool for magnetic resonance spectroscopy data processing that is widely used in the magnetic resonance spectroscopy community and has evolved into a plugin platform that allows novel features to be implemented. RESULTS: jMRUI2XML is a Java solution that facilitates common preprocessing of magnetic resonance spectroscopy data across multiple scanners. Its main characteristics are: 1) it automates magnetic resonance spectroscopy preprocessing, and 2) it can serve as a platform for outputting exchangeable magnetic resonance spectroscopy data. The plugin works with any kind of data that can be opened by jMRUI and writes its output in Extensible Markup Language (XML) format. Data processing templates can be generated and saved for later use. Because the output format documents the preprocessing parameters and is intrinsically anonymized, it enables easy data sharing, for example for performing pattern recognition analysis on multicentre/multi-manufacturer magnetic resonance spectroscopy data. CONCLUSIONS: jMRUI2XML provides a self-contained and self-descriptive format that accounts for the most relevant information needed for exchanging magnetic resonance spectroscopy data in digital form, as well as for automating its processing. This makes it possible to track the procedures the data have undergone, which makes the proposed tool especially useful when performing pattern recognition analysis.
Moreover, this work constitutes a first proposal for the minimum information that should accompany any processed magnetic resonance spectrum, towards the goal of better transferability of magnetic resonance spectroscopy studies.
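jMRUI2XML itself is a Java plugin, and the abstract does not give its XML schema. The sketch below illustrates the underlying idea with Python's xml.etree.ElementTree: a processed spectrum is exported together with the ordered preprocessing steps and parameters it has undergone, and no patient identifiers are written. All element and attribute names and the example processing steps are hypothetical. They are not the jMRUI2XML format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

def spectrum_to_xml(points: List[float],
                    steps: List[Tuple[str, Dict[str, str]]],
                    scanner: str) -> str:
    """Serialize a processed spectrum plus its preprocessing history to XML.

    Only the scanner model is recorded as acquisition metadata; patient
    fields are never written (anonymization by omission).
    """
    root = ET.Element("processedSpectrum")
    ET.SubElement(root, "acquisition", scanner=scanner)
    history = ET.SubElement(root, "preprocessing")
    for order, (name, params) in enumerate(steps, start=1):
        step = ET.SubElement(history, "step", name=name, order=str(order))
        for key, value in params.items():
            ET.SubElement(step, "parameter", name=key, value=value)
    data = ET.SubElement(root, "data", points=str(len(points)))
    data.text = " ".join(f"{p:.6g}" for p in points)
    return ET.tostring(root, encoding="unicode")

xml_text = spectrum_to_xml(
    [0.1, 0.8, 0.3],
    [("phaseCorrection", {"zeroOrder": "12.5"}),
     ("apodization", {"function": "gaussian", "width": "5"})],
    scanner="example-scanner",
)
print(xml_text)
```

Recording steps in order, each with its parameters, is what makes a processing history reproducible. A recipient can re-apply the same template or check that two centres processed their spectra identically before pooling them.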

MeSH
Algorithms*
Automatic Data Processing / statistics & numerical data
Humans
Magnetic Resonance Spectroscopy / methods
Magnetic Resonance Imaging / methods
Image Processing, Computer-Assisted / methods
Software*
Check Tag
Humans
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Article

Machine learning pipeline with custom grid search for colorectal Raman spectroscopy data

Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy. 2026 Jan 15;345:126749. Epub 2025 Aug 9.

Spectrochim Acta A Mol Biomol Spectrosc
ISSN 1873-3557 | 1386-1425

Colorectal cancer remains a major health burden, and its early detection is crucial for effective treatment. This study investigates the use of a handheld Raman spectrometer in combination with machine learning to classify colorectal tissue samples collected during colonoscopy. A dataset of 330 spectra from 155 participants was preprocessed using a standardized pipeline, and multiple classification models were trained to distinguish between healthy and pathological tissue. Due to the strong class imbalance and limited data size, a custom grid search approach was implemented to optimize both model hyperparameters and preprocessing parameters. Unlike standard GridSearchCV, our method prioritized balanced accuracy on the test set to reduce bias toward the dominant class. Among the tested classifiers, the Decision Tree (DT) and Support Vector Classifier (SVC) achieved the highest balanced accuracy (71.77% for DT and 70.77% for SVC), outperforming models trained using traditional methods. These results demonstrate the potential of Raman spectroscopy as a rapid, non-destructive screening tool and highlight the importance of tailored model selection strategies in biomedical applications. While this study is based on a limited dataset, it serves as a promising step toward more robust classification models and supports the feasibility of this approach for future clinical validation.
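Balanced accuracy is the mean of per-class recall. A classifier that always predicts the dominant class therefore scores 1/k for k classes, rather than the share of the majority class. The sketch below shows a grid search that ranks joint preprocessing and model settings by this metric, as the abstract describes. The grid keys and the evaluate callback are hypothetical placeholders: the callback is assumed to train a model with the given settings and return true and predicted labels for the evaluation split. With scikit-learn, the same metric is available as sklearn.metrics.balanced_accuracy_score.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

def balanced_accuracy(y_true: Sequence[Any], y_pred: Sequence[Any]) -> float:
    """Mean recall over the classes present in y_true."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

def custom_grid_search(
    grid: Dict[str, List[Any]],
    evaluate: Callable[[Dict[str, Any]], Tuple[Sequence[Any], Sequence[Any]]],
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Try every combination in grid and keep the one with the best balanced accuracy.

    Keys may mix preprocessing and model parameters; evaluate(params) must
    return (y_true, y_pred) for the evaluation split.
    """
    best_params: Optional[Dict[str, Any]] = None
    best_score = float("-inf")
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = balanced_accuracy(*evaluate(params))
        if score > best_score:  # strict: the first best combination wins ties
            best_params, best_score = params, score
    return best_params, best_score

y_true = [0] * 8 + [1] * 2
always_majority = [0] * 10
print(balanced_accuracy(y_true, always_majority))  # → 0.5, although plain accuracy is 0.8
```

Putting preprocessing options (for example, a smoothing window or baseline-correction order) into the same grid as the classifier hyperparameters lets the search pick the combination that works best together, rather than tuning each stage in isolation.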

Keywords
Balanced accuracy, Colorectal cancer, Machine learning, Preprocessing pipeline, Raman spectroscopy, Spectral classification
MeSH
Colorectal Neoplasms* / diagnosis / diagnostic imaging
Middle Aged
Humans
Spectrum Analysis, Raman* / methods
Decision Trees
Machine Learning*
Support Vector Machine
Check Tag
Middle Aged
Humans
Male
Female
Publication type
Journal Article

Article

Snímání a předzpracování dat při behaviorálních pokusech
[Scanning and preprocessing of data in behavioral studies]

Author: Pribík, V

Ceskoslovenska fysiologie. 1982;31(3):257-60.

Cesk Fysiol
ISSN 1210-6313

Article

Corticosteroid treatment prediction using chest X-ray and clinical data

Computational and structural biotechnology journal. 2024 Dec;24:53-65. Epub 2023 Dec 7.

Comput Struct Biotechnol J
ISSN 2001-0370

BACKGROUND AND OBJECTIVE: Severe courses of COVID-19 can lead to long-term complications. The post-acute phase of COVID-19 is characterized by persistent or new symptoms. This problem is becoming more relevant with the increasing number of patients who have contracted COVID-19 and the emergence of new virus variants. In such cases, preventive treatment with corticosteroids can be applied. However, not every patient benefits from the treatment; moreover, it can have severe side effects. Currently, no study has analyzed which patients benefit from the treatment. METHODS: This work introduces a novel approach to recommending Corticosteroid (CS) treatment for patients in the post-acute phase. We used a novel combination of clinical data, including blood tests, spirometry, and chest X-ray (CXR) images, from 273 patients. Such data are very challenging to collect, especially from patients in the post-acute phase of COVID-19. To our knowledge, no similar dataset exists in the literature. Moreover, we propose a unique methodology that combines machine learning and deep learning models based on Vision Transformer (ViT) and InceptionNet, preprocessing techniques, and pretraining strategies to deal with the specific characteristics of our data. RESULTS: The experiments showed that combining clinical data with CXR images achieves 8% higher accuracy than analyzing CXR images alone. The proposed method reached 80.0% accuracy (78.7% balanced accuracy) and a ROC-AUC of 0.89. CONCLUSIONS: The introduced system for CS treatment prediction using our neural network and learning algorithm is unique in this field of research. We have shown the effectiveness of using mixed data and demonstrated it on real-world data. The paper also identifies factors that could be used to predict long-term complications. Additionally, the system was deployed in the hospital environment as a recommendation tool, which demonstrates the clinical applicability of the proposed methodology.
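The abstract reports that combining clinical data with CXR images improved accuracy, but it does not describe the fusion architecture. One common pattern is early (feature-level) fusion, sketched here with NumPy purely as an assumption. The tabular clinical variables are standardized and concatenated with an image embedding, for example a vector produced by a pretrained image backbone, before a classifier is trained on the fused matrix. The classifier step is omitted. The array shapes, function names, and embedding source are hypothetical.

```python
import numpy as np

def zscore_fit(X: np.ndarray):
    """Column means and standard deviations, estimated on the training set only."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    return mu, sd

def fuse(clinical: np.ndarray, image_emb: np.ndarray,
         mu: np.ndarray, sd: np.ndarray) -> np.ndarray:
    """Early fusion: standardized clinical features concatenated with image embeddings."""
    if clinical.shape[0] != image_emb.shape[0]:
        raise ValueError("clinical and image arrays must have the same number of patients")
    return np.hstack([(clinical - mu) / sd, image_emb])

rng = np.random.default_rng(0)
clinical_train = rng.normal(size=(4, 3))  # e.g. blood-test and spirometry values (hypothetical)
image_train = rng.normal(size=(4, 8))     # e.g. backbone embeddings of each CXR (hypothetical)
mu, sd = zscore_fit(clinical_train)
X = fuse(clinical_train, image_train, mu, sd)
print(X.shape)  # → (4, 11)
```

Fitting the standardization statistics on the training patients only, and reusing them for validation and test patients, avoids leaking information from the evaluation data into preprocessing.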

Keywords
Chest X-ray images, Clinical data, Image classification, Post-acute COVID-19, Treatment prediction, Vision transformer
Publication type
Journal Article
