Data alignment automation
Comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC × GC-MS) has great potential for analysing complicated mixtures and sample matrices because of its separation power and potentially high resolution. The second component of the measurement results, the mass spectra, is reproducible. However, the reproducibility of two-dimensional chromatography is affected by many factors, which complicates the evaluation of long-term experiments and cross-laboratory collaborations. This paper presents a new open-source data alignment tool that tackles the problem of retention time shifts. It implements five algorithms (BiPACE 2D, DISCO, MSort, PAM, and TNT-DA), along with Pearson's correlation and dot product as optional methods for mass spectra comparison. The implemented data alignment algorithms and their variations were tested on real samples to demonstrate the functionality of the presented tool. The suitability of each implemented algorithm for significantly and non-significantly shifted data is discussed on the basis of the results obtained. To evaluate the "goodness" of the alignment, Kolmogorov-Smirnov test values were calculated and comparison graphs were generated. DA_2DChrom is available online with its documentation and is fully open-source, and users can run the tool without uploading their data to external third-party servers.
- Keywords
- Data alignment algorithm, Data alignment automation, GC × GC–TOF real samples, Human scent
- Publication type
- Journal Article MeSH
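The GC × GC-MS record above names Pearson's correlation and the dot product as the two options for comparing mass spectra. As a minimal sketch of what such a comparison can look like, the following assumes that both spectra are already binned onto the same m/z grid and that the dot product is normalized to a cosine similarity. Both assumptions, the function names, and the intensity values are illustrative and are not taken from DA_2DChrom.

```python
import math


def dot_product_similarity(a, b):
    """Normalized dot product (cosine) of two intensity vectors.

    Assumes both vectors share the same m/z bins (hypothetical layout).
    """
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def pearson_similarity(a, b):
    """Pearson correlation coefficient of two intensity vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    return num / den if den else 0.0


# Made-up intensities for two peaks that might be the same compound.
spectrum_1 = [0.0, 120.0, 35.0, 0.0, 999.0, 14.0]
spectrum_2 = [0.0, 110.0, 40.0, 5.0, 980.0, 10.0]
print(round(dot_product_similarity(spectrum_1, spectrum_2), 4))
print(round(pearson_similarity(spectrum_1, spectrum_2), 4))
```

An alignment algorithm would typically accept a candidate peak pair only if such a score exceeds a user-chosen threshold, in addition to the retention times being close enough.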
Typical time intervals between acquisitions of three-dimensional (3-D) images of the same cell in live cell imaging are on the order of minutes. In the meantime, the live cell can move in a water basin on the stage. This movement can hamper studies of intranuclear processes. We propose a fast point-based image registration method for suppressing the movement of a cell as a whole in the image data. First, centroids of certain intracellular objects are computed for each image in a time-lapse series. Then, a matching with the maximal number of centroid pairs is sought between consecutive point sets by a 3-D extension of a two-dimensional fast point pattern matching method, which is invariant to rotation, translation, local distortion, and extra/missing points. The proposed 3-D extension assumes rotations only around the z axis to retain the complexity of the original method. The final step involves computing the optimal fully 3-D transformation between images from corresponding points in the least-squares manner. The robustness of the method was evaluated on generated data. The results of the simulations show that the method is very precise and its correctness can be estimated. This article also presents two practical application examples, namely the registration of images of HP1 domains and the registration of images of telomeres. More than 97% of time-consecutive images were successfully registered. The results show that the method is very well suited to live cell imaging.
- MeSH
- Algorithms MeSH
- Artifacts MeSH
- Microscopy, Fluorescence methods MeSH
- Image Interpretation, Computer-Assisted methods MeSH
- Cells, Cultured cytology MeSH
- Humans MeSH
- Cell Movement MeSH
- Reproducibility of Results MeSH
- Pattern Recognition, Automated methods MeSH
- Sensitivity and Specificity MeSH
- Subtraction Technique MeSH
- Information Storage and Retrieval methods MeSH
- Artificial Intelligence MeSH
- Microscopy, Video methods MeSH
- Image Enhancement methods MeSH
- Imaging, Three-Dimensional methods MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Evaluation Study MeSH
- Research Support, Non-U.S. Gov't MeSH
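The live-cell registration record above ends with a least-squares fit of a transformation to matched centroids. The sketch below illustrates that idea for the restricted case the abstract mentions for the matching step, a rotation around the z axis plus a translation, because this case has a closed-form solution that needs no linear algebra library. It is not the fully 3-D transformation of the paper, and the function names and coordinates are hypothetical.

```python
import math


def fit_z_rotation(src, dst):
    """Least-squares rotation about z plus translation mapping src onto dst.

    src and dst are equally long lists of matched (x, y, z) centroids.
    Returns (theta, (tx, ty, tz)).
    """
    n = len(src)
    cs = [sum(p[i] for p in src) / n for i in range(3)]  # source centroid
    cd = [sum(q[i] for q in dst) / n for i in range(3)]  # target centroid
    s_dot = s_cross = 0.0
    for p, q in zip(src, dst):
        px, py = p[0] - cs[0], p[1] - cs[1]
        qx, qy = q[0] - cd[0], q[1] - cd[1]
        s_dot += px * qx + py * qy
        s_cross += px * qy - py * qx
    # The angle maximizing the correlation of the centered xy coordinates.
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    t = (cd[0] - (c * cs[0] - s * cs[1]),
         cd[1] - (s * cs[0] + c * cs[1]),
         cd[2] - cs[2])
    return theta, t


def apply_transform(theta, t, p):
    """Rotate point p about the z axis by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0],
            s * p[0] + c * p[1] + t[1],
            p[2] + t[2])


# Made-up centroids from one frame, moved by a known transform.
frame_a = [(0.0, 0.0, 0.0), (4.0, 1.0, 2.0), (1.0, 5.0, 1.0), (3.0, 3.0, -1.0)]
frame_b = [apply_transform(0.2, (1.5, -0.5, 0.3), p) for p in frame_a]
print(fit_z_rotation(frame_a, frame_b))
```

With noise-free correspondences the fit recovers the applied angle and shift; with noisy centroids it returns the transform that minimizes the summed squared distances within this restricted motion model.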
We present the ROCA (ROad Curvature Analyst) software, in the form of an ESRI ArcGIS Toolbox, intended for vector line data processing. The software segments road network data into tangents and horizontal curves. Horizontal curve radii and azimuths of tangents are then automatically computed. Simultaneously, additional frequently used road section characteristics are calculated, such as the sinuosity of a road section (detour ratio), the number of turns along an individual road section and the average cumulative angle for a road section. The identification of curves is based on the naïve Bayes classifier, and users can prepare their own training data files. We applied the ROCA software to secondary roads within the Czech road network (9,980 km). The data processing took less than ten minutes. Approximately 43% of the road network in question consists of 42,752 horizontal curves. The ROCA software outperforms other existing automatic methods by 26% with respect to the percentage of correctly identified curves. The segmented secondary roads within the Czech road network can be viewed in the roca.cdvgis.cz/czechia web-map application. We combined data on road geometry with a road crash database to develop crash modification factors for horizontal curves with various radii. We determined that horizontal curves with radii of 50 m are approximately 3.7 times more hazardous than horizontal curves with radii of 1000 m. The ROCA software can be freely downloaded for noncommercial use from the https://roca.cdvinfo.cz/ website.
- MeSH
- Automobiles standards MeSH
- Safety MeSH
- Accidents, Traffic prevention & control MeSH
- Geographic Information Systems MeSH
- Humans MeSH
- Self-Help Devices * MeSH
- Automobile Driving * standards MeSH
- Rotation * MeSH
- Pattern Recognition, Automated methods MeSH
- Software * standards MeSH
- Environment Design MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
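Two of the road section characteristics named in the ROCA record above can be illustrated with a few lines of geometry. The sketch below computes the sinuosity (detour ratio) of a polyline and the radius of the circle through three consecutive vertices. The three-point circle is a stand-in chosen for illustration: the abstract does not say how ROCA computes radii, and its curve identification relies on a naïve Bayes classifier that is not reproduced here. Coordinates are assumed to be planar and in metres.

```python
import math


def path_length(points):
    """Length of a polyline given as a list of (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def sinuosity(points):
    """Detour ratio: travelled length divided by straight-line distance."""
    return path_length(points) / math.dist(points[0], points[-1])


def circumradius(a, b, c):
    """Radius of the circle through three vertices (inf if collinear)."""
    # Twice the signed triangle area, from the 2-D cross product.
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if cross == 0:
        return math.inf
    sides = math.dist(a, b) * math.dist(b, c) * math.dist(c, a)
    return sides / (2 * abs(cross))


# Hypothetical road section: a straight approach followed by a bend.
section = [(0.0, 0.0), (100.0, 0.0), (150.0, 13.4), (186.6, 50.0), (200.0, 100.0)]
print(round(sinuosity(section), 3))
print(round(circumradius(section[1], section[2], section[3]), 1))
```

A sinuosity of 1.0 means a perfectly straight section; larger values indicate a longer detour relative to the direct distance.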
Existing multichannel blind restoration techniques assume perfect spatial alignment of channels, correct estimation of blur size, and are prone to noise. We developed an alternating minimization scheme based on a maximum a posteriori estimation with a priori distribution of blurs derived from the multichannel framework and a priori distribution of original images defined by the variational integral. This stochastic approach enables us to recover the blurs and the original image from channels severely corrupted by noise. We observe that the exact knowledge of the blur size is not necessary, and we prove that translation misregistration up to a certain extent can be automatically removed in the restoration process.
- MeSH
- Algorithms * MeSH
- Image Interpretation, Computer-Assisted methods MeSH
- Signal Processing, Computer-Assisted MeSH
- Pattern Recognition, Automated methods MeSH
- Subtraction Technique * MeSH
- Information Storage and Retrieval methods MeSH
- Artificial Intelligence * MeSH
- Image Enhancement methods MeSH
- Publication type
- Journal Article MeSH
- Evaluation Study MeSH
- Research Support, Non-U.S. Gov't MeSH
BACKGROUND: Sex chromosomes represent a genomic region which, to some extent, differs between the sexes of a single species. Reliable high-throughput methods for detection of sex chromosome-specific markers are needed, especially in species where genome information is limited. Next generation sequencing (NGS) opens the door for identification of unique sequences or searching for nucleotide polymorphisms between datasets. A combination of classical genetic segregation analysis with RNA-Seq data can present an ideal tool to map and identify sex chromosome-specific expressed markers. To address this challenge, we established a genetic cross of the dioecious plant Rumex acetosa and generated RNA-Seq data from both the parental generation and male and female offspring. RESULTS: We present a pipeline for detection of sex-linked genes based on nucleotide polymorphism analysis. In our approach, tracking of nucleotide polymorphisms is carried out using a cross of preferably distant populations. For this reason, only four datasets are needed: reads from high-throughput sequencing platforms for the parent generation (mother and father) and the F1 generation (male and female progeny). Our pipeline uses custom scripts together with external assembly, mapping and variant calling software. Given the resource-intensive nature of the computation, servers with high capacity are a requirement. Therefore, in order to keep this pipeline easily accessible and reproducible, we implemented it in Galaxy, an open, web-based platform for data-intensive biomedical research. Our tools are present in the Galaxy Tool Shed, from which they can be installed to any local Galaxy instance. As an output of the pipeline, the user gets a FASTA file with candidate transcriptionally active sex-linked genes, sorted by their relevance. At the same time, a BAM file with identified genes and alignment of reads is also provided. Thus, polymorphisms following the segregation pattern can be easily visualized, which significantly enhances primer design and subsequent steps of wet-lab verification. CONCLUSIONS: Our pipeline presents a simple and freely accessible software tool for identification of sex chromosome-linked genes in species without an existing reference genome. Based on a combination of genetic crosses and RNA-Seq data, we have designed a high-throughput, cost-effective approach for a broad community of scientists focused on sex chromosome structure and evolution.
- MeSH
- Genetic Markers genetics MeSH
- Genome, Human MeSH
- Genes, X-Linked * MeSH
- Genes, Y-Linked * MeSH
- Polymorphism, Single Nucleotide genetics MeSH
- Humans MeSH
- Polymerase Chain Reaction MeSH
- RNA genetics MeSH
- Sequence Analysis, RNA methods MeSH
- Software * MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Check Tag
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substances
- Genetic Markers MeSH
- RNA MeSH
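The sex-linked gene pipeline in the record above tracks nucleotide polymorphisms through four datasets: mother, father, female progeny and male progeny. The sketch below shows one way such a segregation filter could be expressed for a single variable site, assuming a simple XY system in which a paternal Y allele appears only in sons and a paternal X allele only in daughters. The function name, the input layout (a set of observed alleles per dataset) and the decision rules are hypothetical simplifications, not the custom scripts of the published Galaxy tools.

```python
def classify_site(mother, father, daughters, sons):
    """Classify one variable site by its segregation pattern.

    Each argument is the set of alleles observed in that dataset's reads.
    Only alleles carried by the father and absent from the mother are
    informative in this simplified scheme.
    """
    for allele in sorted(father - mother):
        if allele in sons and allele not in daughters:
            return "Y-linked candidate"
        if allele in daughters and allele not in sons:
            return "X-linked candidate"
    return "not informative"


# Made-up sites: (mother, father, female progeny, male progeny).
sites = {
    "contig_1:120": ({"A"}, {"A", "G"}, {"A"}, {"A", "G"}),
    "contig_7:45": ({"C"}, {"T"}, {"C", "T"}, {"C"}),
    "contig_9:300": ({"A", "G"}, {"A", "G"}, {"A", "G"}, {"A", "G"}),
}
for name, datasets in sites.items():
    print(name, classify_site(*datasets))
```

A real pipeline would additionally need read-depth thresholds and error handling for sequencing noise; ranking genes by the number of sites that follow the pattern is one plausible way to sort candidates by relevance.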
For decades, there has been scientific interest in the variation and geographic distribution of paternal lineages associated with the human Y chromosome. However, the relevant data have been dispersed across numerous publications, making it difficult to consolidate. Additionally, understanding the relationships between different variants, and the tools used to analyze them, have evolved over time, further complicating efforts to harmonize this information. The Universal Y-SNP Database (UYSD) marks a substantial advancement by providing a comprehensive and accessible platform for Y-SNP and haplogroup data from populations around the world. UYSD harmonizes diverse datasets into a unified repository, facilitating the exploration of global Y-chromosomal variation. The platform handles data generated with both high- and low-throughput technology and is compatible with the automated analysis software tool, Yleaf v3. Key functionalities include the ability to: i) visualize haplogroup distributions on an interactive world map, ii) estimate haplogroup frequencies in geographic regions with sparse data through interpolation, and iii) display detailed phylogenetic trees of Y-chromosomal haplogroups. Currently, UYSD encompasses data from over 6,600 males across 27 populations. This dataset largely aligns with known global Y-haplogroup patterns, but also reveals unexplored finer-scale geographic variations. While the present dataset is largely European-centered, UYSD is designed for ongoing expansion by the scientific community, aiming to include more global data and higher-resolution population sequencing data. The platform thus offers valuable insights into human genetic diversity and migration patterns, serving several fields of research such as: human population genetics, genetic anthropology, ancient DNA analysis and forensic genetics.
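The UYSD record above mentions estimating haplogroup frequencies in regions with sparse data through interpolation, without naming the method. Purely as an illustration of spatial interpolation, the sketch below uses inverse distance weighting over great-circle distances. The choice of technique, the power parameter, the function names and all sample values are assumptions and do not describe how UYSD works.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def idw_frequency(target, samples, power=2.0):
    """Inverse-distance-weighted estimate of a haplogroup frequency.

    samples is a list of ((lat, lon), frequency) pairs from sampled populations.
    """
    num = den = 0.0
    for location, freq in samples:
        d = haversine_km(target, location)
        if d == 0:
            return freq  # the target coincides with a sampled population
        w = 1.0 / d ** power
        num += w * freq
        den += w
    return num / den


# Made-up frequencies of one haplogroup at three sampling locations.
sampled = [((50.1, 14.4), 0.30), ((52.5, 13.4), 0.22), ((48.2, 16.4), 0.35)]
print(round(idw_frequency((49.2, 16.6), sampled), 3))
```

Nearby populations dominate the estimate; increasing `power` makes the interpolation more local.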
In this paper, we examine several methods of acquiring Czech data for automated fact-checking, which is a task commonly modeled as a classification of textual claim veracity w.r.t. a corpus of trusted ground truths. We attempt to collect sets of data in the form of a factual claim, evidence within the ground truth corpus, and its veracity label (supported, refuted or not enough info). As a first attempt, we generate a Czech version of the large-scale FEVER dataset built on top of the Wikipedia corpus. We take a hybrid approach of machine translation and document alignment; the approach and the tools we provide can be easily applied to other languages. We discuss its weaknesses, propose a future strategy for their mitigation and publish the 127k resulting translations, as well as a version of this dataset reliably applicable to the Natural Language Inference task, the CsFEVER-NLI. Furthermore, we collect a novel dataset of 3,097 claims, which is annotated using the corpus of 2.2 M articles of the Czech News Agency. We present an extended dataset annotation methodology based on the FEVER approach, and, as the underlying corpus is proprietary, we also publish a standalone version of the dataset for the task of Natural Language Inference, which we call CTKFactsNLI. We analyze both acquired datasets for spurious cues, that is, annotation patterns leading to model overfitting. CTKFacts is further examined for inter-annotator agreement, thoroughly cleaned, and a typology of common annotator errors is extracted. Finally, we provide baseline models for all stages of the fact-checking pipeline and publish the NLI datasets, as well as our annotation platform and other experimental data.
- Keywords
- Automated fact-checking, Czech, Document retrieval, FEVER, Natural language inference
- Publication type
- Journal Article MeSH
BACKGROUND: Deep Brain Stimulation (DBS), applying chronic electrical stimulation of subcortical structures, is a clinical intervention applied in major neurologic disorders. In order to achieve a good clinical effect, accurate electrode placement is necessary. The primary localisation is typically based on presurgical MRI imaging, often followed by intra-operative electrophysiology recording to increase the accuracy and to compensate for brain shift, especially in cases where the surgical target is small, and there is low contrast: e.g., in Parkinson's disease (PD) and in its common target, the subthalamic nucleus (STN). METHODS: We propose a novel, fully automatic method for intra-operative surgical navigation. First, the surgical target is segmented in presurgical MRI images using a statistical shape-intensity model. Next, automated alignment with intra-operatively recorded microelectrode recordings is performed using a probabilistic model of STN electrophysiology. We apply the method to a dataset of 120 PD patients with clinical T2 1.5T images, of which 48 also had available microelectrode recordings (MER). RESULTS: The proposed segmentation method achieved STN segmentation accuracy around dice = 0.60 compared to manual segmentation. This is comparable to the state-of-the-art on low-resolution clinical MRI data. When combined with electrophysiology-based alignment, we achieved an accuracy of 0.85 for correctly including recording sites of STN-labelled MERs in the final STN volume. CONCLUSION: The proposed method combines image-based segmentation of the subthalamic nucleus with microelectrode recordings to estimate their mutual location during the surgery in a fully automated process. Apart from its potential use in clinical targeting, the method can be used to map electrophysiological properties to specific parts of the basal ganglia structures and their vicinity.
- MeSH
- Electrophysiology MeSH
- Deep Brain Stimulation * methods MeSH
- Humans MeSH
- Magnetic Resonance Imaging MeSH
- Microelectrodes MeSH
- Parkinson Disease * therapy surgery MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
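The DBS record above reports segmentation accuracy as a Dice score of about 0.60 against manual segmentation. For readers unfamiliar with the metric, the sketch below computes the Dice coefficient of two segmentations represented as sets of voxel indices. The representation and the toy voxel sets are illustrative only.

```python
def dice_coefficient(seg_a, seg_b):
    """Dice overlap of two segmentations given as collections of voxel indices.

    Returns 2 * |A ∩ B| / (|A| + |B|), or 1.0 when both are empty.
    """
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy example: five voxels each, three of them shared.
automatic = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
manual = {(0, 0, 1), (0, 1, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)}
print(dice_coefficient(automatic, manual))
```

The score ranges from 0 (no overlap) to 1 (identical segmentations). For small structures such as the STN, a shift of a single voxel removes a large share of the overlap, which is one reason Dice values for small targets tend to be lower than for large organs.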
Large-scale comparative studies of DNA fingerprints prefer automated chip capillary electrophoresis over conventional gel planar electrophoresis due to the higher precision of the digitalization process. However, the determination of band sizes is still limited by the device resolution and sizing accuracy. Band matching, therefore, remains the key step in DNA fingerprint analysis. Most current methods evaluate only the pairwise similarity of the samples, using heuristically determined constant thresholds to evaluate the maximum allowed band size deviation; unfortunately, that approach significantly reduces the ability to distinguish between closely related samples. This study presents a new approach based on global multiple alignments of bands of all samples, with an adaptive threshold derived from the detailed migration analysis of a large number of real samples. The proposed approach allows the accurate automated analysis of DNA fingerprint similarities for extensive epidemiological studies of bacterial strains, thereby helping to prevent the spread of dangerous microbial infections.
- Keywords
- Automated chip capillary electrophoresis, Band matching, DNA fingerprinting, Gel sample distortion, Genotyping, Pattern recognition; abbreviations: DBSCAN (density-based spatial clustering of applications with noise), DTW (dynamic time warping), ESBL (extended spectrum beta-lactamases), KLPN (Klebsiella pneumoniae), MALDI-TOF (matrix assisted laser desorption ionization – time of flight), R-square (ratio of the sum of squares), RMSE (root mean squared error), SD (standard deviation), SLINK (single linkage), SSE (sum of squares due to error), UPGMA (unweighted pair group method with arithmetic mean), rep-PCR (repetitive element palindromic polymerase chain reaction)
- Publication type
- Journal Article MeSH
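The DNA fingerprint record above contrasts pairwise matching under a constant threshold with a global alignment of the bands of all samples under an adaptive threshold. The sketch below illustrates the global idea in its simplest form: the bands of all samples are pooled, sorted by size, and grouped into shared bins, with a tolerance that grows with band size. The size-proportional tolerance and the gap-based grouping are stand-ins chosen for illustration. The paper derives its threshold from migration analysis of real samples, which is not reproduced here, and all names and values are hypothetical.

```python
def match_bands(samples, rel_tol=0.03):
    """Group the bands of all samples into shared bins.

    samples maps a sample name to its band sizes in base pairs.
    Returns a binary presence profile per sample, one entry per bin.
    """
    pooled = sorted((size, name) for name, sizes in samples.items() for size in sizes)
    bins = []
    for size, name in pooled:
        # Join the current bin if the gap to its last band is within tolerance.
        if bins and size - bins[-1][-1][0] <= rel_tol * size:
            bins[-1].append((size, name))
        else:
            bins.append([(size, name)])
    return {
        name: [int(any(n == name for _, n in members)) for members in bins]
        for name in sorted(samples)
    }


# Made-up fingerprints of two isolates.
fingerprints = {"isolate_A": [100, 500, 1000], "isolate_B": [102, 510, 1500]}
for name, profile in match_bands(fingerprints).items():
    print(name, profile)
```

Because every sample is placed into the same set of bins, the resulting profiles can be compared directly or passed to a clustering method. A known weakness of gap-based grouping is chaining, where a run of closely spaced bands merges into one overly wide bin.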
BACKGROUND: Slow-fast analysis is a simple and effective method to reduce the influence of substitution saturation, one of the causes of phylogenetic noise and long branch attraction (LBA) artifacts. In several steps of increasing stringency, the slow-fast analysis omits the fastest substituting alignment positions from the analysed dataset and thus increases its signal/noise ratio. RESULTS: Our program SlowFaster automates the process of assessing the substitution rate of the alignment positions and the process of producing new alignments by deleting the saturated positions. Its use is very simple. It goes through the whole process in several steps: data input - necessary choices - production of new alignments. CONCLUSION: SlowFaster is a user-friendly tool providing new alignments prepared with slow-fast analysis. These data can be used for further phylogenetic analyses with lower risk of long branch attraction artifacts.
- MeSH
- Blastocystis genetics MeSH
- Time Factors MeSH
- Phylogeny * MeSH
- DNA, Protozoan MeSH
- Sequence Alignment MeSH
- User-Computer Interface * MeSH
- Computational Biology methods MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substances
- DNA, Protozoan MeSH
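The SlowFaster record above describes two automated steps: assessing the substitution rate of each alignment position and producing new alignments without the fastest positions. The sketch below shows a simplified version of both steps. It assumes that the rate of a position is estimated as the minimum number of changes within user-defined groups of taxa (number of distinct residues in the group minus one, summed over groups). This proxy, the function names and the toy alignment are assumptions and may differ from what SlowFaster computes.

```python
def position_rates(alignment, groups):
    """Estimate the number of changes at each alignment position.

    alignment maps taxon names to equally long aligned sequences;
    groups is a list of lists of taxon names assumed to be monophyletic.
    """
    length = len(next(iter(alignment.values())))
    rates = []
    for i in range(length):
        changes = 0
        for group in groups:
            # Gaps and missing data do not count as character states.
            states = {alignment[name][i] for name in group} - {"-", "?"}
            changes += max(len(states) - 1, 0)
        rates.append(changes)
    return rates


def slow_fast_alignments(alignment, groups):
    """Return one alignment per rate limit, keeping positions at or below it."""
    rates = position_rates(alignment, groups)
    result = {}
    for limit in range(max(rates) + 1):
        keep = [i for i, rate in enumerate(rates) if rate <= limit]
        result[limit] = {
            name: "".join(seq[i] for i in keep) for name, seq in alignment.items()
        }
    return result


# Toy alignment with two groups of two taxa each.
toy = {"a1": "ACGT", "a2": "ACGA", "b1": "ACTT", "b2": "AGTC"}
for limit, aln in slow_fast_alignments(toy, [["a1", "a2"], ["b1", "b2"]]).items():
    print(limit, aln)
```

Each successive alignment admits faster positions, so running a phylogenetic analysis on the whole series shows at which stringency saturated positions start to change the inferred tree.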