1,804 records in PubMed

Article

Point cloud registration from local feature correspondences-Evaluation on challenging datasets

Authors:
Petricek, Tomas (Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic)
Svoboda, Tomas (Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic)

PLoS One. 2017; 12(11): e0187943. Epub 2017-11-14.

Source: PLoS One, ISSN 1932-6203

Registration of laser scans, or point clouds in general, is a crucial step of localization and mapping with mobile robots or in object modeling pipelines. A coarse alignment of the point clouds is generally needed before applying local methods such as the Iterative Closest Point (ICP) algorithm. We propose a feature-based approach to point cloud registration and evaluate the proposed method and its individual components on challenging real-world datasets. For a moderate overlap between the laser scans, the method provides a superior registration accuracy compared to state-of-the-art methods including Generalized ICP, 3D Normal-Distribution Transform, Fast Point-Feature Histograms, and 4-Points Congruent Sets. Compared to the surface normals, the points as the underlying features yield higher performance in both keypoint detection and establishing local reference frames. Moreover, sign disambiguation of the basis vectors proves to be an important aspect in creating repeatable local reference frames. A novel method for sign disambiguation is proposed which yields highly repeatable reference frames.
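The abstract does not spell out the proposed sign-disambiguation method, so the following is only a minimal sketch of the general problem it addresses: an eigenvector-based axis of a local reference frame is defined up to sign, and a repeatable frame needs a deterministic rule to pick one. The majority-vote rule and the names `disambiguate_sign`, `keypoint`, and `neighbors` are assumptions made for illustration, not the authors' method.

```python
# Majority-vote sign disambiguation for one axis of a local reference frame.
# Generic heuristic for illustration only; NOT the method proposed in the paper.

def disambiguate_sign(axis, keypoint, neighbors):
    """Return axis or its negation so that most neighbors lie on its positive side."""
    positive = 0
    negative = 0
    for point in neighbors:
        # Signed projection of the offset (point - keypoint) onto the axis.
        projection = sum(a * (p - k) for a, p, k in zip(axis, point, keypoint))
        if projection > 0:
            positive += 1
        elif projection < 0:
            negative += 1
    if negative > positive:
        return tuple(-a for a in axis)
    return tuple(axis)


keypoint = (0.0, 0.0, 0.0)
neighbors = [(1.0, 0.2, 0.0), (0.8, -0.1, 0.3), (-0.2, 0.5, 0.1)]

# Both possible eigenvector signs should resolve to the same oriented axis.
axis_a = disambiguate_sign((1.0, 0.0, 0.0), keypoint, neighbors)
axis_b = disambiguate_sign((-1.0, 0.0, 0.0), keypoint, neighbors)
```

Whichever sign an eigen-decomposition happens to return, the rule maps it to the same direction, which is the property a repeatable reference frame requires.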

Article

IDSM ChemWebRDF: SPARQLing small-molecule datasets

Journal of Cheminformatics. 2021 May 12; 13(1): 38. Epub 2021-05-12.

Source: J Cheminform, ISSN 1758-2946

The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primarily focused on PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluations of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project.
Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/ .
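As a rough illustration of the programmatic access and federated querying mentioned above, the sketch below builds a SPARQL query containing a `SERVICE` clause and encodes it as a GET request using the `query` parameter defined by the SPARQL 1.1 Protocol. Both endpoint URLs and the triple patterns are placeholders invented for this example; they are not the actual IDSM endpoint or its data model. The request is only constructed, not sent.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint URLs, for illustration only (not the real IDSM endpoint).
LOCAL_ENDPOINT = "https://example.org/sparql"
REMOTE_ENDPOINT = "https://example.com/other/sparql"

# SERVICE is the SPARQL 1.1 keyword that delegates part of a query
# to a second endpoint, which is what makes the query federated.
# The triple patterns are generic placeholders.
QUERY = f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?compound ?label WHERE {{
  ?compound rdfs:label ?label .
  SERVICE <{REMOTE_ENDPOINT}> {{
    ?compound rdfs:seeAlso ?crossReference .
  }}
}}
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode a query as a SPARQL 1.1 Protocol GET request (parameter 'query')."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(LOCAL_ENDPOINT, QUERY)
decoded_query = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

A client would fetch `url` with an `Accept` header such as `application/sparql-results+json`; that step is left out so the sketch runs without network access.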

Keywords
Resource Description Framework, SPARQL, Small-molecule datasets
Publication type
Journal Article MeSH

Article

Reductionist Pathways for Parasitism in Euglenozoans? Expanded Datasets Provide New Insights

Trends in Parasitology. 2021 Feb; 37(2): 100-116. Epub 2020-10-27.

Source: Trends Parasitol, ISSN 1471-5007 | 1471-4922

The unicellular trypanosomatids belong to the phylum Euglenozoa and all known species are obligate parasites. Distinct lineages infect plants, invertebrates, and vertebrates, including humans. Genome data for marine diplonemids, together with freshwater euglenids and free-living kinetoplastids, the closest known nonparasitic relatives to trypanosomatids, recently became available. Robust phylogenetic reconstructions across Euglenozoa are now possible and place the results of parasite-focused studies into an evolutionary context. Here we discuss recent advances in identifying the factors shaping the evolution of Euglenozoa, focusing on ancestral features generally considered parasite-specific. Remarkably, most of these predate the transition(s) to parasitism, suggesting that the presence of certain preconditions makes a significant lifestyle change more likely.

Keywords
Euglenozoa, diplonemids, evolution, kinetoplastids, metabolism, parasitism
MeSH
Biological Evolution * MeSH
Datasets as Topic MeSH
Euglenozoa / classification / genetics MeSH
Phylogeny MeSH
Genome / genetics MeSH
Euglenozoa Infections / parasitology MeSH
Humans MeSH
Parasites / classification / genetics MeSH
Animals MeSH
Check Tag
Humans MeSH
Animals MeSH
Publication type
Journal Article MeSH
Grant-Supported Research MeSH
Review MeSH

Article

European aerosol phenomenology - 8: Harmonised source apportionment of organic aerosol using 22 Year-long ACSM/AMS datasets

Environment International. 2022 Aug; 166: 107325. Epub 2022-05-30.

Source: Environ Int, ISSN 1873-6750 | 0160-4120

Organic aerosol (OA) is a key component of total submicron particulate matter (PM1), and comprehensive knowledge of OA sources across Europe is crucial to mitigate PM1 levels. Europe has a well-established air quality research infrastructure from which yearlong datasets using 21 aerosol chemical speciation monitors (ACSMs) and 1 aerosol mass spectrometer (AMS) were gathered during 2013-2019. It includes 9 non-urban and 13 urban sites. This study developed a state-of-the-art source apportionment protocol to analyse long-term OA mass spectrum data by applying the most advanced source apportionment strategies (i.e., rolling PMF, ME-2, and bootstrap). This harmonised protocol was followed strictly for all 22 datasets, making the source apportionment results more comparable. In addition, it enables quantification of the most common OA components such as hydrocarbon-like OA (HOA), biomass burning OA (BBOA), cooking-like OA (COA), more oxidised-oxygenated OA (MO-OOA), and less oxidised-oxygenated OA (LO-OOA). Other components such as coal combustion OA (CCOA), solid fuel OA (SFOA: mainly mixture of coal and peat combustion), cigarette smoke OA (CSOA), sea salt (mostly inorganic but part of the OA mass spectrum), coffee OA, and ship industry OA could also be separated at a few specific sites. Oxygenated OA (OOA) components make up most of the submicron OA mass (average = 71.1%, range from 43.7 to 100%). Solid fuel combustion-related OA components (i.e., BBOA, CCOA, and SFOA) are still considerable with in total 16.0% yearly contribution to the OA, yet mainly during winter months (21.4%). Overall, this comprehensive protocol works effectively across all sites governed by different sources and generates robust and consistent source apportionment results. 
Our work presents a comprehensive overview of OA sources in Europe with a unique combination of high time resolution (30-240 min) and long-term data coverage (9-36 months), providing essential information to improve/validate air quality, health impact, and climate models.

Keywords
European Overview, Harmonised Protocol, Long-term Datasets, Organic Aerosol, Rolling PMF, Source apportionment
Publication type
Journal Article MeSH

Article

Typicality of functional connectivity robustly captures motion artifacts in rs-fMRI across datasets, atlases, and preprocessing pipelines

Human Brain Mapping. 2020 Dec 15; 41(18): 5325-5340. Epub 2020-09-02.

Source: Hum Brain Mapp, ISSN 1097-0193 | 1065-9471

Functional connectivity analysis of resting-state fMRI data has recently become one of the most common approaches to characterizing individual brain function. It has been widely suggested that the functional connectivity matrix is a useful approximate representation of the brain's connectivity, potentially providing behaviorally or clinically relevant markers. However, functional connectivity estimates are known to be detrimentally affected by various artifacts, including those due to in-scanner head motion. Moreover, as individual functional connections generally covary only very weakly with head motion estimates, motion influence is difficult to quantify robustly, and prone to be neglected in practice. Although the use of individual estimates of head motion, or group-level correlation of motion and functional connectivity has been suggested, a sufficiently sensitive measure of individual functional connectivity quality has not yet been established. We propose a new intuitive summary index, Typicality of Functional Connectivity, to capture deviations from standard brain functional connectivity patterns. In a resting-state fMRI dataset of 245 healthy subjects, this measure was significantly correlated with individual head motion metrics. The results were further robustly reproduced across atlas granularity, preprocessing options, and other datasets, including 1,081 subjects from the Human Connectome Project. In principle, Typicality of Functional Connectivity should be sensitive also to other types of artifacts, processing errors, and possibly also brain pathology, allowing extensive use in data quality screening and quantification in functional connectivity studies as well as methodological investigations.
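The abstract does not give the formula for the index. One plausible reading of "deviation from standard functional connectivity patterns" is the correlation between a subject's connectivity matrix and a reference (for example, group-average) matrix. The sketch below implements only that assumed reading; the function names and the toy 3-region matrices are hypothetical.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the off-diagonal upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def typicality(subject_fc, reference_fc):
    """Assumed definition: similarity of a subject's FC pattern to a reference."""
    return pearson(upper_triangle(subject_fc), upper_triangle(reference_fc))


# Toy matrices for three brain regions (values invented for illustration).
reference = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.5], [0.2, 0.5, 1.0]]
typical_subject = [[1.0, 0.7, 0.1], [0.7, 1.0, 0.4], [0.1, 0.4, 1.0]]
atypical_subject = [[1.0, 0.1, 0.7], [0.1, 1.0, 0.4], [0.7, 0.4, 1.0]]

score_typical = typicality(typical_subject, reference)
score_atypical = typicality(atypical_subject, reference)
```

Under this reading, a scan whose connectivity pattern is distorted (by motion or a processing error, say) would receive a lower score than one that follows the reference pattern.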

Keywords
atlas, functional connectivity, motion, quality, rs-fMRI
MeSH
Artifacts MeSH
Atlases as Topic * MeSH
Datasets as Topic * MeSH
Adult MeSH
Head Movements MeSH
Connectome * / methods / standards MeSH
Humans MeSH
Magnetic Resonance Imaging * / methods / standards MeSH
Young Adult MeSH
Brain / diagnostic imaging / physiology MeSH
Image Processing, Computer-Assisted * / methods / standards MeSH
Check Tag
Adult MeSH
Humans MeSH
Young Adult MeSH
Male MeSH
Female MeSH
Publication type
Journal Article MeSH
Grant-Supported Research MeSH

Article

Working with benchmark datasets in the Cuby framework

The Journal of Chemical Physics. 2024 May 28; 160(20).

Source: J Chem Phys, ISSN 1089-7690 | 0021-9606

The development and benchmarking of computational chemistry methods rely on comparison with benchmark data. More and larger benchmark datasets are becoming available, and working efficiently with them is a necessity. The Cuby framework provides rich functionality for working with datasets, comes with many ready-to-use predefined benchmark sets, and interfaces with a wide range of computational chemistry software packages. Here, we review the tools Cuby provides for working with datasets and provide examples of more advanced workflows, such as handling large numbers of computations on high performance computing resources and reusing previously computed data. Cuby has also been extended recently to include two important benchmark databases, NCIAtlas and GMTKN55.

Publication type
Journal Article MeSH

Article

Collection of datasets with DNS over HTTPS traffic

Data in Brief. 2022 Jun; 42: 108310. Epub 2022-05-27.

Source: Data Brief, ISSN 2352-3409

Recently, the Internet has adopted the DNS over HTTPS (DoH) resolution mechanism for privacy-aware network applications. As DoH becomes more disseminated, it has also become a network monitoring research topic. For comprehensive evaluation and comparison of developed classifiers, real-world datasets are needed, motivating this contribution. We created a new large-scale collection of datasets consisting of two classes of traffic: i) DoH HTTPS communication and ii) non-DoH HTTPS connections. The DoH traffic is captured for multiple DoH providers and clients to include nuances of various DoH implementations and configurations. The non-DoH HTTPS connections complement the DoH communication aiming to include a wide range of existing network applications. The dataset collection consists of network traffic generated in a controlled environment and traffic captured from a real ISP network. The resulting datasets thus provide real-world network traffic data suitable for evaluating existing classifiers and the development of new methods.

Keywords
Computer, DNS, DNS over HTTPS, HTTPS, Monitoring, Network, Network traffic
Publication type
Journal Article MeSH

Article

Czech political candidate and donation datasets

Scientific Data. 2025 Feb 19; 12(1): 302. Epub 2025-02-19.

Source: Sci Data, ISSN 2052-4463

This paper introduces a new Czech Political Candidate Dataset (CPCD), which compiles comprehensive data on all candidates who have run in any municipal, regional, national, and/or European Parliament election in the Czech Republic since 1993. For each candidate, the CPCD includes their first name, last name, age, gender, place of residence, university degree, party membership, party affiliation, ballot position, and election results for candidates and for parties. We match candidates over various elections by using algorithms that rely on their personal information. We add information on political donations made to political parties. We source donation information from the Czech Political Donation Dataset (CPDD), our other newly built dataset, in which we compile records of individual donations to 12 leading political parties from official records for the period from 2017 to 2023. CPDD is publicly available along with the CPCD.
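The abstract says only that candidates are matched across elections by algorithms that rely on personal information. As a hedged illustration of what such record linkage can look like, the sketch below groups records that share a name and an approximate birth year (election year minus age, with a one-year tolerance because age depends on the election date). The field names, the tolerance, and the grouping rule are assumptions for this example, not the authors' algorithm.

```python
from collections import defaultdict


def birth_year(record):
    """Approximate birth year derived from the election year and stated age."""
    return record["election_year"] - record["age"]


def match_candidates(records, tolerance=1):
    """Group records with the same name and birth years within `tolerance`."""
    by_name = defaultdict(list)
    for record in records:
        key = (record["first_name"].lower(), record["last_name"].lower())
        by_name[key].append(record)

    groups = []
    for same_name in by_name.values():
        same_name.sort(key=birth_year)
        current = [same_name[0]]
        for record in same_name[1:]:
            # Start a new person when the birth-year gap exceeds the tolerance.
            if birth_year(record) - birth_year(current[-1]) <= tolerance:
                current.append(record)
            else:
                groups.append(current)
                current = [record]
        groups.append(current)
    return groups


# Invented records for illustration.
records = [
    {"first_name": "Jana", "last_name": "Novakova", "election_year": 2018, "age": 40},
    {"first_name": "Jana", "last_name": "Novakova", "election_year": 2022, "age": 45},
    {"first_name": "Jana", "last_name": "Novakova", "election_year": 2022, "age": 25},
    {"first_name": "Petr", "last_name": "Svoboda", "election_year": 2018, "age": 50},
]

groups = match_candidates(records)
group_sizes = sorted(len(group) for group in groups)
```

A real pipeline would likely use further attributes listed in the dataset, such as place of residence or university degree, to separate namesakes of similar age.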

MeSH
Humans MeSH
Politics * MeSH
Check Tag
Humans MeSH
Publication type
Journal Article MeSH
Dataset MeSH
Geographic names
Czech Republic MeSH

Article

Create, Analyze, and Visualize Phylogenomic Datasets Using PhyloFisher

Current Protocols. 2024 Jan; 4(1): e969.

Source: Curr Protoc, ISSN 2691-1299

PhyloFisher is a software package written primarily in Python3 that can be used for the creation, analysis, and visualization of phylogenomic datasets that consist of protein sequences from eukaryotic organisms. Unlike many existing phylogenomic pipelines, PhyloFisher comes with a manually curated database of 240 protein-coding genes, a subset of a previous phylogenetic dataset sampled from 304 eukaryotic taxa. The software package can also utilize a user-created database of eukaryotic proteins, which may be more appropriate for shallow evolutionary questions. PhyloFisher is also equipped with a set of utilities to aid in running routine analyses, such as the prediction of alternative genetic codes, removal of genes and/or taxa based on occupancy/completeness of the dataset, testing for amino acid compositional heterogeneity among sequences, removal of heterotachious and/or fast-evolving sites, removal of fast-evolving taxa, supermatrix creation from randomly resampled genes, and supermatrix creation from nucleotide sequences. © 2024 Wiley Periodicals LLC.

Basic Protocol 1: Constructing a phylogenomic dataset
Basic Protocol 2: Performing phylogenomic analyses
Support Protocol 1: Installing PhyloFisher
Support Protocol 2: Creating a custom phylogenomic database
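One of the utilities listed in the abstract removes taxa based on occupancy. The sketch below shows the general idea in plain Python; it does not use or reproduce PhyloFisher's own code or command-line interface. Occupancy is taken here to mean the fraction of all genes in the dataset for which a taxon has a sequence, and the threshold, taxon names, and gene names are invented for illustration.

```python
def filter_taxa_by_occupancy(dataset, min_occupancy):
    """Keep taxa present in at least `min_occupancy` of all genes.

    `dataset` maps a taxon name to the set of genes it has a sequence for.
    """
    all_genes = set().union(*dataset.values())
    kept = {}
    for taxon, genes in dataset.items():
        occupancy = len(genes) / len(all_genes)
        if occupancy >= min_occupancy:
            kept[taxon] = genes
    return kept


# Invented toy dataset: four genes, three taxa.
dataset = {
    "taxon_a": {"gene1", "gene2", "gene3", "gene4"},  # occupancy 1.00
    "taxon_b": {"gene1", "gene2"},                    # occupancy 0.50
    "taxon_c": {"gene3"},                             # occupancy 0.25
}

kept = filter_taxa_by_occupancy(dataset, min_occupancy=0.5)
kept_names = sorted(kept)
```

The same pattern applies to filtering genes instead of taxa by swapping the roles of the two in the mapping.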

Keywords
evolution, genomics, systematics, transcriptomics
MeSH
Amino Acids * MeSH
Biological Evolution * MeSH
Phylogeny MeSH
Culture MeSH
Amino Acid Sequence MeSH
Publication type
Journal Article MeSH
Substance names
Amino Acids * MeSH

Article

Genomic benchmarks: a collection of datasets for genomic sequence classification

BMC Genomic Data. 2023 May 01; 24(1): 25. Epub 2023-05-01.

Source: BMC Genom Data, ISSN 2730-6844

BACKGROUND: Recently, deep neural networks have been successfully applied in many biological fields. In 2020, a deep learning model AlphaFold won the protein folding competition with predicted structures within the error tolerance of experimental methods. However, this solution to the most prominent bioinformatic challenge of the past 50 years has been possible only thanks to a carefully curated benchmark of experimentally predicted protein structures. In Genomics, we have similar challenges (annotation of genomes and identification of functional elements) but currently, we lack benchmarks similar to protein folding competition.

RESULTS: Here we present a collection of curated and easily accessible sequence classification datasets in the field of genomics. The proposed collection is based on a combination of novel datasets constructed from the mining of publicly available databases and existing datasets obtained from published articles. The collection currently contains nine datasets that focus on regulatory elements (promoters, enhancers, open chromatin region) from three model organisms: human, mouse, and roundworm. A simple convolutional neural network is also included in a repository and can be used as a baseline model. Benchmarks and the baseline model are distributed as the Python package 'genomic-benchmarks', and the code is available at https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks .

CONCLUSIONS: Deep learning techniques revolutionized many biological fields but mainly thanks to the carefully curated benchmarks. For the field of Genomics, we propose a collection of benchmark datasets for the classification of genomic sequences with an interface for the most commonly used deep learning libraries, implementation of the simple neural network and a training framework that can be used as a starting point for future research. The main aim of this effort is to create a repository for shared datasets that will make machine learning for genomics more comparable and reproducible while reducing the overhead of researchers who want to enter the field, leading to healthy competition and new discoveries.
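Sequence classifiers such as the convolutional baseline mentioned above usually need DNA sequences converted to numeric arrays first. The sketch below shows one common preprocessing step, one-hot encoding, in plain Python. It is a generic illustration and does not use or represent the API of the 'genomic-benchmarks' package; the `ALPHABET` ordering and the treatment of unknown bases as all-zero rows are assumptions.

```python
# Column order of the encoding (an arbitrary but fixed convention).
ALPHABET = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element rows, one row per base.

    Bases outside ALPHABET (for example the ambiguity code N) become all zeros.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


encoded = one_hot("acgtN")
```

The resulting rows can be stacked into an array of shape (sequence length, 4), which is the input layout a one-dimensional convolutional network typically expects.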

Keywords
Benchmark, Convolutional neural network, Dataset, Deep learning, Genomics
MeSH
Benchmarking * MeSH
Chromatin MeSH
Genomics / methods MeSH
Humans MeSH
Mice MeSH
Neural Networks, Computer * MeSH
Machine Learning MeSH
Animals MeSH
Check Tag
Humans MeSH
Mice MeSH
Animals MeSH
Publication type
Journal Article MeSH
Grant-Supported Research MeSH
Substance names
Chromatin MeSH
