annotation data
BACKGROUND: Developmental coordination disorder (DCD) is described as a motor skill disorder characterized by a marked impairment in the development of motor coordination abilities that significantly interferes with performance of daily activities and/or academic achievement. Since some electrophysiological studies suggest differences between children with and without motor development problems, we prepared an experimental protocol and performed electrophysiological experiments with the aim of making a step toward a possible diagnosis of this disorder using the event-related potentials (ERP) technique. The second aim is to properly annotate the obtained raw data with relevant metadata and promote their long-term sustainability. RESULTS: The data from 32 school children (16 with possible DCD and 16 in the control group) were collected. Each dataset contains raw electroencephalography (EEG) data in the BrainVision format and provides sufficient metadata (such as age, gender, results of the motor test, and hearing thresholds) to allow other researchers to perform analysis. For each experiment, the percentage of ERP trials damaged by blinking artifacts was estimated. Furthermore, ERP trials were averaged across different participants and conditions, and the resulting plots are included in the manuscript. This should help researchers to estimate the usability of individual datasets for analysis. CONCLUSIONS: The aim of the whole project is to find out whether it is possible to draw any conclusions about DCD from the EEG data obtained. For the purpose of further analysis, the data were collected and annotated respecting the current outcomes of the International Neuroinformatics Coordinating Facility Program on Standards for Data Sharing, the Task Force on Electrophysiology, and the group developing the Ontology for Experimental Neurophysiology. The data with metadata are stored in the EEG/ERP Portal.
- Keywords
- developmental coordination disorder, electroencephalography, event-related potentials, reaction time, visual and audio stimulation
- MeSH
- Acoustic Stimulation MeSH
- Data Curation MeSH
- Child MeSH
- Electroencephalography MeSH
- Evoked Potentials MeSH
- Comorbidity MeSH
- Quantitative Trait, Heritable MeSH
- Humans MeSH
- Computer Simulation MeSH
- Motor Skills Disorders diagnosis MeSH
- Reaction Time MeSH
- Reproducibility of Results MeSH
- Software MeSH
- Photic Stimulation MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
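The DCD record above mentions two processing steps: estimating the percentage of ERP trials damaged by blinking artifacts, and averaging ERP trials. The abstract does not say how damaged trials were detected, so the following is only a minimal sketch under an assumed rule: a trial counts as damaged when its peak-to-peak amplitude exceeds a threshold. The function names, the 100 µV threshold, and the sample values are all hypothetical.

```python
from statistics import fmean


def blink_damaged(epoch, threshold_uv=100.0):
    """Flag an epoch whose peak-to-peak amplitude exceeds the threshold.

    The peak-to-peak rule and the threshold are assumptions for illustration,
    not the criterion used in the study.
    """
    return max(epoch) - min(epoch) > threshold_uv


def summarize_epochs(epochs, threshold_uv=100.0):
    """Return (percent_damaged, averaged_erp) for equal-length epochs."""
    clean = [e for e in epochs if not blink_damaged(e, threshold_uv)]
    percent_damaged = 100.0 * (len(epochs) - len(clean)) / len(epochs)
    # Average sample by sample across the epochs that were kept.
    erp = [fmean(samples) for samples in zip(*clean)] if clean else []
    return percent_damaged, erp


# Synthetic single-channel epochs in microvolts (made-up values).
epochs = [
    [0.0, 2.0, 4.0, 2.0],
    [0.0, 4.0, 8.0, 4.0],
    [0.0, 150.0, -20.0, 5.0],  # blink-like deflection
    [0.0, 3.0, 6.0, 3.0],
]
percent_damaged, erp = summarize_epochs(epochs)
print(percent_damaged, erp)
```

A per-dataset percentage of this kind is one way to judge how many trials remain usable before averaging.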
One of the biggest challenges of training deep neural networks is the need for massive data annotation. To train a neural network for object detection, millions of annotated training images are required. However, there are currently no large-scale thermal image datasets that could be used to train state-of-the-art neural networks, while voluminous RGB image datasets are available. This paper presents a method that makes it possible to create hundreds of thousands of annotated thermal images using an RGB pre-trained object detector. A dataset created in this way can be used to train object detectors with improved performance. The main contribution of this work is a novel method for fully automatic thermal image labeling. The proposed system uses an RGB camera, a thermal camera, a 3D LiDAR, and a pre-trained neural network that detects objects in the RGB domain. Using this setup, it is possible to run a fully automated process that annotates the thermal images and creates an automatically annotated thermal training dataset. As a result, we created a dataset containing hundreds of thousands of annotated objects. This approach makes it possible to train deep learning models with performance similar to that of common human-annotation-based methods. This paper also proposes several improvements to fine-tune the results with minimal human intervention. Finally, the evaluation of the proposed solution shows that the method gives significantly better results than training the neural network with standard small-scale hand-annotated thermal image datasets.
- Keywords
- IR, RGB, YOLO, data annotation, deep convolutional neural networks, object detector, thermal, transfer learning
- Publication type
- Journal Article MeSH
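The thermal-labeling record above combines an RGB detector, a thermal camera, and a 3D LiDAR, but the abstract does not describe how detections are moved from one image to the other. One plausible reading, sketched here purely as an assumption, is that a LiDAR depth lets each RGB box corner be back-projected to 3D and re-projected into the thermal image with a pinhole camera model. The intrinsics, the translation-only extrinsics, and all function names are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) at a known depth -> 3D point in the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def project(point, fx, fy, cx, cy):
    """3D point in the camera frame -> pixel coordinates (pinhole model)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def transfer_box(box, depth, rgb_k, thermal_k, translation):
    """Map an RGB box (u1, v1, u2, v2) into thermal pixel coordinates.

    Simplifying assumptions: one depth value for the whole box, and the two
    cameras share orientation and differ only by a translation in meters.
    """
    out = []
    for u, v in ((box[0], box[1]), (box[2], box[3])):
        x, y, z = backproject(u, v, depth, *rgb_k)
        moved = (x + translation[0], y + translation[1], z + translation[2])
        out.extend(project(moved, *thermal_k))
    return tuple(out)


# Hypothetical intrinsics (fx, fy, cx, cy) and a 10 cm horizontal offset.
rgb_k = (1000.0, 1000.0, 640.0, 360.0)
thermal_k = (500.0, 500.0, 320.0, 256.0)
thermal_box = transfer_box((540, 260, 740, 460), 10.0, rgb_k, thermal_k,
                           (0.1, 0.0, 0.0))
print(thermal_box)
```

A real rig would also need a rotation between the cameras and lens-distortion handling, both omitted here for brevity.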
Circular RNAs play a crucial role in cell development and serve as biomarkers in many diseases. Nevertheless, the function of many circular RNAs remains unknown. This function can be inferred from sponging and silencing interactions with micro RNAs and messenger RNAs. We recently proposed a network-based circRNA functional annotation tool, circGPA. However, validation data for RNA interactions are often sparse and predicted interactions contain many false positives. To address this issue, we propose an extended algorithm named circGPAcorr, which uses expression data to weight the interactions, resulting in more precise functional annotation. To assess the significance of the results, the p-value is calculated using reduction to circGPA, a generating-polynomial-based method. We show that the problem is #P-hard, and thus computationally difficult. The circGPAcorr algorithm is tested on publicly available myelodysplastic syndromes expression data, providing gene ontology annotations that align with the literature on myelodysplastic syndromes. At the same time, we demonstrate its performance in the circRNA-disease annotation task.
- Keywords
- CircRNA, Functional annotation, Gene expression, Generating polynomial
- Publication type
- Journal Article MeSH
The biological role of oxidized glycerophosphocholines (oxPCs) is a current topic of research that contributes substantially to the understanding of health and disease. Global non-targeted metabolomics offers an interesting approach to expand current knowledge and link oxPCs to new biological functions. Although this strategy is successful, it also has some limitations, which are clearly noticeable during the identification process. For this reason, clear rules for the identification of each group of metabolites are needed. This work attempts to provide the reader with a guideline for the recognition and annotation of oxidation among phosphocholines (PCs). Using several chromatographic characteristics and spectral information from tandem mass spectrometry, rapid and reliable annotation of long and short chain oxPCs can be performed. Some of this knowledge has been implemented in the publicly available annotation tool 'CEU Mass Mediator' (CMM) for semi-automated assignment of oxidation. Additionally, this tool was supplemented with accurate monoisotopic masses of oxPCs, expanding current information in other databases. Moreover, the characterization of oxidation products of PC(16:0/20:4), known as PAPC, has been performed, providing a list of accurate mass product ions and neutral losses.
- Keywords
- Annotation, Identification, LC-ESI-QTOF, Non-targeted metabolomics, Oxidized glycerophosphocholines, oxPAPC
- MeSH
- Chromatography, Liquid MeSH
- Databases, Factual MeSH
- Diabetes Mellitus, Type 2 blood diagnosis metabolism MeSH
- Phosphatidylcholines blood chemistry metabolism MeSH
- Mass Spectrometry MeSH
- Humans MeSH
- Metabolomics * MeSH
- Molecular Structure MeSH
- Oxidation-Reduction MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Phosphatidylcholines MeSH
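The oxPC record above relies on accurate monoisotopic masses to annotate oxidation. As a rough illustration only, and not the rule set implemented in CEU Mass Mediator, the sketch below computes the protonated mass of PAPC, PC(16:0/20:4), from its elemental formula C44H80NO8P and checks whether an observed m/z matches the addition of a small number of oxygen atoms within a ppm tolerance. The [M+H]+ adduct, the 10 ppm tolerance, the restriction to plain oxygen additions, and the function names are assumptions.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONO = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
    "P": 30.97376163,
}
PROTON = 1.00727646688

# Elemental composition of PAPC, PC(16:0/20:4).
PAPC = {"C": 44, "H": 80, "N": 1, "O": 8, "P": 1}


def mono_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def mz_protonated(formula, extra_oxygens=0):
    """Theoretical m/z of [M+H]+ after adding some oxygen atoms."""
    return mono_mass(formula) + extra_oxygens * MONO["O"] + PROTON


def annotate(observed_mz, formula, max_oxygens=4, tol_ppm=10.0):
    """List (oxygens_added, ppm_error) candidates within the tolerance."""
    hits = []
    for n in range(max_oxygens + 1):
        theoretical = mz_protonated(formula, n)
        ppm = (observed_mz - theoretical) / theoretical * 1e6
        if abs(ppm) <= tol_ppm:
            hits.append((n, ppm))
    return hits


hits = annotate(798.5645, PAPC)  # made-up observed value
print(hits)
```

Real oxidation products also include truncated chains and modifications that change the hydrogen count, which is why the abstract pairs mass information with chromatographic behavior and tandem mass spectra.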
The diagnosis of sleep disorders, which are highly prevalent in Western countries, typically involves sophisticated procedures and equipment that are intrusive to the patient. Wrist actigraphy, in contrast, is a non-invasive and low-cost way to gather data that can provide valuable information for the diagnosis of these disorders. The acquired data may be used to infer the Sleep/Wakefulness (SW) state of the patient during the circadian cycle and to detect abnormal behavioral patterns associated with these disorders. In this paper, a classifier based on Autoregressive (AR) model coefficients, among other features, is proposed to estimate the SW state. The real data, acquired from 23 healthy subjects over fourteen days each, were segmented by expert medical personnel with the help of complementary information such as light intensity and Sleep e-Diary information. Monte Carlo tests with a Leave-One-Out Cross Validation (LOOCV) strategy were used to assess the performance of the classifier, which achieves an accuracy of 96%.
- MeSH
- Actigraphy methods MeSH
- Automation MeSH
- Electronic Data Processing methods MeSH
- Bayes Theorem MeSH
- Wakefulness physiology MeSH
- Humans MeSH
- Sleep Wake Disorders diagnosis physiopathology MeSH
- Reproducibility of Results MeSH
- Sleep physiology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
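The actigraphy record above uses autoregressive (AR) model coefficients as classifier features. The abstract does not state how the coefficients were estimated, so the sketch below assumes one common choice: solving the Yule-Walker equations with the Levinson-Durbin recursion on a window of activity counts. The function names and the sample window are hypothetical.

```python
def autocorr(x, max_lag):
    """Biased sample autocovariance of x for lags 0..max_lag."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]
    return [
        sum(c[i] * c[i + k] for i in range(len(c) - k)) / len(c)
        for k in range(max_lag + 1)
    ]


def ar_coefficients(x, order):
    """Yule-Walker AR coefficients via the Levinson-Durbin recursion.

    Returns [a1, ..., ap] for the model x[t] = a1*x[t-1] + ... + ap*x[t-p] + e[t].
    """
    r = autocorr(x, order)
    if r[0] == 0:
        raise ValueError("constant signal has no AR structure to estimate")
    a = []
    err = r[0]
    for k in range(1, order + 1):
        acc = r[k] - sum(a[j] * r[k - 1 - j] for j in range(k - 1))
        reflection = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[j] - reflection * a[k - 2 - j] for j in range(k - 1)] + [reflection]
        err *= 1.0 - reflection ** 2
    return a


# Made-up activity counts for one analysis window.
window = [3, 5, 4, 8, 7, 2, 1, 0, 0, 1, 6, 9]
features = ar_coefficients(window, order=2)
print(features)
```

In a feature-based classifier, a vector like `features` would be computed per window and combined with the other features the abstract alludes to.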
Accurate annotation of genomic variants in human diseases is essential to allow personalized medicine. Assessment of somatic and germline TP53 alterations has now reached the clinic and is required in several circumstances, such as the identification of the most effective cancer therapy for patients with chronic lymphocytic leukemia (CLL). Here, we present Seshat, a Web service for annotating TP53 information derived from sequencing data. A flexible framework allows the use of standard file formats such as Mutation Annotation Format (MAF) or Variant Call Format (VCF), as well as common TXT files. Seshat performs accurate variant annotations using the Human Genome Variation Society (HGVS) nomenclature and the stable TP53 genomic reference provided by the Locus Reference Genomic (LRG). In addition, using the 2017 release of the UMD_TP53 database, Seshat provides several kinds of statistical information for each TP53 variant, including database frequency, functional activity, and pathogenicity. The information is delivered in standardized output tables that minimize errors and facilitate comparison of mutational data across studies. Seshat is a beneficial tool to interpret the ever-growing TP53 sequencing data generated by multiple sequencing platforms and it is freely available via the TP53 Website, http://p53.fr or directly at http://vps338341.ovh.net/.
- Keywords
- HGVS variant nomenclature, TP53 variants, database, variant annotation
- MeSH
- Molecular Sequence Annotation MeSH
- Databases, Genetic * MeSH
- Genetic Variation genetics MeSH
- Genomics trends MeSH
- Internet MeSH
- Humans MeSH
- Mutation MeSH
- Tumor Suppressor Protein p53 genetics MeSH
- Software * MeSH
- Computational Biology trends MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Tumor Suppressor Protein p53 MeSH
- TP53 protein, human MeSH Browser
Recent technological advances have made next-generation sequencing (NGS) a popular and financially accessible technique allowing a broad range of analyses to be done simultaneously. The huge amount of newly generated NGS data, however, requires advanced software support to help both in analyzing the data and in biologically interpreting the results. In this article, we describe SATrans (Software for Annotation of Transcriptome), a software package providing fast and robust functional annotation of novel sequences obtained from transcriptome sequencing. Moreover, it performs advanced gene ontology analysis of differentially expressed genes, thereby helping to interpret the quantitative changes in gene expression biologically and in a user-friendly form. The software is freely available and makes it possible to work with thousands of sequences using a standard personal computer or notebook running the Linux operating system.
- Keywords
- differentially expressed genes, functional annotation, transcriptome
- MeSH
- Molecular Sequence Annotation methods MeSH
- Humans MeSH
- Sequence Analysis, RNA methods MeSH
- Software * MeSH
- Gene Expression Profiling methods MeSH
- Transcriptome * MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
BACKGROUND: RNA-seq followed by de novo transcriptome assembly has been a transformative technique in biological research of non-model organisms, but the computational processing of RNA-seq data entails many different software tools. The complexity of these de novo transcriptomics workflows therefore presents a major barrier for researchers to adopt best-practice methods and up-to-date versions of software. RESULTS: Here we present a streamlined and universal de novo transcriptome assembly and annotation pipeline, transXpress, implemented in Snakemake. transXpress supports two popular assembly programs, Trinity and rnaSPAdes, and allows parallel execution on heterogeneous cluster computing hardware. CONCLUSIONS: transXpress simplifies the use of best-practice methods and up-to-date software for de novo transcriptome assembly, and produces standardized output files that can be mined using SequenceServer to facilitate rapid discovery of new genes and proteins in non-model organisms.
- Keywords
- De novo transcriptome assembly, Differential expression analysis, High-performance computing, Non-model organisms, RNA-seq, Reproducible software, Transcriptome annotation
- MeSH
- Molecular Sequence Annotation MeSH
- Sequence Analysis, RNA methods MeSH
- RNA-Seq MeSH
- Software * MeSH
- Gene Expression Profiling MeSH
- Transcriptome * MeSH
- Publication type
- Journal Article MeSH
Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data to understand the needs and challenges of the plant genomic research community. For 2021-22, we identified different types of datasets and examined metadata annotations related to experimental design, methods, sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we identify a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.
The Database of Intrinsically Disordered Proteins (DisProt, URL: https://disprot.org) is the major repository of manually curated annotations of intrinsically disordered proteins and regions from the literature. We report here recent updates of DisProt version 9, including a restyled web interface, refactored Intrinsically Disordered Proteins Ontology (IDPO), improvements in the curation process and significant content growth of around 30%. Higher quality and consistency of annotations are provided by a newly implemented reviewing process and training of curators. The increased curation capacity is fostered by the integration of DisProt with APICURON, a dedicated resource for the proper attribution and recognition of biocuration efforts. Better interoperability is provided through the adoption of the Minimum Information About Disorder (MIADE) standard, an active collaboration with the Gene Ontology (GO) and Evidence and Conclusion Ontology (ECO) consortia and the support of the ELIXIR infrastructure.
- MeSH
- Molecular Sequence Annotation * MeSH
- Databases, Protein * MeSH
- Datasets as Topic MeSH
- DNA genetics metabolism MeSH
- Gene Ontology MeSH
- Internet MeSH
- Humans MeSH
- RNA genetics metabolism MeSH
- Amino Acid Sequence MeSH
- Software * MeSH
- Protein Binding MeSH
- Intrinsically Disordered Proteins chemistry genetics metabolism MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Substance names
- DNA MeSH
- RNA MeSH
- Intrinsically Disordered Proteins MeSH