We present a novel system that keeps curators in the loop to develop a dataset and model for detecting structure features and functional annotations at the residue level from standard publication text. Our approach integrates data from multiple resources, including PDBe, EuropePMC, PubMedCentral, and PubMed, with annotation guidelines from UniProt, and uses LitSuggest and HuggingFace models as tools in the annotation process. A team of seven annotators manually curated ten articles for named entities, which we used to train a starting PubmedBert model from HuggingFace. Using a human-in-the-loop annotation system, we iteratively developed the best model, which achieved 0.90 precision, 0.92 recall, and 0.91 F1-measure. The system demonstrates that machine learning techniques and human expertise can be combined effectively to curate a dataset of residue-level functional annotations and protein structure features. The results point to broader applications in protein research, connecting advanced machine learning models with the insights of domain experts.
- MeSH
- Databases, Protein MeSH
- Humans MeSH
- Proteins * chemistry MeSH
- Machine Learning * MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Dataset MeSH
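As a consistency check on the metrics reported for the residue-level annotation model above, the following minimal sketch recomputes the F1-measure from the stated precision and recall. It assumes the abstract uses the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.90
reported_recall = 0.92

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.91
```

Under that assumption, the recomputed value matches the reported F1-measure of 0.91.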
Well-documented sleep datasets from healthy adults are important for sleep pattern analysis and for comparison with a wide range of neuropsychiatric disorders. Currently available sleep datasets from healthy adults are acquired using low-density arrays, with as few as four electrodes in a typical sleep montage. This low spatial resolution prohibits analysis of the spatial structure of sleep. Here we introduce an open-access sleep dataset from 29 healthy adults (13 female, aged 32.17 ± 6.30 years) acquired at the Montreal Neurological Institute. The dataset includes overnight polysomnograms with high-density scalp electroencephalograms incorporating 83 electrodes, electrocardiogram, electromyogram, electrooculogram, averaged electrode positions obtained from manual co-registrations, and sleep scoring annotations. Data characteristics and group-level analysis of sleep properties were assessed. The database can be accessed at https://doi.org/10.17605/OSF.IO/R26FH. This is the first open high-density electroencephalogram sleep database from healthy adults, allowing researchers to investigate sleep physiology at high spatial resolution. We expect that this database will serve as a valuable resource for studying sleep physiology and for benchmarking sleep pathology.
- MeSH
- Databases, Factual MeSH
- Adult MeSH
- Electroencephalography * MeSH
- Humans MeSH
- Polysomnography * MeSH
- Scalp * MeSH
- Sleep * MeSH
- Check Tag
- Adult MeSH
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
- Dataset MeSH
Retinopathy of prematurity (ROP) is a vasoproliferative disease, occurring especially in newborns and infants, that can affect and damage vision. Despite recent advances in neonatal care and medical guidelines, ROP remains one of the leading causes of childhood blindness worldwide. The paper presents a unique dataset of 6,004 retinal images of 188 newborns, most of whom are premature infants. The dataset is accompanied by anonymized patient information from the ROP screening acquired at the University Hospital Ostrava, Czech Republic. Three digital retinal imaging camera systems were used in the study: Clarity RetCam 3, Natus RetCam Envision, and Phoenix ICON. The study is complemented by the software tool ReLeSeT, which is aimed at automatic segmentation and extraction of retinal lesions from retinal images. This tool in turn enables the computation of geometric and intensity features of retinal lesions. We also publish a set of pre-processing tools for feature boosting of retinal lesions and retinal blood vessels, intended for building classification and segmentation models in ROP analysis.
- MeSH
- Humans MeSH
- Infant, Premature * MeSH
- Infant, Newborn MeSH
- Image Processing, Computer-Assisted MeSH
- Retina * diagnostic imaging MeSH
- Retinopathy of Prematurity * diagnostic imaging MeSH
- Check Tag
- Humans MeSH
- Infant, Newborn MeSH
- Publication Type
- Journal Article MeSH
- Dataset MeSH
- Geographic Names
- Czech Republic MeSH
The chondrocranium provides the key initial support for the fetal brain, jaws and cranial sensory organs in all vertebrates. The patterns of shaping and growth of the chondrocranium set up species-specific development of the entire craniofacial complex. The 3D development of the chondrocranium has been studied primarily in animal model organisms, such as mice or zebrafish. In comparison, very little is known about the full 3D human chondrocranium, apart from drawings made by anatomists many decades ago. Knowledge of human-specific aspects of chondrocranial development is essential for understanding congenital craniofacial defects and human evolution. Here, advanced contrast-enhanced microCT scanning was used to generate the first 3D atlas of the human fetal chondrocranium during the middle trimester (13 to 19 weeks). In addition, because cartilage and bone are both visible with the techniques used, the endochondral ossification of the cranial base was mapped, as this region is critical for brain and jaw growth. The human 3D models are published as a scientific resource for human development.
- MeSH
- Cartilage diagnostic imaging embryology MeSH
- Skull diagnostic imaging embryology MeSH
- Humans MeSH
- Fetus diagnostic imaging MeSH
- X-Ray Microtomography MeSH
- Pregnancy MeSH
- Imaging, Three-Dimensional * MeSH
- Check Tag
- Humans MeSH
- Pregnancy MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
- Dataset MeSH
Microscopic examination plays a significant role in the initial screening for a variety of hematological, as well as non-hematological, diagnoses. Microscopic blood smear examination, which is considered a key diagnostic technique, is still performed manually in current clinical practice, which is not only time-consuming but can also lead to human error. Although automated and semi-automated systems have been developed in recent years, their high purchase and maintenance costs make them unaffordable for many medical institutions. Even though much research has been conducted lately to explore more accurate and feasible solutions, most researchers have had to deal with a lack of medical data. To address the lack of large-scale databases in this field, we created a high-resolution dataset containing a total of 16027 annotated white blood cells. Moreover, the dataset covers 9 types of white blood cells in total, including clinically significant pathological findings. Because we used high-quality acquisition equipment, the dataset provides some of the highest-quality images of blood cells, achieving an approximate resolution of 42 pixels per 1 μm.
- MeSH
- Leukocytes * cytology pathology MeSH
- Humans MeSH
- Microscopy MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Dataset MeSH
- Research Support, Non-U.S. Gov't MeSH
The recent human monkeypox outbreak underlined the importance of studying the basic biology of orthopoxviruses. However, the transcriptome of its causative agent has not previously been investigated with either short- or long-read sequencing approaches. This Oxford Nanopore long-read RNA-sequencing dataset fills this gap. It will enable in-depth characterization of the transcriptomic architecture of the monkeypox virus, and may even make it possible to annotate novel host transcripts. Moreover, our direct cDNA and native RNA sequencing reads will allow the estimation of gene expression changes in both the virus and the host cells during infection. Overall, our study will lead to a deeper understanding of the alterations caused by the viral infection at the transcriptome level.
- MeSH
- DNA, Complementary MeSH
- Humans MeSH
- Nanopore Sequencing * MeSH
- Monkeypox * MeSH
- Gene Expression Profiling MeSH
- Transcriptome MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Dataset MeSH
The human brain represents a complex computational system, the function and structure of which may be measured using various neuroimaging techniques focusing on separate properties of the brain tissue and activity. We capture the organization of white matter fibers acquired by diffusion-weighted imaging using probabilistic diffusion tractography. By segmenting the results of tractography into larger anatomical units, it is possible to draw inferences about the structural relationships between these parts of the system. This pipeline results in a structural connectivity matrix, which contains an estimate of connection strength among all regions. However, raw data processing is complex, computationally intensive, and requires expert quality control, which may be discouraging for researchers with less experience in the field. We thus provide brain structural connectivity matrices in a form ready for modelling and analysis and thus usable by a wide community of scientists. The presented dataset contains brain structural connectivity matrices together with the underlying raw diffusion and structural data, as well as basic demographic data of 88 healthy subjects.
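To make the notion of a structural connectivity matrix concrete, here is a minimal sketch of how region-to-region connection strengths could be assembled into a symmetric matrix. It is a simplification and not the pipeline used for the dataset: it assumes each reconstructed streamline has already been reduced to a pair of region indices, and it normalizes by the strongest connection, which is only one of several possible conventions. The names `connectivity_matrix`, `n_regions` and `streamline_endpoints` are hypothetical.

```python
from typing import List, Tuple


def connectivity_matrix(
    n_regions: int,
    streamline_endpoints: List[Tuple[int, int]],
) -> List[List[float]]:
    """Build a symmetric, max-normalized region-by-region connectivity matrix.

    Each entry (i, j) estimates connection strength between regions i and j
    as the number of streamlines linking them, divided by the largest count.
    """
    counts = [[0.0] * n_regions for _ in range(n_regions)]
    for a, b in streamline_endpoints:
        if a == b:
            # Skip streamlines that start and end in the same region.
            continue
        counts[a][b] += 1.0
        counts[b][a] += 1.0  # Tractography is undirected, so mirror the count.

    strongest = max(max(row) for row in counts)
    if strongest == 0:
        return counts  # No inter-regional connections were found.
    return [[value / strongest for value in row] for row in counts]


# Toy example with three regions and four streamlines (one within region 2).
matrix = connectivity_matrix(3, [(0, 1), (0, 1), (1, 2), (2, 2)])
for row in matrix:
    print(row)
```

In this toy example the connection between regions 0 and 1 is the strongest (1.0), the connection between regions 1 and 2 is half as strong (0.5), and regions 0 and 2 are unconnected (0.0).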
The growing interdisciplinary research field of psycholinguistics is in constant need of new and up-to-date tools that allow researchers to answer complex questions and to extend their work to languages other than English, which dominates the field. One such type of tool is the picture dataset, which provides naming norms for everyday objects. However, existing databases tend to include a small number of items and have been normed in a limited number of languages, despite the recent boom in multilingualism research. In this paper we present the Multilingual Picture (Multipic) database, containing naming norms and familiarity scores for 500 coloured pictures in thirty-two languages or language varieties from around the world. The data was validated with standard methods that have been used for existing picture datasets. This is the first dataset to provide naming norms and translation equivalents for such a variety of languages; as such, it will be of particular value to psycholinguists and other interested researchers. The dataset has been made freely available.
- MeSH
- Databases, Factual MeSH
- Language MeSH
- Humans MeSH
- Multilingualism * MeSH
- Psycholinguistics * MeSH
- Recognition, Psychology MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Dataset MeSH
Listeria monocytogenes (Lm) is a ubiquitous bacterium that causes listeriosis, a serious foodborne illness. Along the nature-to-human transmission route, Lm can prosper in various ecological niches. Soil and decaying organic matter are its primary reservoirs. Certain clonal complexes (CCs) are over-represented in food production and represent a challenge to food safety. To gain new understanding of Lm adaptation mechanisms in food, the genetic background of strains found in animals and the environment should be investigated in comparison to that of food strains. Twenty-one partners, including food, environment, veterinary and public health laboratories, constructed a dataset of 1484 genomes originating from Lm strains collected in 19 European countries. This dataset encompasses a large number of CCs occurring worldwide, covers many diverse habitats and is balanced between ecological compartments and geographic regions. The dataset presented here will contribute to improving our understanding of Lm ecology and should aid in the surveillance of Lm. It also provides a basis for the discovery of the genetic traits underlying Lm adaptation to different ecological niches.
Understanding biodiversity patterns, as well as the drivers of population declines and range losses, provides crucial baselines for monitoring and conservation. However, the information needed to evaluate such trends remains unstandardised and sparsely available for many taxonomic groups and habitats, including cave-dwelling bats and cave ecosystems. We developed DarkCideS 1.0 ( https://darkcides.org/ ), a global database of bat caves and species synthesised from publicly available information and datasets. DarkCideS 1.0 is by far the largest database for cave-dwelling bats. It contains information on geographical location, ecological status, species traits, and parasites and hyperparasites for 679 bat species that are known to occur in caves or to use caves during part of their life histories. The database currently contains 6746 georeferenced occurrences for 402 cave-dwelling bat species from 2002 cave sites in 46 countries and 12 terrestrial biomes. The database has been developed to be collaborative and open-access, allowing continuous data-sharing among the community of bat researchers and conservation biologists to advance bat research, comparative monitoring, and prioritisation for conservation.
- MeSH
- Biodiversity MeSH
- Chiroptera * MeSH
- Databases, Factual MeSH
- Ecosystem MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication Type
- Journal Article MeSH
- Dataset MeSH