Query: Annotation term

58 records in Medvik

Article

Tool-supported Interactive Correction and Semantic Annotation of Narrative Clinical Reports

... by clinical terms. ...

Methods of information in medicine. 2017 ; 56 (3) : 217-229. [pub] 20170428

Methods Inf Med
ISSN 2511-705X
Medvik
Source

OBJECTIVES: Our main objective is to design a method of, and supporting software for, interactive correction and semantic annotation of narrative clinical reports, which would allow for their easier and less erroneous processing outside their original context: first, by physicians unfamiliar with the original language (and possibly also the source specialty), and second, by tools requiring structured information, such as decision-support systems. Our additional goal is to gain insights into the process of narrative report creation, including the errors and ambiguities arising therein, and also into the process of report annotation by clinical terms. Finally, we also aim to provide a dataset of ground-truth transformations (specific for Czech as the source language), set up by expert physicians, which can be reused in the future for subsequent analytical studies and for training automated transformation procedures. METHODS: A three-phase preprocessing method has been developed to support secondary use of narrative clinical reports in electronic health records. Narrative clinical reports are narrative texts of healthcare documentation often stored in electronic health records. In the first phase, a narrative clinical report is tokenized. In the second phase, the tokenized clinical report is normalized. The normalized clinical report is easily readable for health professionals with knowledge of the language used in the narrative clinical report. In the third phase, the normalized clinical report is enriched with extracted structured information. The final result of the third phase is a semi-structured normalized clinical report in which the extracted clinical terms are matched to codebook terms. Software tools for interactive correction, expansion and semantic annotation of narrative clinical reports have been developed, and the three-phase preprocessing method has been validated in the cardiology area. RESULTS: The three-phase preprocessing method was validated on 49 anonymous Czech narrative clinical reports in the field of cardiology. Descriptive statistics from the database of accomplished transformations have been calculated. Two cardiologists participated in the annotation phase. The first cardiologist annotated 1500 clinical terms found in 49 narrative clinical reports to codebook terms using the classification systems ICD-10, SNOMED CT, LOINC and LEKY. The second cardiologist validated the annotations of the first cardiologist. The correct clinical terms and the codebook terms have been stored in a database. CONCLUSIONS: We extracted structured information from Czech narrative clinical reports by the proposed three-phase preprocessing method and linked it to electronic health records. The software tool, although generic, is tailored for Czech as the specific language of the electronic health record pool under study. This will provide a potential reference standard for porting this approach to dozens of other less-spoken languages. Structured information can support medical decision making, quality assurance tasks and further medical research.

MeSH
Electronic Health Records / standards
International Classification of Diseases
Writing / standards
Vocabulary, Controlled *
Semantics *
Guidelines as Topic
Meaningful Use / standards
Software
Data Accuracy
Machine Learning *
User-Computer Interface
Natural Language Processing *
Word Processing / standards
Publication type
Journal Article

Article online

Nucleotide diversity of functionally different groups of immune response genes in Old World camels based on newly annotated and reference-guided assemblies

... Improved and more accurate genome assemblies and annotations are needed to study complex genomic regions ...

BMC genomics. 2020 ; 21 (1) : 606. [pub] 20200903

BMC Genomics
ISSN 1471-2164
Medvik
Source

BACKGROUND: Immune-response (IR) genes have an important role in the defense against highly variable pathogens, and therefore, diversity in these genomic regions is essential for species' survival and adaptation. Although current genome assemblies from Old World camelids are very useful for investigating genome-wide diversity, demography and population structure, they have inconsistencies and gaps that limit analyses at local genomic scales. Improved and more accurate genome assemblies and annotations are needed to study complex genomic regions like adaptive and innate IR genes. RESULTS: In this work, we improved the genome assemblies of the three Old World camel species - domestic dromedary and Bactrian camel, and the two-humped wild camel - via different computational methods. The newly annotated dromedary genome assembly CamDro3 served as reference to scaffold the NCBI RefSeq genomes of domestic Bactrian and wild camels. These upgraded assemblies were then used to assess nucleotide diversity of IR genes within and between species, and to compare the diversity found in immune genes and the rest of the genes in the genome. We detected differences in the nucleotide diversity among the three Old World camelid species and between IR gene groups, i.e., innate versus adaptive. Among the three species, domestic Bactrian camels showed the highest mean nucleotide diversity. Among the functionally different IR gene groups, the highest mean nucleotide diversity was observed in the major histocompatibility complex. CONCLUSIONS: The new camel genome assemblies were greatly improved in terms of contiguity and increased size with fewer scaffolds, which is of general value for the scientific community. This allowed us to perform in-depth studies on genetic diversity in immunity-related regions of the genome. 
Our results suggest that differences of diversity across classes of genes appear compatible with a combined role of population history and differential exposures to pathogens, and consequent different selective pressures.
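For readers unfamiliar with the statistic, nucleotide diversity is commonly defined as the mean number of pairwise differences per site among aligned sequences. The sketch below implements that textbook definition in Python; the abstract does not state which estimator or software the authors used, and the toy sequences are invented for illustration.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of pairwise differences per site over all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions in every pair, then average per pair and site.
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


# Invented toy alignment: three sequences of eight sites.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(toy)
print(round(pi, 4))
```

Comparing this value between gene groups (for example innate versus adaptive immune genes) is the kind of contrast the abstract reports.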

Article online

Developmental coordination disorder in children - experimental work and data annotation

... The second aim is to properly annotate the obtained raw data with relevant metadata and promote their ...

GigaScience. 2017 ; 6 (4) : 1-6.

Gigascience
ISSN 2047-217X
Medvik
Source

Background: Developmental coordination disorder (DCD) is described as a motor skill disorder characterized by a marked impairment in the development of motor coordination abilities that significantly interferes with performance of daily activities and/or academic achievement. Since some electrophysiological studies suggest differences between children with/without motor development problems, we prepared an experimental protocol and performed electrophysiological experiments with the aim of making a step toward a possible diagnosis of this disorder using the event-related potentials (ERP) technique. The second aim is to properly annotate the obtained raw data with relevant metadata and promote their long-term sustainability. Results: The data from 32 school children (16 with possible DCD and 16 in the control group) were collected. Each dataset contains raw electroencephalography (EEG) data in the BrainVision format and provides sufficient metadata (such as age, gender, results of the motor test, and hearing thresholds) to allow other researchers to perform analysis. For each experiment, the percentage of ERP trials damaged by blinking artifacts was estimated. Furthermore, ERP trials were averaged across different participants and conditions, and the resulting plots are included in the manuscript. This should help researchers to estimate the usability of individual datasets for analysis. Conclusions: The aim of the whole project is to find out if it is possible to make any conclusions about DCD from EEG data obtained. For the purpose of further analysis, the data were collected and annotated respecting the current outcomes of the International Neuroinformatics Coordinating Facility Program on Standards for Data Sharing, the Task Force on Electrophysiology, and the group developing the Ontology for Experimental Neurophysiology. The data with metadata are stored in the EEG/ERP Portal.

MeSH
Acoustic Stimulation
Data Curation
Child
Electroencephalography
Evoked Potentials
Comorbidity
Quantitative Trait, Heritable
Humans
Computer Simulation
Motor Skills Disorders / diagnosis
Reaction Time
Reproducibility of Results
Software
Photic Stimulation
Check Tag
Child
Humans
Male
Female
Publication type
Journal Article

Article online

Insight into the Salivary Gland Transcriptome of Lygus lineolaris (Palisot de Beauvois)

... Gene ontology (GO) terms were assigned to 7,512 proteins, and 791 proteins in the sialotranscriptome ...

PLoS One. 2016 ; 11 (1) : e0147197. [pub] 20160120

ISSN 1932-6203
Medvik
Source

The tarnished plant bug (TPB), Lygus lineolaris (Palisot de Beauvois), is a polyphagous, phytophagous insect that has emerged as a major pest of cotton, alfalfa, fruits, and vegetable crops in the eastern United States and Canada. Using its piercing-sucking mouthparts, TPB employs a "lacerate and flush" feeding strategy in which saliva injected into plant tissue degrades cell wall components and lyses cells whose contents are subsequently imbibed by the TPB. Polygalacturonase enzymes, which degrade the pectin in cell walls, are known to be a major component of TPB saliva. However, not much is known about the other components of the saliva of this important pest. In this study, we explored the salivary gland transcriptome of TPB using Illumina sequencing. After in silico conversion of RNA sequences into corresponding polypeptides, 25,767 putative proteins were discovered. Of these, 19,540 (78.83%) showed significant similarity to known proteins in either the NCBI nr or UniProt databases. Gene ontology (GO) terms were assigned to 7,512 proteins, and 791 proteins in the sialotranscriptome of TPB were found to collectively map to 107 Kyoto Encyclopedia of Genes and Genomes (KEGG) database pathways. A total of 3,653 Pfam domains were identified in 10,421 sialotranscriptome predicted proteins, resulting in 12,814 Pfam annotations; some proteins had more than one Pfam domain. Functional annotation revealed a number of salivary gland proteins that potentially facilitate degradation of host plant tissues and mitigation of the host plant defense response. These transcripts/proteins and their potential roles in TPB establishment are described.

MeSH
Molecular Sequence Annotation
Gene Ontology
Heteroptera / genetics, growth & development, metabolism
Genes, Insect / genetics
Salivary Glands / metabolism
Gene Expression Profiling *
Animals
Check Tag
Animals
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Article online

circGPA: circRNA functional annotation based on probability-generating functions

... In this paper, we predict circRNA annotations from the knowledge of their interaction with miRNAs and ...

BMC bioinformatics. 2022 ; 23 (1) : 392. [pub] 20220927

BMC Bioinformatics
ISSN 1471-2105
Medvik
Source

Recent research has already shown that circular RNAs (circRNAs) are functional in gene expression regulation and potentially related to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. However, the function of most circRNAs remains unknown, and it is expensive and time-consuming to discover it through biological experiments. In this paper, we predict circRNA annotations from the knowledge of their interaction with miRNAs and subsequent miRNA-mRNA interactions. First, we construct an interaction network for a target circRNA; second, we spread the information from the network nodes with known function to the root circRNA node. This idea itself is not new; our main contribution lies in proposing an efficient and exact deterministic procedure based on the principle of probability-generating functions to calculate the p-value of an association test between a circRNA and an annotation term. We show that our publicly available algorithm is both more effective and more efficient than the commonly used Monte-Carlo sampling approach, which may suffer from difficult quantification of sampling convergence and subsequent sampling inefficiency. We experimentally demonstrate that the new approach is two orders of magnitude faster than Monte-Carlo sampling, which makes summary annotation of large circRNA files feasible; this includes their reannotation after periodic interaction network updates, for example. We provide a summary annotation of a current circRNA database as one of our outputs. The proposed algorithm could be generalized towards other types of RNA in a straightforward way.
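To illustrate the general idea of replacing Monte-Carlo sampling with an exact calculation based on probability-generating functions, here is a generic Python sketch. It is not the circGPA algorithm: it assumes a simplified statistic, the sum of independent Bernoulli indicators with hypothetical probabilities, whose exact distribution is given by the coefficients of the product of the factors (1 - p_i + p_i z).

```python
import random


def pgf_distribution(probs: list[float]) -> list[float]:
    """Coefficients of prod_i (1 - p_i + p_i*z): exact PMF of the Bernoulli sum."""
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # indicator is 0: sum stays at k
            nxt[k + 1] += c * p       # indicator is 1: sum moves to k + 1
        coeffs = nxt
    return coeffs


def exact_p_value(probs: list[float], observed: int) -> float:
    """Upper-tail probability P(S >= observed), read from the PGF coefficients."""
    return sum(pgf_distribution(probs)[observed:])


def monte_carlo_p_value(probs, observed, n_samples=20_000, seed=0):
    """Sampling estimate of the same tail probability, for comparison."""
    rng = random.Random(seed)
    hits = sum(
        sum(rng.random() < p for p in probs) >= observed
        for _ in range(n_samples)
    )
    return hits / n_samples


probs = [0.1, 0.2, 0.05, 0.3]   # hypothetical per-node probabilities
p_exact = exact_p_value(probs, 2)
p_mc = monte_carlo_p_value(probs, 2)
print(p_exact, p_mc)
```

The exact route needs one pass of polynomial multiplication and has no convergence question, whereas the sampled estimate carries noise that shrinks only with the square root of the number of samples.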

MeSH
Biomarkers
Gene Regulatory Networks
RNA, Circular *
RNA, Messenger / genetics, metabolism
MicroRNAs * / genetics, metabolism
Probability
Gene Expression Profiling / methods
Publication type
Journal Article

Article online

Comparative genomics of the Natural Killer Complex in carnivores

... The major limitations of automatic annotation of the NKC in non-model animals include short-read based ...

Frontiers in immunology. 2024 ; 15 (-) : 1459122. [pub] 20241003

Front Immunol
ISSN 1664-3224
Medvik
Source

BACKGROUND: The mammalian Natural Killer Complex (NKC) harbors genes and gene families encoding a variety of C-type lectin-like proteins expressed on various immune cells. The NKC is a complex genomic region well-characterized in mice, humans and domestic animals. The major limitations of automatic annotation of the NKC in non-model animals include short-read based sequencing, methods of assembling highly homologous and repetitive sequences, orthologues missing from reference databases and weak expression. In this situation, manual annotations of complex genomic regions are necessary. METHODS: This study presents a manual annotation of the genomic structure of the NKC region in a high-quality reference genome of the domestic cat and compares it with other felid species and with representatives of other carnivore families. Reference genomes of Carnivora, irrespective of sequencing and assembly methods, were screened by BLAST to retrieve information on their killer cell lectin-like receptor (KLR) gene content. Phylogenetic analysis of in silico translated proteins of expanded subfamilies was carried out. RESULTS: The overall genomic structure of the NKC in Carnivora is rather conservative in terms of its C-type lectin receptor gene content. A novel KLRH-like gene subfamily (KLRL) was identified in all Carnivora and a novel KLRJ-like gene was annotated in the Mustelidae. In all six families studied, one subfamily (KLRC) expanded and experienced pseudogenization. The KLRH gene subfamily expanded in all carnivore families except the Canidae. The KLRL gene subfamily expanded in carnivore families except the Felidae and Canidae, and in the Canidae it eroded to fragments. CONCLUSIONS: Knowledge of the genomic structure and gene content of the NKC region is a prerequisite for accurate annotations of newly sequenced genomes, especially of endangered wildlife species. 
Identification of expressed genes, pseudogenes and gene fragments in the context of expanded gene families would allow the assessment of functionally important variability in particular species.

Article online

Semantic biclustering for finding local, interpretable and predictive expression patterns

... homogeneous submatrices, however, we also require that the included elements can be jointly described in terms ...

BMC genomics. 2017 ; 18 (Suppl 7) : 752. [pub] 20171016

BMC Genomics
ISSN 1471-2164
Medvik
Source

BACKGROUND: One of the major challenges in the analysis of gene expression data is to identify local patterns composed of genes showing coherent expression across subsets of experimental conditions. Such patterns may provide an understanding of underlying biological processes related to these conditions. This understanding can further be improved by providing concise characterizations of the genes and situations delimiting the pattern. RESULTS: We propose a method called semantic biclustering, which aims to detect interpretable rectangular patterns in binary data matrices. As usual in biclustering, we seek homogeneous submatrices; however, we also require that the included elements can be jointly described in terms of semantic annotations pertaining to both rows (genes) and columns (samples). To find such interpretable biclusters, we explore two strategies. The first endows an existing biclustering algorithm with the semantic ingredients. The other is based on rule and tree learning known from machine learning. CONCLUSIONS: The two alternatives are tested in experiments with two Drosophila melanogaster gene expression datasets. Both strategies are shown to detect sets of compact biclusters with semantic descriptions that also remain largely valid for unseen (testing) data. This desirable generalization aspect is more pronounced in the strategy stemming from conventional biclustering, although this is traded off by the complexity of the descriptions (number of ontology terms employed), which, on the other hand, is lower for the alternative strategy.

Article online

Teaching transposon classification as a means to crowd source the curation of repeat annotation - a tardigrade perspective

... Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic ...

Mobile DNA. 2024 ; 15 (1) : 10. [pub] 20240506

Mob DNA
ISSN 1759-8753
Medvik
Source

BACKGROUND: The advancement of sequencing technologies results in the rapid release of hundreds of new genome assemblies a year providing unprecedented resources for the study of genome evolution. Within this context, the significance of in-depth analyses of repetitive elements, transposable elements (TEs) in particular, is increasingly recognized in understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential due to the frequent incompleteness of automatically generated consensus sequences. RESULTS: Here, we present an example of a crowd-sourcing effort aimed at curating and annotating TE libraries of two non-model species built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups where methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning the methods of curation effectively and empowering the scientific community with high-quality, reusable repeat libraries. CONCLUSIONS: The collaborative manual curation of TEs from two tardigrade species, for which there were no TE libraries available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects: A hidden treasure awaits discovery within non-model organisms.

Publication type
Journal Article

Article online

Molecular patterns of diffuse and nodular parathyroid hyperplasia in long-term hemodialysis

... transcriptome screening and subsequently for discriminatory gene analysis, pathway mapping, and gene annotation ...

American journal of physiology: endocrinology and metabolism. 2016 ; 311 (4) : E720-E729. [pub] 20160906

Am J Physiol Endocrinol Metab
ISSN 1522-1555
Medvik
Source

Secondary hyperparathyroidism is a well-known complication of end-stage renal disease (ESRD). Both nodular and diffuse parathyroid hyperplasia occur in ESRD patients. However, their distinct molecular mechanisms remain poorly understood. Parathyroid tissue obtained from ESRD patients who had undergone parathyroidectomy was used for Illumina transcriptome screening and subsequently for discriminatory gene analysis, pathway mapping, and gene annotation enrichment analysis. Results were further validated using quantitative RT-PCR on an independent, larger cohort. Microarray screening proved homogeneity of gene transcripts in hemodialysis patients compared with the transplant cohort and primary hyperparathyroidism; therefore, further experiments were performed in hemodialysis patients only. Enrichment analysis conducted on 485 differentially expressed genes between nodular and diffuse parathyroid hyperplasia revealed highly significant differences in Gene Ontology terms and the Kyoto Encyclopedia of Genes and Genomes database in ribosome structure (P = 3.70 × 10^-18). Next, quantitative RT-PCR validation of the top differentially expressed genes from microarray analysis proved higher expression of RAN guanine nucleotide release factor (RANGRF; P < 0.001), calcyclin-binding protein (CACYBP; P < 0.05), and exocyst complex component 8 (EXOC8; P < 0.05) and lower expression of peptidylprolyl cis/trans isomerase, NIMA-interacting 1 (PIN1; P < 0.01) mRNA in nodular hyperplasia. Multivariate analysis revealed higher RANGRF and lower PIN1 expression along with parathyroid weight to be associated with nodular hyperplasia. In conclusion, our study suggests the RANGRF transcript, which controls RNA metabolism, to be likely involved in pathways associated with the switch to nodular parathyroid growth. This transcript, along with the PIN1 transcript, which influences parathyroid hormone secretion, may represent new therapeutic targets to cure secondary hyperparathyroidism.

MeSH
Kidney Failure, Chronic / complications, therapy
Renal Dialysis *
Adult
Focal Nodular Hyperplasia / etiology, genetics, therapy
Middle Aged
Humans
RNA, Messenger / biosynthesis, genetics
Multigene Family / genetics
Parathyroid Hormone / blood
Parathyroid Glands / pathology
Parathyroidectomy
Hyperparathyroidism, Primary / pathology
Gene Expression Regulation / genetics
Hyperparathyroidism, Secondary / etiology, genetics, therapy
Aged
Gene Expression Profiling
Transcriptome / genetics
Check Tag
Adult
Middle Aged
Humans
Male
Aged
Female
Publication type
Journal Article

Article online

Usability of reference-free transcriptome assemblies for detection of differential expression: a case study on Aethionema arabicum dimorphic seeds

... Gene Ontology (GO) terms distinguished the seed morphs: the terms translation and nucleosome assembly ...

BMC genomics. 2019 ; 20 (1) : 95. [pub] 20190130

BMC Genomics
ISSN 1471-2164
Medvik
Source

BACKGROUND: RNA-sequencing analysis is increasingly utilized to study gene expression in non-model organisms without sequenced genomes. Aethionema arabicum (Brassicaceae) exhibits seed dimorphism as a bet-hedging strategy - producing both a less dormant mucilaginous (M+) seed morph and a more dormant non-mucilaginous (NM) seed morph. Here, we compared de novo and reference-genome based transcriptome assemblies to investigate Ae. arabicum seed dimorphism and to evaluate the reference-free versus -dependent approach for identifying differentially expressed genes (DEGs). RESULTS: A de novo transcriptome assembly was generated using sequences from M+ and NM Ae. arabicum dry seed morphs. The transcripts of the de novo assembly contained 63.1% complete Benchmarking Universal Single-Copy Orthologs (BUSCO) compared to 90.9% for the transcripts of the reference genome. DEG detection used the strict consensus of three methods (DESeq2, edgeR and NOISeq). Only 37% of 1533 differentially expressed de novo assembled transcripts paired with 1876 genome-derived DEGs. Gene Ontology (GO) terms distinguished the seed morphs: the terms translation and nucleosome assembly were overrepresented in DEGs higher in abundance in M+ dry seeds, whereas terms related to mRNA processing and transcription were overrepresented in DEGs higher in abundance in NM dry seeds. DEGs amongst these GO terms included ribosomal proteins and histones (higher in M+), RNA polymerase II subunits and related transcription and elongation factors (higher in NM). Expression of the inferred DEGs and other genes associated with seed maturation (e.g. those encoding late embryogenesis abundant proteins and transcription factors regulating seed development and maturation such as ABI3, FUS3, LEC1 and WRI1 homologs) was put in context with Arabidopsis thaliana seed maturation and indicated that M+ seeds may desiccate and mature faster than NM. The 1901 transcriptomic DEG set GO-terms had almost 90% overlap with the 2191 genome-derived DEG GO-terms. CONCLUSIONS: Whilst there was only modest overlap of DEGs identified in reference-free versus -dependent approaches, the resulting GO analysis was concordant in both approaches. The identified differences in dry seed transcriptomes suggest mechanisms underpinning previously identified contrasts between morphology and germination behaviour of M+ and NM seeds.
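The "strict consensus of three methods" and the overlap percentages quoted in the abstract above come down to set operations on gene or term identifiers. A minimal Python sketch follows; it does not call DESeq2, edgeR or NOISeq, and the identifiers and the three result sets are hypothetical placeholders for each tool's list of significant genes.

```python
def strict_consensus(*calls: set[str]) -> set[str]:
    """Keep only the genes that every method reports as differentially expressed."""
    return set.intersection(*calls)


def overlap_fraction(query: set[str], reference: set[str]) -> float:
    """Share of `query` items that also occur in `reference`."""
    return len(query & reference) / len(query) if query else 0.0


# Hypothetical significant-gene sets, one per method.
deseq2_hits = {"g1", "g2", "g3", "g4"}
edger_hits = {"g2", "g3", "g4", "g5"}
noiseq_hits = {"g3", "g4", "g6"}

consensus = strict_consensus(deseq2_hits, edger_hits, noiseq_hits)

# Hypothetical DEG set from the reference-genome based analysis.
genome_degs = {"g4", "g7"}
frac = overlap_fraction(consensus, genome_degs)
print(sorted(consensus), frac)
```

The same `overlap_fraction` helper applies to GO-term sets, which is how a modest gene-level overlap can coexist with a high term-level overlap.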
