Query: Database mining
- Keywords
- B-chromosomes, cytogenetics, data-mining, database, evolution, karyology, karyotype
- MeSH
- Chromosomes, Plant / genetics MeSH
- Chromosomes / genetics MeSH
- Data Mining MeSH
- Databases, Bibliographic MeSH
- Databases, Genetic * MeSH
- Fungi / genetics MeSH
- Internet MeSH
- Plants / genetics MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Letter MeSH
- Research Support, Non-U.S. Gov't MeSH
Proteins are the most abundant component of the cell nucleus, where they perform a plethora of functions, including the assembly of long DNA molecules into condensed chromatin, DNA replication and repair, regulation of gene expression, synthesis of RNA molecules and their modification. Proteins are important components of nuclear bodies and are involved in the maintenance of the nuclear architecture, transport across the nuclear envelope and cell division. Given their importance, the current poor knowledge of plant nuclear proteins and their dynamics during the cell's life and division is striking. Several factors hamper the analysis of the plant nuclear proteome, but the most critical seems to be the contamination of nuclei by cytosolic material during their isolation. With the availability of an efficient protocol for the purification of plant nuclei, based on flow cytometric sorting, contamination by cytoplasmic remnants can be minimized. Moreover, flow cytometry allows the separation of nuclei in different stages of the cell cycle (G1, S, and G2). This strategy has led to the identification of a large number of nuclear proteins from barley (Hordeum vulgare), thus triggering the creation of a dedicated database called UNcleProt, http://barley.gambrinus.ueb.cas.cz/ .
- Keywords
- barley, cell cycle, database, flow-cytometry, localization, mass spectrometry, nuclear proteome, nucleus
- MeSH
- Cell Cycle * MeSH
- Data Mining MeSH
- Databases, Protein * MeSH
- Nuclear Proteins / classification / metabolism MeSH
- Hordeum / cytology MeSH
- Plant Proteins / classification / metabolism MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Nuclear Proteins MeSH
- Plant Proteins MeSH
As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies (e.g., next-generation sequencing) are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools exist that attempt to characterize the function of gene lists. Such systems typically rely on enrichment analysis and aim to give quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can analyze gene lists quickly and effectively and identify aggregated annotations through computerized resources. In this article we survey some of the many such tools and methods that have been developed to automatically interpret the biological functions underlying gene lists. We overview current functional annotation aspects from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most of the currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known' formally-structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or, perhaps more adventurously, on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural languages). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for.
In particular, future methods need to (1) provide more holistic insight into the underlying molecular systems; (2) provide better follow-up experimental testing and treatment options, and (3) better manage gene lists derived from organisms that are not well-studied. We discuss some promising approaches that may help achieve these advances, especially the use of extended dictionaries of biomedical concepts and molecular mechanisms, as well as greater use of annotation benchmarks.
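The enrichment analysis mentioned in this abstract is commonly implemented as a one-sided hypergeometric (Fisher-type) test. The sketch below is not taken from the article; it only illustrates the idea, and the function name and the toy counts are assumptions made for this example.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes in a gene list.

    population: number of genes in the background set
    annotated:  background genes carrying the term (e.g. one GO term)
    sample:     size of the gene list under study
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for k = hits .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers, invented purely for illustration.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=2)
print(f"{p_value:.4f}")
```

In practice such a p-value would be computed for every term and then corrected for multiple testing; that step is omitted here.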
- Keywords
- Benchmarks, Functional annotation, GO term enrichment, Keyword enhancement, Systems biology, Text mining
- MeSH
- Data Mining / methods / trends MeSH
- Databases, Genetic * / trends MeSH
- Gene Ontology * / trends MeSH
- Humans MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
Genetic variation occurring within conserved functional protein domains warrants special attention when examining DNA variation in the context of disease causation. Here we introduce a resource, freely available at www.prot2hg.com, that addresses the question of whether a particular variant falls onto an annotated protein domain and directly translates chromosomal coordinates onto protein residues. The tool can perform a multiple-site query in a simple way, and the whole dataset is available for download as well as incorporated into our own accessible pipeline. To create this resource, National Center for Biotechnology Information protein data were retrieved using the Entrez Programming Utilities. After processing all human protein domains, residue positions were reverse translated and mapped to the reference genome hg19 and stored in a MySQL database. In total, 760 487 protein domains from 42 371 protein models were mapped to hg19 coordinates and made publicly available for search or download (www.prot2hg.com). In addition, this annotation was implemented into the genomics research platform GENESIS in order to query nearly 8000 exomes and genomes of families with rare Mendelian disorders (tgp-foundation.org). When applied to patient genetic data, we found that rare (<1%) variants in the Genome Aggregation Database were significantly more often annotated onto a protein domain than common (>1%) variants. Similarly, variants described as pathogenic or likely pathogenic in ClinVar were more likely to be annotated onto a domain. In addition, we tested a dataset consisting of 60 causal variants in a cohort of patients with epileptic encephalopathy and found that 71% of them (43 variants) were propagated onto protein domains. In summary, we developed a resource that annotates variants in the coding part of the genome onto conserved protein domains in order to increase variant prioritization efficiency. Database URL: www.prot2hg.com.
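The mapping of residue positions back to genome coordinates described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the input layout (coding segments as 1-based inclusive intervals) and the example coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple


def residue_to_genomic(
    residue: int, cds_exons: List[Tuple[int, int]], strand: str
) -> List[int]:
    """Return the genomic positions of the three codon bases of a 1-based residue.

    cds_exons: coding segments as (start, end), 1-based inclusive,
    start <= end, listed in ascending genomic order.
    """
    # List coding bases in transcript order: ascending on '+', descending on '-'.
    if strand == "+":
        coding = [pos for start, end in cds_exons for pos in range(start, end + 1)]
    elif strand == "-":
        coding = [
            pos
            for start, end in reversed(cds_exons)
            for pos in range(end, start - 1, -1)
        ]
    else:
        raise ValueError("strand must be '+' or '-'")

    first = 3 * (residue - 1)  # 0-based offset of the codon's first base in the CDS
    if residue < 1 or first + 3 > len(coding):
        raise ValueError("residue lies outside the coding sequence")
    return coding[first:first + 3]


# Hypothetical two-exon gene; the coordinates are invented.
exons = [(100, 104), (200, 206)]
print(residue_to_genomic(2, exons, "+"))
print(residue_to_genomic(3, exons, "-"))
```

A domain spanning residues a to b could then be projected onto the genome by mapping every residue in that range. Expanding the whole coding sequence into a list keeps the sketch short but would be wasteful for genome-wide use.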
- MeSH
- Molecular Sequence Annotation / methods MeSH
- Data Mining / methods MeSH
- Databases, Genetic * MeSH
- Data Curation / methods MeSH
- Genetic Variation * MeSH
- Genome, Human / genetics MeSH
- Genomics / methods MeSH
- Internet MeSH
- Humans MeSH
- Protein Domains / genetics MeSH
- Proteins / chemistry / genetics / metabolism MeSH
- Computational Biology / methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Proteins MeSH
Lipidomics and metabolomics communities comprise various informatics tools; however, software programs handling multimodal mass spectrometry (MS) data with structural annotations guided by the Lipidomics Standards Initiative are limited. Here, we provide MS-DIAL 5 for in-depth lipidome structural elucidation through electron-activated dissociation (EAD)-based tandem MS and determining their molecular localization through MS imaging (MSI) data using a species/tissue-specific lipidome database containing the predicted collision-cross section values. With the optimized EAD settings using 14 eV kinetic energy, the program correctly delineated lipid structures for 96.4% of authentic standards, among which 78.0% had the sn-, OH-, and/or C=C positions correctly assigned at concentrations exceeding 1 μM. We showcased our workflow by annotating the sn- and double-bond positions of eye-specific phosphatidylcholines containing very-long-chain polyunsaturated fatty acids (VLC-PUFAs), characterized as PC n-3-VLC-PUFA/FA. Using MSI data from the eye and n-3-VLC-PUFA-supplemented HeLa cells, we identified glycerol 3-phosphate acyltransferase as an enzyme candidate responsible for incorporating n-3 VLC-PUFAs into the sn1 position of phospholipids in mammalian cells, which was confirmed using EAD-MS/MS and recombinant proteins in a cell-free system. Therefore, the MS-DIAL 5 environment, combined with optimized MS data acquisition methods, facilitates a better understanding of lipid structures and their localization, offering insights into lipid biology.
- MeSH
- Data Mining * / methods MeSH
- Phosphatidylcholines / metabolism / chemistry MeSH
- HeLa Cells MeSH
- Mass Spectrometry / methods MeSH
- Humans MeSH
- Lipidomics * / methods MeSH
- Lipids / chemistry / analysis MeSH
- Metabolomics / methods MeSH
- Fatty Acids, Unsaturated / metabolism / chemistry MeSH
- Software MeSH
- Tandem Mass Spectrometry / methods MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Phosphatidylcholines MeSH
- Lipids MeSH
- Fatty Acids, Unsaturated MeSH
The supernumerary, mostly dispensable B chromosomes are nuclear components of about 15% of eukaryotic phyla. For a long time, B chromosomes have been studied, generating an enormous bulk of knowledge, diluted in the vastness of the scientific literature. In order to provide better access to this information, we created B-chrom ( www.bchrom.csic.es ), an online database with comprehensive information on Bs for plants, animals, and fungi. It was released in 2017 and first updated in 2021, by adding 334 entries and 123 new species. Currently, the resource provides information for 2951 species coming from 3292 sources. During this time, the usefulness of this database has been proven by the number of visits (more than 207,000 since its release) and by the scientific community, having been cited in more than 60 publications to date. This chapter explains the database composition and gives tips on how to use it.
- Keywords
- B-chromosomes, Cytogenetics, Data mining, Database, Evolution, Karyology, Karyotype
- MeSH
- Chromium * MeSH
- Chromosomes MeSH
- Databases, Factual MeSH
- Eukaryota * MeSH
- Eukaryotic Cells MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Chromium * MeSH
The aim of this study was to discover new nitrilases with useful activities, especially towards dinitriles that are precursors of high-value cyano acids. Genes coding for putative nitrilases of different origins (fungal, plant, or bacterial) with moderate similarities to known nitrilases were selected by mining the GenBank database, synthesized artificially and expressed in Escherichia coli. The enzymes were purified, examined for their substrate specificities, and classified into subtypes (aromatic nitrilase, arylacetonitrilase, aliphatic nitrilase, cyanide hydratase) which were largely in accordance with those predicted from bioinformatic analysis. The catalytic potential of the nitrilases for dinitriles was examined with cyanophenyl acetonitriles, phenylenediacetonitriles, and fumaronitrile. The nitrilase activities and selectivities for dinitriles and the reaction products (cyano acid, cyano amide, diacid) depended on the enzyme subtype. At a preparative scale, all the examined dinitriles were hydrolyzed into cyano acids and fumaronitrile was converted to cyano amide using E. coli cells producing arylacetonitrilases and an aromatic nitrilase, respectively.
- Keywords
- Arylacetonitrilases, Cyano acids, Dinitriles, Genome mining, Nitrilases
- MeSH
- Aminohydrolases / genetics / metabolism MeSH
- Data Mining MeSH
- Escherichia coli / genetics / metabolism MeSH
- Gene Expression MeSH
- Cloning, Molecular MeSH
- Nitriles / metabolism MeSH
- Recombinant Proteins / isolation & purification / metabolism MeSH
- Substrate Specificity MeSH
- Computational Biology MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Aminohydrolases MeSH
- nitrilase MeSH Browser
- Nitriles MeSH
- Recombinant Proteins MeSH
Acromegaly is a rare disorder caused by chronic growth hormone (GH) hypersecretion. While diagnostic and therapeutic methods have advanced, little information exists on trends in acromegaly characteristics over time. The Liège Acromegaly Survey (LAS) Database, a relational database, is designed to assess the profile of acromegaly patients at diagnosis and during long-term follow-up at multiple treatment centers. The following results were obtained at diagnosis. The study population consisted of 3173 acromegaly patients from ten countries; 54.5% were female. Males were significantly younger at diagnosis than females (43.5 vs 46.4 years; P < 0.001). The median delay from first symptoms to diagnosis was 2 years longer in females (P = 0.015). Ages at diagnosis and first symptoms increased significantly over time (P < 0.001). Tumors were larger in males than females (P < 0.001); tumor size and invasion were inversely related to patient age (P < 0.001). Random GH at diagnosis correlated with nadir GH levels during OGTT (P < 0.001). GH was inversely related to age in both sexes (P < 0.001). Diabetes mellitus was present in 27.5%, hypertension in 28.8%, sleep apnea syndrome in 25.5% and cardiac hypertrophy in 15.5%. Serious cardiovascular outcomes like stroke, heart failure and myocardial infarction were present in <5% at diagnosis. Erythrocyte levels were increased and correlated with IGF-1 values. Thyroid nodules were frequent (34.0%); 820 patients had colonoscopy at diagnosis and 13% had polyps. Osteoporosis was present at diagnosis in 12.3% and 0.6-4.4% had experienced a fracture. In conclusion, this study of >3100 patients is the largest international acromegaly database and shows clinically relevant trends in the characteristics of acromegaly at diagnosis.
- Keywords
- IGF-1, acromegaly, comorbidity, data mining, database, diagnosis, growth hormone, pituitary adenoma, symptoms
- MeSH
- Acromegaly / diagnosis / pathology MeSH
- Databases, Factual MeSH
- Middle Aged MeSH
- Humans MeSH
- Human Growth Hormone / adverse effects / blood MeSH
- Surveys and Questionnaires MeSH
- Check Tag
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Human Growth Hormone MeSH
Seed characteristics play an important role in the colonization and subsequent persistence of species during succession in disturbed sites and thus may help to predict restoration success. In the present study, we investigated how various seed characteristics participated in 11 spontaneous successional series running in different mining sites (spoil heaps, extracted sand and sand-gravel pits, extracted peatlands, and stone quarries) in the Czech Republic, Central Europe. Using 1864 samples from 1- to 100-year-old successional stages, we tested whether species optimum along the succession gradient could be predicted using 10 basic species traits connected with diaspores and dispersal. Seed longevity, diaspore mass, endozoochory, and autochory appeared to be the best predictors. The results indicate that seed characteristics can predict, to a certain degree, spontaneous vegetation succession, i.e., passive restoration, in the mining sites. A screening of species available in the given landscape (regional and local species pools) may help to identify those species which would potentially colonize the disturbed sites. Extensive databases of species traits, nowadays available for the Central European flora, enable such screening.
- Keywords
- Dispersal types, Life history traits, Meta-analysis, Mining sites, Passive restoration, Primary succession, Spontaneous succession
- MeSH
- Time Factors MeSH
- Plant Dispersal * MeSH
- Ecosystem MeSH
- Mining * MeSH
- Environmental Restoration and Remediation * MeSH
- Seeds / growth & development MeSH
- Publication type
- Journal Article MeSH
- Geographic names
- Czech Republic MeSH
BACKGROUND: Microarray technologies now belong to the standard functional genomics toolbox and have undergone massive development leading to increased genome coverage, accuracy and reliability. The number of experiments exploiting microarray technology has markedly increased in recent years. In parallel with the rapid accumulation of transcriptomic data, on-line analysis tools are being introduced to simplify their use. Global statistical data analysis methods contribute to the development of overall concepts about gene expression patterns and help to query and compose working hypotheses. More recently, these applications have been supplemented with more specialized products offering visualization and specific data mining tools. We present a curated gene family-oriented gene expression database, Arabidopsis Gene Family Profiler (aGFP; http://agfp.ueb.cas.cz), which gives the user access to a large collection of normalised Affymetrix ATH1 microarray datasets. The database currently contains NASC Array and AtGenExpress transcriptomic datasets for various tissues at different developmental stages of wild type plants gathered from nearly 350 gene chips. RESULTS: The Arabidopsis GFP database has been designed as an easy-to-use tool for users needing an easily accessible resource for expression data of single genes, pre-defined gene families or custom gene sets, with the further possibility of keyword search. Arabidopsis Gene Family Profiler presents a user-friendly web interface using both graphic and text output. Data are stored on a MySQL server, and individual queries are generated by PHP scripts.
The most distinctive features of the Arabidopsis Gene Family Profiler database are: 1) the presentation of normalized datasets (Affymetrix MAS algorithm and calculation of model-based gene-expression values based on the Perfect Match-only model); 2) the choice between two different normalization algorithms (Affymetrix MAS4 or MAS5 algorithms); 3) an intuitive interface; 4) an interactive "virtual plant" visualizing the spatial and developmental expression profiles of both gene families and individual genes. CONCLUSION: Arabidopsis GFP gives users the possibility to analyze current Arabidopsis developmental transcriptomic data starting with simple global queries that can be expanded and further refined to visualize comparative and highly selective gene expression profiles.
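A gene family-oriented expression query of the kind described here could look roughly like the sketch below. It is only an illustration: the abstract states that aGFP uses MySQL and PHP, whereas this example substitutes Python's built-in sqlite3 module so that it runs without a server, and the table layout, column names, gene identifiers and signal values are all hypothetical.

```python
import sqlite3

# In-memory stand-in for the expression database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expression (gene_id TEXT, family TEXT, tissue TEXT, signal REAL)"
)
rows = [
    ("GENE_A", "FAM1", "root", 120.0),
    ("GENE_A", "FAM1", "leaf", 40.0),
    ("GENE_B", "FAM1", "root", 80.0),
    ("GENE_B", "FAM1", "leaf", 60.0),
    ("GENE_C", "FAM2", "root", 10.0),
]
conn.executemany("INSERT INTO expression VALUES (?, ?, ?, ?)", rows)


def family_profile(connection, family):
    """Mean normalized signal per tissue for all genes of one family."""
    cursor = connection.execute(
        "SELECT tissue, AVG(signal) FROM expression "
        "WHERE family = ? GROUP BY tissue ORDER BY tissue",
        (family,),
    )
    return cursor.fetchall()


profile = family_profile(conn, "FAM1")
print(profile)
```

The parameterized query (the `?` placeholder) keeps user-supplied family names out of the SQL string, which matters for any web-facing query interface.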