Genetic variation occurring within conserved functional protein domains warrants special attention when examining DNA variation in the context of disease causation. Here we introduce a resource, freely available at www.prot2hg.com, that determines whether a particular variant falls within an annotated protein domain and directly translates chromosomal coordinates into protein residue positions. The tool supports simple multiple-site queries, and the whole dataset is available for download and is also incorporated into our own accessible pipeline. To create this resource, National Center for Biotechnology Information protein data were retrieved using the Entrez Programming Utilities. After processing all human protein domains, residue positions were reverse translated, mapped to the hg19 reference genome and stored in a MySQL database. In total, 760 487 protein domains from 42 371 protein models were mapped to hg19 coordinates and made publicly available for search or download (www.prot2hg.com). In addition, this annotation was implemented in the genomics research platform GENESIS to query nearly 8000 exomes and genomes of families with rare Mendelian disorders (tgp-foundation.org). When applied to patient genetic data, we found that variants that are rare (<1%) in the Genome Aggregation Database were significantly more likely to fall within a protein domain than common (>1%) variants. Similarly, variants classified as pathogenic or likely pathogenic in ClinVar were more likely to be annotated onto a domain. We also tested a dataset of 60 causal variants from a cohort of patients with epileptic encephalopathy and found that 71% of them (43 variants) mapped onto protein domains. In summary, we developed a resource that annotates variants in the coding part of the genome onto conserved protein domains in order to improve the efficiency of variant prioritization. Database URL: www.prot2hg.com.
- MeSH terms
  - Molecular Sequence Annotation / methods
  - Data Mining / methods
  - Databases, Genetic*
  - Data Curation / methods
  - Genetic Variation*
  - Genome, Human / genetics
  - Genomics / methods
  - Internet
  - Humans
  - Protein Domains / genetics
  - Proteins / chemistry, genetics, metabolism
  - Computational Biology / methods
- Check Tag
  - Humans
- Publication type
  - Journal Article
  - Research Support, Non-U.S. Gov't
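The central operation behind such a resource is translating a residue range on a protein into the genomic positions of its codons, then testing whether a variant lies inside. The sketch below shows one way to do this. It is a minimal illustration, not prot2hg's implementation. It assumes the transcript's coding exons are already known as 1-based, inclusive genomic intervals, and it ignores incomplete or frame-shifted CDS annotations. The function names `residues_to_genomic` and `in_domain` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def residues_to_genomic(cds_exons: List[Interval], strand: str,
                        aa_start: int, aa_end: int) -> List[Interval]:
    """Map protein residues aa_start..aa_end (1-based, inclusive) onto the
    genomic intervals covered by their codons (hypothetical helper)."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # 0-based offsets into the concatenated coding sequence.
    cds_from = (aa_start - 1) * 3
    cds_to = aa_end * 3 - 1
    # Walk exons in transcription order: ascending on '+', descending on '-'.
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    result: List[Interval] = []
    offset = 0  # CDS offset of the first coding base of the current exon
    for start, end in ordered:
        length = end - start + 1
        lo = max(cds_from, offset)
        hi = min(cds_to, offset + length - 1)
        if lo <= hi:  # part of the residue range lies in this exon
            if strand == "+":
                result.append((start + lo - offset, start + hi - offset))
            else:
                result.append((end - (hi - offset), end - (lo - offset)))
        offset += length
    return sorted(result)


def in_domain(pos: int, domain_intervals: List[Interval]) -> bool:
    """True if a genomic position falls inside any mapped domain interval."""
    return any(start <= pos <= end for start, end in domain_intervals)


# Two coding exons; residue 4 spans the exon junction.
cds = [(100, 109), (200, 219)]
domain = residues_to_genomic(cds, "+", 3, 5)
print(domain)                  # [(106, 109), (200, 204)]
print(in_domain(201, domain))  # True
```

On the minus strand, the exons are walked in descending genomic order, so residue 1 maps to the highest coordinates. The function sorts the returned intervals by genomic position so they can be stored as ordinary start/end rows.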
The majority of naturally occurring proteins have evolved to function under mild conditions inside living organisms. One of the critical obstacles to the use of proteins in biotechnological applications is their insufficient stability at elevated temperatures or in the presence of salts. Since experimental screening for stabilizing mutations is typically laborious and expensive, in silico predictors are often used to narrow down the mutational landscape. Recent advances in machine learning and artificial intelligence further facilitate the development of such computational tools. However, the accuracy of these predictors strongly depends on the quality and amount of data used for training and testing, which have often been reported as the current bottleneck of the approach. To address this problem, we present FireProtDB, a novel database of experimental thermostability data for single-point mutants. The database combines published datasets, data extracted manually from the recent literature, and data collected in our laboratory. Its user interface is designed to support both expected types of use: (i) interactive exploration of individual entries at the level of a protein or mutation and (ii) the construction of highly customized, machine learning-friendly datasets using advanced searching and filtering. The database is freely available at https://loschmidt.chemi.muni.cz/fireprotdb.
- MeSH terms
  - Molecular Sequence Annotation
  - Point Mutation*
  - Databases, Protein*
  - Datasets as Topic
  - Internet
  - Models, Molecular
  - Proteins / chemistry, genetics
  - Software
  - Protein Stability
  - Machine Learning / statistics & numerical data
  - Computational Biology / methods
- Publication type
  - Journal Article
  - Research Support, Non-U.S. Gov't
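Building a machine learning-friendly dataset from thermostability records typically involves three steps:

- validating the mutation notation,
- discarding entries without the target measurement, and
- merging replicate measurements.

The sketch below illustrates these steps under stated assumptions. Records are plain dictionaries with the hypothetical keys `protein`, `mutation` (e.g. `A123G`) and `ddg`. These are not FireProtDB's export fields, and averaging replicates is only one possible merging policy.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
MUTATION_RE = re.compile(rf"^([{AA}])(\d+)([{AA}])$")


def parse_mutation(code: str) -> Tuple[str, int, str]:
    """Split a substitution such as 'A123G' into (wild type, position, mutant)."""
    m = MUTATION_RE.match(code)
    if m is None or m.group(1) == m.group(3):
        raise ValueError(f"not a single-point substitution: {code!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def build_dataset(records: List[dict]) -> List[dict]:
    """Drop records without a ddG value and average replicate measurements
    of the same (protein, mutation) pair into one example."""
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for rec in records:
        if rec.get("ddg") is None:
            continue
        parse_mutation(rec["mutation"])  # reject malformed notation early
        grouped[(rec["protein"], rec["mutation"])].append(float(rec["ddg"]))
    return [
        {"protein": p, "mutation": m, "ddg": mean(v), "n": len(v)}
        for (p, m), v in sorted(grouped.items())
    ]


rows = [
    {"protein": "P1", "mutation": "A10G", "ddg": 1.0},
    {"protein": "P1", "mutation": "A10G", "ddg": 2.0},
    {"protein": "P1", "mutation": "L5F", "ddg": None},
]
print(build_dataset(rows))
# [{'protein': 'P1', 'mutation': 'A10G', 'ddg': 1.5, 'n': 2}]
```

Keeping the replicate count `n` lets downstream code weight or filter examples by how often each mutation was measured.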
Proteins are the most abundant component of the cell nucleus, where they perform a plethora of functions, including the assembly of long DNA molecules into condensed chromatin, DNA replication and repair, regulation of gene expression, and the synthesis and modification of RNA molecules. Proteins are important components of nuclear bodies and are involved in the maintenance of the nuclear architecture, transport across the nuclear envelope and cell division. Given their importance, our current poor knowledge of plant nuclear proteins and their dynamics during the cell's life and division is striking. Several factors hamper the analysis of the plant nuclear proteome, but the most critical appears to be contamination of nuclei by cytosolic material during their isolation. With the availability of an efficient protocol for purifying plant nuclei based on flow cytometric sorting, contamination by cytoplasmic remnants can be minimized. Moreover, flow cytometry allows nuclei in different stages of the cell cycle (G1, S, and G2) to be separated. This strategy has led to the identification of a large number of nuclear proteins from barley (Hordeum vulgare) and prompted the creation of a dedicated database, UNcleProt (http://barley.gambrinus.ueb.cas.cz/).
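Separating nuclei by cell-cycle stage relies on DNA content. G2 nuclei contain twice as much DNA as G1 nuclei, and S-phase nuclei fall in between. The sketch below shows a simplified gating rule. It assumes each nucleus has a single DNA fluorescence value and that the position of the G1 peak is known. The fixed relative tolerance and the function name `gate_cell_cycle` are illustrative assumptions; in practice, sorting gates are set on the instrument from the observed histogram.

```python
from typing import Dict, List


def gate_cell_cycle(dna_signal: List[float], g1_peak: float,
                    tolerance: float = 0.1) -> Dict[str, List[int]]:
    """Assign each nucleus (by index) to G1, S or G2 from its DNA fluorescence.

    G1 nuclei lie near the 2C peak (g1_peak), G2 nuclei near 4C (2 * g1_peak),
    and S-phase nuclei between the two gates. `tolerance` is a hypothetical
    relative gate width.
    """
    g2_peak = 2.0 * g1_peak
    gates: Dict[str, List[int]] = {"G1": [], "S": [], "G2": []}
    for i, x in enumerate(dna_signal):
        if abs(x - g1_peak) <= tolerance * g1_peak:
            gates["G1"].append(i)
        elif abs(x - g2_peak) <= tolerance * g2_peak:
            gates["G2"].append(i)
        elif g1_peak < x < g2_peak:
            gates["S"].append(i)
        # Events outside all gates (debris, doublets) are left unsorted.
    return gates


print(gate_cell_cycle([100.0, 150.0, 200.0, 50.0, 300.0], g1_peak=100.0))
# {'G1': [0], 'S': [1], 'G2': [2]}
```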
Following the discovery of serious errors in the structures of biomacromolecules, structure validation has become a key topic of research, especially for ligands and non-standard residues. ValidatorDB (freely available at http://ncbr.muni.cz/ValidatorDB) takes a new step in this direction: it is a database of validation results for all ligands and non-standard residues in the Protein Data Bank (all molecules with seven or more heavy atoms). Model molecules from the wwPDB Chemical Component Dictionary serve as the reference during validation. ValidatorDB covers the main aspects of annotation validation and additionally introduces several useful validation analyses. The most significant is the classification of chirality errors, which allows the user to distinguish serious issues from minor inconsistencies. Other analyses report, for example, completely erroneous ligands, alternate conformations or complete identity with the model molecules. All results are systematically classified into categories, and statistical evaluations are performed. In addition to detailed validation reports for each molecule, ValidatorDB provides summaries of the validation results for the entire PDB, for sets of molecules sharing the same annotation (three-letter code) or the same PDB entry, and for user-defined selections of annotations or PDB entries.
- MeSH terms
  - Amino Acids / chemistry
  - Molecular Sequence Annotation
  - Databases, Protein*
  - Internet
  - Protein Conformation
  - Ligands
  - Models, Molecular
  - Proteins / chemistry
  - Reproducibility of Results
- Publication type
  - Journal Article
  - Research Support, Non-U.S. Gov't
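One common way to detect an inverted chiral center is to compute the signed volume (triple product) spanned by three of its neighbors. If that volume has opposite signs in the observed structure and in the reference model, the center's handedness is inverted. The sketch below illustrates this idea together with the seven-heavy-atom selection criterion. It is not ValidatorDB's algorithm: it assumes atoms are matched by name between structure and model, and the function names are hypothetical. Nearly planar centers, whose volume is close to zero, would need a tolerance that is omitted here.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]


def signed_volume(center: Vec, a: Vec, b: Vec, c: Vec) -> float:
    """Triple product (a-center) . ((b-center) x (c-center)); its sign encodes handedness."""
    ax, ay, az = (a[i] - center[i] for i in range(3))
    bx, by, bz = (b[i] - center[i] for i in range(3))
    cx, cy, cz = (c[i] - center[i] for i in range(3))
    return ax * (by * cz - bz * cy) - ay * (bx * cz - bz * cx) + az * (bx * cy - by * cx)


def chirality_mismatches(coords: Dict[str, Vec], model: Dict[str, Vec],
                         centers: List[Tuple[str, str, str, str]]) -> List[str]:
    """Names of chiral centers whose handedness is inverted relative to the model.

    Each entry of `centers` is (center, n1, n2, n3), using atom names shared by
    the structure and the model.
    """
    inverted = []
    for center, n1, n2, n3 in centers:
        v_obs = signed_volume(coords[center], coords[n1], coords[n2], coords[n3])
        v_ref = signed_volume(model[center], model[n1], model[n2], model[n3])
        if v_obs * v_ref < 0:  # opposite signs: mirror-image configuration
            inverted.append(center)
    return inverted


def is_validated(elements: Sequence[str], min_heavy_atoms: int = 7) -> bool:
    """Apply the 'seven or more heavy atoms' selection (hydrogens excluded)."""
    heavy = sum(1 for e in elements if e.upper() not in ("H", "D"))
    return heavy >= min_heavy_atoms


model = {"C1": (0.0, 0.0, 0.0), "O1": (1.0, 0.0, 0.0),
         "N1": (0.0, 1.0, 0.0), "C2": (0.0, 0.0, 1.0)}
mirrored = dict(model, C2=(0.0, 0.0, -1.0))  # reflect one neighbor through the plane
print(chirality_mismatches(mirrored, model, [("C1", "O1", "N1", "C2")]))  # ['C1']
```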
IRESite is an exhaustive, manually annotated, non-redundant relational database focused on IRES elements (Internal Ribosome Entry Sites). It contains information not available in the primary public databases. IRES elements were originally found in eukaryotic viruses, which use them to hijack their host's translation initiation. Later, they were also discovered in the 5'-untranslated regions of some eukaryotic mRNA molecules. Currently, IRESite presents up to 92 biologically relevant aspects of every experiment, e.g. the nature of an IRES element, its functionality/defectivity, origin, size, sequence and structure, its position relative to the surrounding protein-coding regions, the positive/negative controls used in the experiment, the reporter genes used to monitor IRES activity, the measured reporter protein yields/activities, references to original publications, cross-references to other databases, and comments from submitters and our curators. Furthermore, the site presents known similarities to rRNA sequences as well as RNA-protein interactions. Special care is given to the annotation of promoter-like regions. The annotated data in IRESite are bound to mostly complete, full-length mRNAs and, whenever possible, accompanied by the original plasmid vector sequences. New data can be submitted through the publicly available web interface at http://www.iresite.org and are curated by a team of lab-experienced biologists.
- MeSH terms
  - Databases, Nucleic Acid
  - Financing, Organized
  - Peptide Chain Initiation, Translational
  - Peptide Initiation Factors / metabolism
  - Internet
  - RNA, Messenger / chemistry
  - Untranslated Regions / chemistry
  - Plasmids / chemistry
  - Promoter Regions, Genetic
  - Regulatory Sequences, Ribonucleic Acid
  - RNA, Viral / chemistry
  - User-Computer Interface
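IRESite is a relational database that stores many attributes per experiment. A toy schema can illustrate how experiment records reference IRES elements and how the two are queried together. The tables, columns and placeholder rows below are hypothetical and far smaller than IRESite's real data model. The example uses Python's built-in `sqlite3` module.

```python
import sqlite3
from typing import List

# Hypothetical, heavily reduced schema: one row per IRES element,
# one row per experiment that tested it.
SCHEMA = """
CREATE TABLE ires (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    origin   TEXT NOT NULL CHECK (origin IN ('viral', 'cellular')),
    sequence TEXT NOT NULL
);
CREATE TABLE experiment (
    id         INTEGER PRIMARY KEY,
    ires_id    INTEGER NOT NULL REFERENCES ires(id),
    reporter   TEXT NOT NULL,
    functional INTEGER NOT NULL CHECK (functional IN (0, 1)),
    reference  TEXT
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the toy schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def functional_ires(conn: sqlite3.Connection, origin: str) -> List[str]:
    """Names of IRES elements of the given origin with at least one
    experiment reporting activity."""
    rows = conn.execute(
        "SELECT DISTINCT i.name FROM ires AS i "
        "JOIN experiment AS e ON e.ires_id = i.id "
        "WHERE e.functional = 1 AND i.origin = ? ORDER BY i.name",
        (origin,),
    )
    return [name for (name,) in rows]


conn = open_db()
conn.executemany("INSERT INTO ires VALUES (?, ?, ?, ?)",
                 [(1, "IRES-A", "viral", "ACGU"), (2, "IRES-B", "cellular", "GGCA")])
conn.executemany("INSERT INTO experiment VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, "luciferase", 1, None), (2, 2, "luciferase", 0, None)])
print(functional_ires(conn, "viral"))  # ['IRES-A']
```

Keeping experiments in their own table lets one IRES element carry any number of experiments, each with its own controls, reporters and measured activities.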
Morphine is considered the gold standard in pain treatment. Nevertheless, its use can be associated with severe side effects, including drug addiction. It is therefore very important to understand the molecular mechanism of morphine action in order to develop new methods of pain therapy, or at least to attenuate the side effects of opioid use. Proteomics makes it possible to identify proteins involved in particular biological processes, but the number of proteins identified in a single study is usually overwhelming. Researchers thus face the difficult problem of choosing the proteins that are truly important for the investigated processes and worth further study. Therefore, based on 29 published articles, we created a database of proteins regulated by morphine administration, The Morphinome Database (addiction-proteomics.org). This web tool shows which proteins were identified in different proteomics studies. Moreover, collecting and organizing such a vast amount of data allows us to find proteins that were identified in several studies and to rank them by the frequency of their identification. Analysis with the STRING and KEGG databases indicated the metabolic pathways in which these proteins are involved. These pathways therefore seem to be strongly affected by morphine administration and could be important targets for further investigation. SIGNIFICANCE: Data on proteins identified by different proteomics studies of the molecular changes caused by morphine administration (29 published articles) were gathered in the Morphinome Database. Unifying these data allowed the identification of proteins that were detected repeatedly by distinct proteomics studies, which suggests that they are well verified and important for the entire process. These proteins may now be considered promising targets for more detailed studies of their role in the molecular mechanism of morphine action.
- MeSH terms
  - Databases, Factual*
  - Databases as Topic
  - Internet
  - Humans
  - Metabolic Networks and Pathways
  - Morphine / administration & dosage
  - Proteins / drug effects
  - Proteomics / methods
- Check Tag
  - Humans
- Publication type
  - Journal Article
  - Research Support, Non-U.S. Gov't
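The ranking described above orders proteins by how many independent studies identified them, so it reduces to counting. The sketch below assumes each study is represented as a collection of protein identifiers. The function name is hypothetical, and the identifiers are placeholders, not data from the Morphinome Database.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def rank_by_frequency(studies: Dict[str, Iterable[str]]) -> List[Tuple[str, int]]:
    """Rank proteins by the number of distinct studies that identified them."""
    counts: Counter = Counter()
    for proteins in studies.values():
        counts.update(set(proteins))  # each protein counts once per study
    # Most frequently identified first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


studies = {
    "study_1": ["P1", "P2", "P1"],
    "study_2": ["P1", "P3"],
    "study_3": ["P3", "P1"],
}
print(rank_by_frequency(studies))  # [('P1', 3), ('P3', 2), ('P2', 1)]
```

Converting each study's list to a set ensures that a protein reported several times within one article still counts only once toward its rank.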
Emerging infectious diseases (EIDs) caused by fungi are a severe problem for humans and plants across the world. They pose a worldwide threat to food security as well as human health. Fungal infections are increasing worldwide, and current antimycotic drugs are not effective owing to the emergence of resistant strains. There is therefore an urgent need to find new plant-origin antifungal peptides (PhytoAFPs). Large numbers of peptides that play a protective role against fungal infection have been extracted from different plant species, and hundreds of plant-origin peptides with antifungal activity have already been reported. A dedicated platform that systematically catalogs plant-origin peptides along with their antifungal properties is therefore required. The PlantAFP database is a resource of experimentally verified plant-origin antifungal peptides collected from research articles, patents and public databases. The current release of PlantAFP contains 2585 peptide entries, of which 510 are unique peptides. Each entry provides comprehensive information on a peptide, including its sequence, name, class, length, molecular mass, antifungal activity and origin. Besides this primary information, PlantAFP stores peptide sequences in SMILES format. To assist users, many tools have been integrated into the database, including BLAST search, peptide search, SMILES search and peptide mapping. The PlantAFP database is accessible at http://bioinformatics.cimap.res.in/sharma/PlantAFP/.
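Two of the per-entry fields, length and molecular mass, can be derived directly from the sequence. Counting unique peptides amounts to deduplicating sequences; this is assumed here, not stated in the source, to be what separates the 2585 entries from the 510 unique peptides. The sketch below computes average molecular mass from standard average residue masses plus one water molecule. It illustrates the calculation rather than reproducing PlantAFP's procedure. It ignores disulfide bonds and other modifications; each disulfide would lower the mass by about 2.016 Da. The function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Average residue masses in daltons (free amino acid minus one water).
AVG_RESIDUE_MASS: Dict[str, float] = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # average mass of H2O, added once for the free termini


def average_mass(seq: str) -> float:
    """Average molecular mass of an unmodified linear peptide."""
    seq = seq.upper()
    unknown = set(seq).difference(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def unique_peptides(sequences: Iterable[str]) -> List[str]:
    """Distinct sequences in first-seen order (case-insensitive)."""
    seen = set()
    unique = []
    for s in sequences:
        key = s.upper()
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


print(round(average_mass("G"), 2))            # 75.07
print(unique_peptides(["GA", "ga", "AG"]))   # ['GA', 'AG']
```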