With biodiversity research activities increasingly shifting to the web, the need for a system of persistent and stable identifiers for physical collection objects is becoming ever more pressing. The Consortium of European Taxonomic Facilities (CETAF) agreed on a common system of HTTP-URI-based stable identifiers, which is now being rolled out to its member organizations. The system follows Linked Open Data principles and implements redirection mechanisms to human-readable and machine-readable representations of specimens, facilitating seamless integration into the growing semantic web. The implementation of stable identifiers across collection organizations is supported by open-source provider software scripts, best-practice documentation and recommendations for RDF metadata elements, facilitating harmonized access to collection information in web portals. DATABASE URL: http://cetaf.org/cetaf-stable-identifiers.
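The redirection mechanism mentioned in the abstract above can be illustrated with a minimal Python sketch of content negotiation. This is not the CETAF provider software: the `data.example.org` URLs, the function names and the choice of the `303 See Other` status (a common Linked Data pattern for identifiers of physical objects) are assumptions made for illustration only.

```python
from __future__ import annotations

# Hypothetical sketch: pick a redirect target for a stable specimen URI
# from the HTTP Accept header. URLs and names are illustrative assumptions.
RDF_TYPES = ("application/rdf+xml", "text/turtle")


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media_type, q) pairs sorted by descending q value."""
    items = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        items.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(items, key=lambda item: item[1], reverse=True)


def redirect_target(specimen_id: str, accept_header: str) -> tuple[int, str]:
    """Return (status, Location) for a request to the stable identifier."""
    page = f"https://data.example.org/page/{specimen_id}"  # human-readable
    rdf = f"https://data.example.org/rdf/{specimen_id}"    # machine-readable
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in RDF_TYPES:
            return 303, rdf
        if media_type in ("text/html", "*/*"):
            return 303, page
    # No usable preference: fall back to the human-readable page
    return 303, page
```

A browser sending `text/html` would be redirected to the specimen page, while a client sending `text/turtle` would be redirected to the RDF representation; the identifier itself never changes.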
BACKGROUND: Hierarchical clustering is an exploratory data analysis method that reveals groups (clusters) of similar objects. The result of hierarchical clustering is a tree structure, called a dendrogram, that shows the arrangement of individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called a 'cluster heatmap' is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram, which is displayed on the side of the heatmap. RESULTS: We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters or to flexibly modify heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib are facilitated by the Python utility script inchlib_clust. CONCLUSIONS: The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources.
Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool whose application domain is not limited to the life sciences.
- Keywords
- Big data, Client-side scripting, Cluster heatmap, Data clustering, Exploration, JavaScript library, Scientific visualization, Web integration
- Publication type
- Journal Article MeSH
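The row ordering behind a cluster heatmap, as described in the InCHlib abstract, can be sketched in a few lines of Python. This is a naive average-linkage implementation written for illustration; it is not the code of InCHlib or inchlib_clust, and the function names and the Euclidean distance are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_order(rows):
    """Naive average-linkage agglomerative clustering.

    Returns (tree, order): a nested-tuple dendrogram and the leaf order
    in which the heatmap rows would be drawn. Expects at least one row.
    """
    # Each cluster is (subtree, list of original row indices)
    clusters = [(i, [i]) for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                leaves_i, leaves_j = clusters[i][1], clusters[j][1]
                # Average of all pairwise distances between the two clusters
                dist = sum(
                    euclidean(rows[a], rows[b])
                    for a in leaves_i
                    for b in leaves_j
                ) / (len(leaves_i) * len(leaves_j))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


matrix = [[0, 0], [10, 10], [0, 1], [10, 11]]
tree, order = cluster_order(matrix)
print(order)  # [0, 2, 1, 3]
```

In the example, rows 0 and 2 and rows 1 and 3 are near-duplicates, so the dendrogram places them next to each other and the heatmap would be drawn in that order.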
Electronic Health Record (EHR) systems currently in use are not designed for widely interoperable longitudinal health data. Therefore, EHR data cannot be properly shared, managed and analyzed. In this article, we propose two approaches to making EHR data more comprehensive and FAIR (Findable, Accessible, Interoperable, and Reusable) and thus more useful for diagnosis and clinical research. Firstly, the data modeling based on the LinkML framework makes the data interoperability more realistic in diverse environments with various experts involved. We show the first results of how diverse health data can be integrated based on an easy-to-understand data model and without loss of available clinical knowledge. Secondly, decentralizing EHRs contributes to the higher availability of comprehensive and consistent EHR data. We propose a technology stack for decentralized EHRs and the reasons behind this proposal. Moreover, the two proposed approaches empower patients because their EHR data can become more available, understandable, and usable for them, and they can share their data according to their needs and preferences. Finally, we explore how the users of the proposed solution could be involved in the process of its validation and adoption.
- Keywords
- Distributed electronic health records, FAIR principles, HL7 FHIR, bio-data management, ontology
- MeSH
- Data Management MeSH
- Electronic Health Records * MeSH
- Humans MeSH
- Semantic Web * MeSH
- Software MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
HotSpot Wizard is a web server for automatic identification of 'hot spots' for engineering of substrate specificity, activity or enantioselectivity of enzymes and for annotation of protein structures. The web server implements the protein engineering protocol, which targets evolutionarily variable amino acid positions located in the active site or lining the access tunnels. The 'hot spots' for mutagenesis are selected through the integration of structural, functional and evolutionary information obtained from: (i) the databases RCSB PDB, UniProt, PDBSWS, Catalytic Site Atlas and nr NCBI and (ii) the tools CASTp, CAVER, BLAST, CD-HIT, MUSCLE and Rate4Site. The protein structure and e-mail address are the only obligatory inputs for the calculation. In the output, HotSpot Wizard lists annotated residues ordered by estimated mutability. The results of the analysis are mapped on the enzyme structure and visualized in the web browser using Jmol. The HotSpot Wizard server should be useful for protein engineers interested in exploring the structure of their favourite protein and for the design of mutations in site-directed mutagenesis and focused directed evolution experiments. HotSpot Wizard is available at http://loschmidt.chemi.muni.cz/hotspotwizard/.
- MeSH
- beta-Lactamases chemistry MeSH
- Glycoside Hydrolases chemistry MeSH
- Phosphoric Triester Hydrolases chemistry MeSH
- Hydrolases chemistry MeSH
- Internet MeSH
- Protein Engineering * MeSH
- Reproducibility of Results MeSH
- Software * MeSH
- User-Computer Interface MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- beta-Lactamases MeSH
- Glycoside Hydrolases MeSH
- haloalkane dehalogenase MeSH Browser
- Phosphoric Triester Hydrolases MeSH
- Hydrolases MeSH
- licheninase MeSH Browser
Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like 'Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?', or 'Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?' are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community.
- MeSH
- Internet MeSH
- Protein Conformation * MeSH
- Humans MeSH
- Mutation, Missense * MeSH
- Proteins chemistry genetics MeSH
- Software * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Proteins MeSH
Thermostable proteins find their use in numerous biomedical and biotechnological applications. However, the computational design of stable proteins often results in single-point mutations with a limited effect on protein stability, and the construction of stable multiple-point mutants can prove difficult due to the possibility of antagonistic effects between individual mutations. The FireProt protocol enables the automated computational design of highly stable multiple-point mutants. FireProt 2.0 builds on top of the previously published FireProt web server, retaining the original functionality and expanding it with several new stabilization strategies. FireProt 2.0 integrates the AlphaFold database and homology modeling for structure prediction, enabling calculations starting from a sequence. Multiple-point designs are constructed using the Bron-Kerbosch algorithm, minimizing the antagonistic effect between the individual mutations. Users can now limit the FireProt calculation to a set of user-defined mutations, run a saturation mutagenesis of the whole protein or select rigidifying mutations based on B-factors. The evolution-based back-to-consensus strategy is complemented by ancestral sequence reconstruction. FireProt 2.0 is significantly faster, and a reworked graphical user interface broadens the tool's availability even to users with older hardware. FireProt 2.0 is freely available at http://loschmidt.chemi.muni.cz/fireprotweb.
- Keywords
- B-factor, ancestral, back-to-consensus, epistasis, evolution, force-field, multiple-point mutant, protein engineering, saturation mutagenesis, thermostability
- MeSH
- Algorithms * MeSH
- Internet MeSH
- Mutation MeSH
- Proteins * genetics chemistry MeSH
- Protein Stability MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Proteins * MeSH
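The FireProt 2.0 abstract names the Bron-Kerbosch algorithm for constructing multiple-point designs. The Python sketch below shows the basic (non-pivoting) form of that algorithm on a small compatibility graph. How FireProt actually builds its graph and scores combinations is not stated in the abstract, so the mutation labels, the predicted stability changes and the "most negative summed change wins" rule are all hypothetical.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Basic Bron-Kerbosch (no pivoting): collect all maximal cliques.

    r: current clique, p: candidate vertices, x: already processed vertices,
    adj: dict mapping each vertex to the set of its neighbours.
    """
    if not p and not x:
        out.append(set(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def best_combination(ddg, compatible_pairs):
    """Return the maximal set of mutually compatible mutations whose summed
    predicted stability change is lowest (assumed to mean most stabilizing)."""
    adj = {m: set() for m in ddg}
    for a, b in compatible_pairs:
        adj[a].add(b)
        adj[b].add(a)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    return min(cliques, key=lambda clique: sum(ddg[m] for m in clique))


# Hypothetical mutations with made-up predicted stability changes
ddg = {"M1": -1.2, "M2": -0.8, "M3": -2.0, "M4": -0.5}
# Pairs assumed to show no antagonistic effect
pairs = [("M1", "M2"), ("M1", "M3"), ("M2", "M3"), ("M3", "M4")]
print(sorted(best_combination(ddg, pairs)))  # ['M1', 'M2', 'M3']
```

Modelling compatible pairs as edges means every clique is a set of mutations with no pairwise antagonism, which is why a maximal-clique search fits the problem described in the abstract.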
Large biomolecular structures are being determined experimentally on a daily basis using established techniques such as crystallography and electron microscopy. In addition, emerging integrative or hybrid methods (I/HM) are producing structural models of huge macromolecular machines and assemblies, sometimes containing hundreds of millions of non-hydrogen atoms. The performance requirements for visualization and analysis tools delivering these data are increasing rapidly. Significant progress in developing online, web-native three-dimensional (3D) visualization tools was previously accomplished with the introduction of the LiteMol suite and NGL Viewers. Thereafter, Mol* development was jointly initiated by PDBe and RCSB PDB to combine and build on the strengths of LiteMol (developed by PDBe) and NGL (developed by RCSB PDB). The web-native Mol* Viewer enables 3D visualization and streaming of macromolecular coordinate and experimental data, together with capabilities for displaying structure quality, functional, or biological context annotations. High-performance graphics and data management allow users to simultaneously visualize up to hundreds of (superimposed) protein structures, stream molecular dynamics simulation trajectories, render cell-level models, or display huge I/HM structures. It is the primary 3D structure viewer used by PDBe and RCSB PDB. It can be easily integrated into third-party services. Mol* Viewer is open source and freely available at https://molstar.org/.
- MeSH
- Internet MeSH
- Protein Conformation MeSH
- Macromolecular Substances chemistry MeSH
- Models, Molecular * MeSH
- Software * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
- Names of Substances
- Macromolecular Substances MeSH
R-loops are common non-B nucleic acid structures formed by a three-stranded nucleic acid composed of an RNA-DNA hybrid and a displaced single-stranded DNA (ssDNA) loop. Because aberrant R-loop formation leads to increased mutagenesis, hyper-recombination, rearrangements, and transcription-replication collisions, it is regarded as important in human diseases. Therefore, its prevalence and distribution in genomes are studied intensively. However, in silico tools for R-loop prediction are limited, and we have therefore developed the R-loop tracker tool, which was implemented as a part of the DNA Analyser web server. This new tool focuses on (1) prediction of R-loops in genomic DNA without length and sequence limitations; (2) integration of R-loop tracker results with other tools for nucleic acid analyses, including Genome Browser; (3) internal cross-evaluation of in silico results with experimental data, where available; (4) easy export and correlation analyses with other genome features and markers; and (5) enhanced visualization outputs. Our new R-loop tracker tool is freely accessible on the web pages of DNA Analyser tools, and its implementation on the web-based server allows effective analyses not only for DNA segments but also for full chromosomes and genomes.
- Keywords
- RNA–DNA hybrid, non-B structure, sequence analysis
- MeSH
- Algorithms * MeSH
- DNA chemistry genetics MeSH
- Genomics methods MeSH
- Internet statistics & numerical data MeSH
- Humans MeSH
- Genomic Instability * MeSH
- R-Loop Structures * MeSH
- Software MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- DNA MeSH
PrankWeb is an online resource providing an interface to P2Rank, a state-of-the-art method for ligand binding site prediction. P2Rank is a template-free machine learning method based on the prediction of local chemical neighborhood ligandability centered on points placed on a solvent-accessible protein surface. Points with a high ligandability score are then clustered to form the resulting ligand binding sites. In addition, PrankWeb provides a web interface enabling users to easily carry out the prediction and visually inspect the predicted binding sites via an integrated sequence-structure view. Moreover, PrankWeb can determine sequence conservation for the input molecule and use this in both the prediction and result visualization steps. Alongside its online visualization options, PrankWeb also offers the possibility of exporting the results as a PyMOL script for offline visualization. The web frontend communicates with the server side via a REST API. In high-throughput scenarios, therefore, users can utilize the server API directly, bypassing the need for a web-based frontend or installation of the P2Rank application. PrankWeb is available at http://prankweb.cz/, while the web application source code and the P2Rank method can be accessed at https://github.com/jendelel/PrankWebApp and https://github.com/rdk/p2rank, respectively.
- MeSH
- Benchmarking MeSH
- Datasets as Topic MeSH
- Protein Interaction Domains and Motifs MeSH
- Internet MeSH
- Protein Conformation, alpha-Helical MeSH
- Protein Conformation, beta-Strand MeSH
- Humans MeSH
- Ligands MeSH
- Proteins chemistry metabolism MeSH
- Amino Acid Sequence MeSH
- Software * MeSH
- Machine Learning * MeSH
- Thermodynamics MeSH
- Protein Binding MeSH
- Binding Sites MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Ligands MeSH
- Proteins MeSH
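The PrankWeb abstract states that surface points with a high ligandability score are clustered to form binding sites. The Python sketch below illustrates that step with a score threshold followed by single-linkage grouping under a distance cutoff. It is not P2Rank code: the function name, the threshold and cutoff values, and the choice of single linkage are assumptions made for illustration.

```python
import math


def cluster_points(points, scores, threshold=0.5, cutoff=3.0):
    """Group high-scoring points into putative sites.

    points: list of (x, y, z) coordinates, scores: per-point scores.
    Points scoring below `threshold` are dropped; the rest are joined
    whenever a chain of neighbours closer than `cutoff` connects them.
    Returns a list of sorted index lists, one per site.
    """
    unvisited = {i for i, s in enumerate(scores) if s >= threshold}
    sites = []
    while unvisited:
        seed = min(unvisited)  # deterministic starting point
        unvisited.remove(seed)
        stack, site = [seed], [seed]
        while stack:
            current = stack.pop()
            near = [
                j for j in unvisited
                if math.dist(points[current], points[j]) <= cutoff
            ]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
                site.append(j)
        sites.append(sorted(site))
    return sites


# Made-up coordinates and scores
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (5, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.9, 0.6, 0.1]
print(cluster_points(points, scores, threshold=0.5, cutoff=1.5))
# [[0, 1, 2], [3, 4]]
```

Points 0 and 2 are farther apart than the cutoff but end up in the same site because point 1 links them, which is the defining behaviour of single-linkage grouping; the low-scoring point 5 is discarded before clustering.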
The article discusses two main approaches to building semantic structures for electrophysiological metadata: the use of conventional data structures, repositories, and programming languages on the one hand, and the use of formal representations of ontologies known from knowledge representation, such as description logics or semantic web languages, on the other. Although knowledge engineering offers languages supporting richer semantic means of expression and technologically advanced approaches, conventional data structures and repositories are still popular among developers, administrators and users because of their simplicity, overall intelligibility, and lower demands on technical equipment. The choice of conventional data resources and repositories, however, raises the question of how and where to add semantics that cannot be naturally expressed using them. As one possible solution, these semantics can be added to the structures of the programming language that accesses and processes the underlying data. To support this idea, we introduced a software prototype that enables its users to add semantically richer expressions to object-oriented Java code. This approach does not burden users with additional demands on the programming environment, since reflective Java annotations were used as an entry point for these expressions. Moreover, additional semantics need not be written directly into the code by the programmer; they can be collected from non-programmers using a graphical user interface. A mapping that allows the transformation of the semantically enriched Java code into the Semantic Web language OWL was proposed and implemented in a library named the Semantic Framework. This approach was validated by the integration of the Semantic Framework in the EEG/ERP Portal and by the subsequent registration of the EEG/ERP Portal in the Neuroscience Information Framework.
- Keywords
- EEG/ERP portal, electrophysiology, object-oriented code, ontology, semantic framework, semantic web
- Publication type
- Journal Article MeSH