Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress, hindering the translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
- MeSH
- Algorithms * MeSH
- Image Processing, Computer-Assisted * MeSH
- Semantics MeSH
- Machine Learning MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
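The fingerprint-to-metric mapping described in the abstract above can be caricatured in code. The fingerprint fields and rules below are purely illustrative assumptions, not the actual Metrics Reloaded decision logic or its online tool:

```python
# Hypothetical sketch: mapping a simplified "problem fingerprint" (a dict of
# problem properties) to candidate validation metrics. The rules are invented
# for illustration and do not reproduce the Metrics Reloaded framework.

def recommend_metrics(fingerprint):
    """Return candidate metrics for a toy problem fingerprint."""
    task = fingerprint["task"]
    metrics = []
    if task == "image-level classification":
        # Under class imbalance, plain accuracy is misleading.
        metrics.append("balanced accuracy" if fingerprint.get("class_imbalance")
                       else "accuracy")
        metrics.append("AUROC")
    elif task == "semantic segmentation":
        metrics += ["Dice similarity coefficient", "IoU"]
        if fingerprint.get("boundary_quality_matters"):
            metrics.append("normalized surface distance")
    elif task == "object detection":
        metrics.append("average precision")
    return metrics

print(recommend_metrics({"task": "semantic segmentation",
                         "boundary_quality_matters": True}))
```

A real fingerprint in the framework captures far more properties (target structure geometry, annotation uncertainty, output type); the point here is only the shape of the idea: a structured problem description drives metric choice.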
This paper provides an overview of current linguistic and ontological challenges that must be met in order to fully support the transformation of health ecosystems toward precision medicine (5PM) standards. It highlights standardization and interoperability aspects of formal, controlled representations of clinical and research data, as well as requirements for smart support to produce and encode content in a way that both humans and machines can understand and process. Starting from the current text-centered communication practices in healthcare and biomedical research, it addresses the state of the art in information extraction using natural language processing (NLP). An important aspect of the language-centered perspective on managing health data is the integration of heterogeneous data sources employing different natural languages and different terminologies. This is where biomedical ontologies, in the sense of formal, interchangeable representations of types of domain entities, come into play. The paper discusses the state of the art of biomedical ontologies, addresses their importance for standardization and interoperability, and sheds light on current misconceptions and shortcomings. Finally, the paper points out next steps and possible synergies between the fields of NLP and Applied Ontology/Semantic Web to foster data interoperability for 5PM.
- Publication type
- Journal Article MeSH
The concept of the Data Management Plan (DMP) has emerged as a fundamental tool to help researchers through the systematic management of data. The Research Data Alliance DMP Common Standard (DCS) working group developed a set of universal concepts characterising a DMP so it can be represented as a machine-actionable artefact, i.e., a machine-actionable Data Management Plan (maDMP). The technology-agnostic approach of the current maDMP specification (i) does not explicitly link to related data models or ontologies, (ii) has no standardised way to describe controlled vocabularies, and (iii) is extensible but has no clear mechanism to distinguish between the core specification and its extensions. This paper reports on a community effort to create the DMP Common Standard Ontology (DCSO) as a serialisation of the DCS core concepts, with a particular focus on a detailed description of the components of the ontology. Our initial results show that the proposed DCSO can become a suitable candidate for a reference serialisation of the DMP Common Standard.
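A minimal sketch of what a machine-actionable DMP might look like when serialized as JSON. The field names loosely echo the RDA DMP Common Standard application profile, but the snippet is illustrative only and is not a valid DCSO or maDMP document:

```python
import json

# Illustrative machine-actionable DMP fragment. Field names are assumptions
# loosely modeled on the RDA DMP Common Standard; a real maDMP would carry
# many more mandatory fields and controlled-vocabulary values.
madmp = {
    "dmp": {
        "title": "Example data management plan",
        "language": "eng",
        "dataset": [
            {
                "title": "Survey responses",
                "personal_data": "no",
                "sensitive_data": "no",
            }
        ],
    }
}

serialized = json.dumps(madmp, indent=2)
print(serialized)
```

The ontology-based serialisation (DCSO) pursued in the paper goes further than plain JSON: by grounding these fields in formal class and property definitions, tools can validate and reason over a DMP rather than merely parse it.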
To date, the research process for producing, evaluating, and applying information has received less attention from informatics than the operational processes for performing clinical trials. The study protocol, an abstract description of a clinical study's scientific design, is at the heart of these scientific procedures: the science of clinical research. The Ontology of Clinical Research (OCRe) is an OWL 2 model of the relationships between entities of study design protocols, intended to computationally aid in the design and analysis of human studies. The modelling done by OCRe is not dependent on any particular study design or therapeutic domain. It features a study design typology as well as a dedicated ERGO Annotation module for expressing the meaning of eligibility criteria. In this work, we outline the major informatics use cases at each phase of a study's scientific lifecycle, introduce OCRe and the ideas that underpin its modelling, and discuss how OCRe and related technologies can be applied to a variety of clinical research use cases. OCRe encapsulates the central semantics that underpin clinical research scientific procedures and can serve as an informatics foundation to support the whole spectrum of knowledge activities that make up clinical research science.
Living cell segmentation from bright-field light microscopy images is challenging due to image complexity and temporal changes in the living cells. Recently developed deep learning (DL)-based methods have become popular in medical and microscopy image segmentation tasks due to their success and promising outcomes. The main objective of this paper is to develop a deep learning, U-Net-based method to segment the living cells of the HeLa line in bright-field transmitted light microscopy. To find the most suitable architecture for our datasets, a residual attention U-Net was proposed and compared with an attention U-Net and a simple U-Net architecture. The attention mechanism highlights salient features and suppresses activations in irrelevant image regions. The residual mechanism overcomes the vanishing gradient problem. The Mean-IoU score for our datasets reaches 0.9505, 0.9524, and 0.9530 for the simple, attention, and residual attention U-Net, respectively. The most accurate semantic segmentation results, in both the Mean-IoU and Dice metrics, were achieved by applying the residual and attention mechanisms together. The watershed method applied to this best (residual attention) semantic segmentation result produced an instance segmentation with specific information for each cell.
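The two overlap metrics reported above, Mean-IoU and Dice, have standard definitions that can be sketched for flat binary masks. This is a generic implementation for illustration, not the authors' evaluation code:

```python
# IoU (Jaccard) and Dice for flattened binary masks (lists of 0/1).
# Both measure overlap between a predicted mask and a ground-truth mask;
# Dice weights the intersection twice relative to the mask sizes.

def iou(pred, target):
    inter = sum(p and t for p, t in zip(pred, target))
    union = sum(p or t for p, t in zip(pred, target))
    return inter / union if union else 1.0

def dice(pred, target):
    inter = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2 * inter / total if total else 1.0

pred   = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(iou(pred, target))   # 2 / 4 = 0.5
print(dice(pred, target))  # 2*2 / (3+3) ≈ 0.667
```

Mean-IoU as reported in such studies is typically this IoU averaged over classes or images; the watershed step then splits the binary semantic mask into individual cell instances.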
Multiple studies have investigated bibliometric factors predictive of the citation count a research article will receive. In this article, we go beyond bibliometric data by using a range of machine learning techniques to find patterns predictive of citation count using both article content and available metadata. As the input collection, we use the CORD-19 corpus containing research articles, mostly from biology and medicine, applicable to the COVID-19 crisis. Our study employs a combination of state-of-the-art machine learning techniques for text understanding, including the embeddings-based language model BERT and several systems for detection and semantic expansion of entities: ConceptNet, PubTator and scispaCy. To interpret the resulting models, we use several explanation algorithms: random forest feature importance, LIME, and Shapley values. We compare the performance and comprehensibility of models obtained by "black-box" machine learning algorithms (neural networks and random forests) with models built with rule learning (CORELS, CBA), which are intrinsically explainable. Multiple rules were discovered that referred to biomedical entities of potential interest. Of the rules with the highest lift measure, several pointed to dipeptidyl peptidase-4 (DPP4), a known MERS-CoV receptor and a critical determinant of camel-to-human transmission of the camel coronavirus (MERS-CoV). Some other interesting patterns related to the type of animal investigated were found. Articles referring to bats and camels tend to draw citations, while articles referring to most other animal species related to coronaviruses receive few citations. Bat coronavirus is the only other virus from a non-human species in the betaB clade, along with the SARS-CoV and SARS-CoV-2 viruses. MERS-CoV is in a sister betaC clade, also close to human SARS coronaviruses. Thus, both species linked to high citation counts harbor coronaviruses that are more phylogenetically similar to human SARS viruses.
On the other hand, feline (FIPV, FCoV) and canine (CCoV) coronaviruses are in the alpha coronavirus clade, more distant from the betaB clade containing the human SARS viruses. Other results include detection of an apparent citation bias favouring authors with Western-sounding names. Equal performance of TF-IDF weights and a binary word-incidence matrix was observed, with the latter resulting in better interpretability. The best predictive performance was obtained with a "black-box" method, a neural network. The rule-based models led to the most insights, especially when coupled with text representation using semantic entity detection methods. Follow-up work should focus on the analysis of citation patterns in the context of phylogenetic trees, as well as on patterns referring to DPP4, which is currently considered a SARS-CoV-2 therapeutic target.
- Publication type
- Journal Article MeSH
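The two text representations compared in the citation study above, a binary word-incidence matrix and TF-IDF weights, can be sketched on a toy corpus. The documents and tokens below are invented, and a real pipeline would add tokenization, normalization, and a proper vectorizer:

```python
import math

# Toy comparison of the two document representations: binary word incidence
# versus TF-IDF weighting. Vocabulary is derived from the (made-up) corpus.
docs = [
    ["camel", "coronavirus", "receptor"],
    ["bat", "coronavirus"],
    ["camel", "transmission"],
]
vocab = sorted({w for d in docs for w in d})

def binary_vector(doc):
    # 1 if the word occurs in the document at all, else 0.
    return [1 if w in doc else 0 for w in vocab]

def tfidf_vector(doc):
    # Term frequency times (natural-log) inverse document frequency.
    n = len(docs)
    out = []
    for w in vocab:
        tf = doc.count(w) / len(doc)
        df = sum(w in d for d in docs)
        out.append(tf * math.log(n / df))
    return out

print(binary_vector(docs[0]))
print([round(x, 3) for x in tfidf_vector(docs[0])])
```

The binary matrix discards frequency information, which is exactly what makes the rules learned from it easier to read ("article mentions DPP4" rather than a weighted threshold), consistent with the interpretability finding reported above.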
Verbal communication relies heavily upon mutual understanding, or common ground. Inferring the intentional states of our interaction partners is crucial in achieving this, and social neuroscience has begun elucidating the intra- and inter-personal neural processes supporting such inferences. Typically, however, neuroscientific paradigms lack the reciprocal to-and-fro characteristic of social communication, offering little insight into the way these processes operate online during real-world interaction. In the present study, we overcame this by developing a "hyperscanning" paradigm in which pairs of interactants could communicate verbally with one another in a joint-action task whilst both underwent functional magnetic resonance imaging simultaneously. Successful performance on this task required both interlocutors to predict their partner's upcoming utterance in order to converge on the same word as each other over recursive exchanges, based only on one another's prior verbal expressions. By applying various levels of analysis to behavioural and neuroimaging data acquired from 20 dyads, three principal findings emerged: First, interlocutors converged frequently within the same semantic space, suggesting that mutual understanding had been established. Second, assessing the brain responses of each interlocutor as they planned their upcoming utterances on the basis of their co-player's previous word revealed engagement of the temporo-parietal junction (TPJ), precuneus and dorso-lateral pre-frontal cortex. Moreover, responses in the precuneus were modulated positively by the degree of semantic convergence achieved on each round, and effective connectivity among these regions indicated the crucial role of the right TPJ in this process, consistent with the Nexus model. Third, neural signals within certain nodes of this network became aligned between interacting interlocutors.
We suggest this reflects an interpersonal neural process through which interactants infer and align to one another's intentional states whilst they establish a common ground.
- MeSH
- Adult MeSH
- Humans MeSH
- Magnetic Resonance Imaging MeSH
- Young Adult MeSH
- Brain physiology MeSH
- Neuroimaging methods MeSH
- Image Processing, Computer-Assisted methods MeSH
- Social Behavior * MeSH
- Social Interaction * MeSH
- Verbal Behavior physiology MeSH
- Check Tag
- Adult MeSH
- Humans MeSH
- Young Adult MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
The widespread use of Common Data Models and information models in biomedical informatics encourages the assumption that those models provide everything needed for knowledge representation. Given the lack of computable semantics in frequently used Common Data Models, however, there appears to be a gap between knowledge representation requirements and these models. In this use-case-oriented approach, we explore how a system-theoretic, architecture-centric, ontology-based methodology can help to better understand this gap. We show how using the Generic Component Model helps to analyze the data management system in a way that accounts for data management procedures inside the system and knowledge representation of the real world at the same time.
- MeSH
- Biological Ontologies * MeSH
- Data Management MeSH
- Semantics * MeSH
- Publication type
- Journal Article MeSH