Documenting the semantics of medical data
Dotaz
Zobrazit nápovědu
OBJECTIVES: Our main objective is to design a method of, and supporting software for, interactive correction and semantic annotation of narrative clinical reports, which would allow for their easier and less erroneous processing outside their original context: first, by physicians unfamiliar with the original language (and possibly also the source specialty), and second, by tools requiring structured information, such as decision-support systems. Our additional goal is to gain insights into the process of narrative report creation, including the errors and ambiguities arising therein, and also into the process of report annotation by clinical terms. Finally, we also aim to provide a dataset of ground-truth transformations (specific for Czech as the source language), set up by expert physicians, which can be reused in the future for subsequent analytical studies and for training automated transformation procedures. METHODS: A three-phase preprocessing method has been developed to support secondary use of narrative clinical reports in electronic health record. Narrative clinical reports are narrative texts of healthcare documentation often stored in electronic health records. In the first phase a narrative clinical report is tokenized. In the second phase the tokenized clinical report is normalized. The normalized clinical report is easily readable for health professionals with the knowledge of the language used in the narrative clinical report. In the third phase the normalized clinical report is enriched with extracted structured information. The final result of the third phase is a semi-structured normalized clinical report where the extracted clinical terms are matched to codebook terms. Software tools for interactive correction, expansion and semantic annotation of narrative clinical reports has been developed and the three-phase preprocessing method validated in the cardiology area. RESULTS: The three-phase preprocessing method was validated on 49 anonymous Czech narrative clinical reports in the field of cardiology. Descriptive statistics from the database of accomplished transformations has been calculated. Two cardiologists participated in the annotation phase. The first cardiologist annotated 1500 clinical terms found in 49 narrative clinical reports to codebook terms using the classification systems ICD 10, SNOMED CT, LOINC and LEKY. The second cardiologist validated annotations of the first cardiologist. The correct clinical terms and the codebook terms have been stored in a database. CONCLUSIONS: We extracted structured information from Czech narrative clinical reports by the proposed three-phase preprocessing method and linked it to electronic health records. The software tool, although generic, is tailored for Czech as the specific language of electronic health record pool under study. This will provide a potential etalon for porting this approach to dozens of other less-spoken languages. Structured information can support medical decision making, quality assurance tasks and further medical research.
- Klíčová slova
- Narrative clinical report, classification systems, electronic health record, nomenclatures, structured information, tokens,
- MeSH
- elektronické zdravotní záznamy normy MeSH
- mezinárodní klasifikace nemocí MeSH
- psaní normy MeSH
- řízený slovník * MeSH
- sémantika * MeSH
- směrnice jako téma MeSH
- smysluplné využití normy MeSH
- software MeSH
- správnost dat MeSH
- strojové učení * MeSH
- uživatelské rozhraní počítače MeSH
- zpracování přirozeného jazyka * MeSH
- zpracování textu normy MeSH
- Publikační typ
- časopisecké články MeSH
BACKGROUND: There is an increasing trend to represent domain knowledge in structured graphs, which provide efficient knowledge representations for many downstream tasks. Knowledge graphs are widely used to model prior knowledge in the form of nodes and edges to represent semantically connected knowledge entities, which several works have adopted into different medical imaging applications. METHODS: We systematically searched over five databases to find relevant articles that applied knowledge graphs to medical imaging analysis. After screening, evaluating, and reviewing the selected articles, we performed a systematic analysis. RESULTS: We looked at four applications in medical imaging analysis, including disease classification, disease localization and segmentation, report generation, and image retrieval. We also identified limitations of current work, such as the limited amount of available annotated data and weak generalizability to other tasks. We further identified the potential future directions according to the identified limitations, including employing semisupervised frameworks to alleviate the need for annotated data and exploring task-agnostic models to provide better generalizability. CONCLUSIONS: We hope that our article will provide the readers with aggregated documentation of the state-of-the-art knowledge graph applications for medical imaging to encourage future research.
- Publikační typ
- časopisecké články MeSH
- scoping review MeSH
AI development in biotechnology relies on high-quality data to train and validate algorithms. The FAIR principles (Findable, Accessible, Interoperable, and Reusable) and regulatory frameworks such as the In Vitro Diagnostic Regulation (IVDR) and the Medical Device Regulation (MDR) specify requirements on specimen and data provenance to ensure the quality and traceability of data used in AI development. In this paper, a framework is presented for recording and publishing provenance information to meet these requirements. The framework is based on the use of standardized models and protocols, such as the W3C PROV model and the ISO 23494 series, to capture and record provenance information at various stages of the data generation and analysis process. The framework and use case illustrate the role of provenance information in supporting the development of high-quality AI algorithms in biotechnology. Finally, the principles of the framework are illustrated in a simple computational pathology use case, showing how specimen and data provenance can be used in the development and documentation of an AI algorithm. The use case demonstrates the importance of managing and integrating distributed provenance information and highlights the complex task of considering factors such as semantic interoperability, confidentiality, and the verification of authenticity and integrity.
- Klíčová slova
- Artificial intelligence, Biological material, Provenance, Traceability,
- MeSH
- algoritmy * MeSH
- biotechnologie * MeSH
- umělá inteligence MeSH
- Publikační typ
- časopisecké články MeSH