Identification and classification of DICOM files with burned-in text content
Language English Country Ireland Media print-electronic
Document type journal article, grant-supported work
PubMed
31029254
DOI
10.1016/j.ijmedinf.2019.02.011
PII: S1386-5056(19)30202-3
- Keywords
- Burned-in protected health information, Classification, DICOM, De-identification, HIPAA, Text detection
- MeSH
- algorithms MeSH
- datasets as topic MeSH
- confidentiality MeSH
- electronic health records MeSH
- humans MeSH
- privacy * MeSH
- computer security * MeSH
- Health Insurance Portability and Accountability Act (USA) MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- grant-supported work MeSH
- Geographic names
- United States MeSH
BACKGROUND: Protected health information burned into pixel data is, for various reasons, not indicated in DICOM, which complicates the secondary use of such data. In recent years there have been several attempts to anonymize or de-identify DICOM files, but existing approaches have different constraints and no completely reliable solution exists. For large datasets in particular, files that potentially violate privacy need to be analysed and identified quickly.
METHODS: Classification is based on an adaptive-iterative algorithm designed to identify one of three classes. The algorithm applies several image transformations, optical character recognition, and filters, after which a local decision is made; a confirmed local decision becomes the final one. The classifier was trained on a dataset of 15,334 images of various modalities.
RESULTS: The false positive rate was below 4.00% in all cases and 1.81% for the mission-critical problem of detecting protected health information. The classifier's weighted average recall was 94.85%, the weighted average inverse recall was 97.42%, and Cohen's kappa coefficient was 0.920.
CONCLUSION: The proposed novel approach to classifying burned-in text is highly configurable and can analyse images from different modalities with a noisy background. The solution was validated and is intended to identify DICOM files that need restricted access or thorough de-identification because of privacy issues. Unlike with existing tools, the recognised text, including its coordinates, can be used further for de-identification.
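The metrics named in RESULTS (weighted average recall, weighted average inverse recall, and Cohen's kappa) can all be derived from a multi-class confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the 3x3 matrix is invented purely for illustration, the class labels are placeholders (the abstract does not name the three classes), and weighting each class by its number of true samples is an assumption about how the averages were formed. The false positive rate of a class is one minus its inverse recall.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[int]]  # rows = true class, columns = predicted class


def recalls(cm: Matrix) -> List[float]:
    """Per-class recall: TP / (TP + FN). Assumes every class has samples."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def inverse_recalls(cm: Matrix) -> List[float]:
    """Per-class inverse recall (specificity): TN / (TN + FP)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    result = []
    for i in range(n):
        row_sum = sum(cm[i])                        # samples truly in class i
        col_sum = sum(cm[r][i] for r in range(n))   # samples predicted as class i
        fp = col_sum - cm[i][i]
        tn = total - row_sum - col_sum + cm[i][i]
        result.append(tn / (tn + fp))
    return result


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Average of values weighted by class support (an assumed weighting)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def cohens_kappa(cm: Matrix) -> float:
    """Cohen's kappa: agreement between truth and prediction beyond chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Invented confusion matrix for three placeholder classes A, B, C.
cm = [
    [50, 2, 1],
    [3, 40, 2],
    [1, 1, 100],
]
support = [sum(row) for row in cm]

weighted_recall = weighted_average(recalls(cm), support)
weighted_inverse_recall = weighted_average(inverse_recalls(cm), support)
kappa = cohens_kappa(cm)

print(f"weighted average recall:         {weighted_recall:.4f}")
print(f"weighted average inverse recall: {weighted_inverse_recall:.4f}")
print(f"Cohen's kappa:                   {kappa:.4f}")
```

With support weighting, the weighted average recall equals overall accuracy, which is one reason kappa is reported alongside it: kappa discounts the agreement expected by chance when class sizes are unbalanced.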