Identification and classification of DICOM files with burned-in text content
P. Vcelak, M. Kryl, M. Kratochvil, J. Kleckova
Language English Country Ireland
Document type Journal Article, Research Support, Non-U.S. Gov't
- MeSH
  - Algorithms
  - Datasets as Topic
  - Confidentiality
  - Electronic Health Records
  - Humans
  - Privacy *
  - Computer Security *
  - Health Insurance Portability and Accountability Act
- Check Tag
  - Humans
- Publication type
  - Journal Article
  - Research Support, Non-U.S. Gov't
- Geographicals
  - United States
BACKGROUND: Protected health information burned into pixel data is, for various reasons, not indicated in DICOM. This complicates the secondary use of such data. In recent years, there have been several attempts to anonymize or de-identify DICOM files. Existing approaches have different constraints, and no completely reliable solution exists. Especially for large datasets, it is necessary to quickly analyse and identify files that potentially violate privacy.
METHODS: Classification is based on an adaptive-iterative algorithm designed to identify one of three classes. Several image transformations, optical character recognition, and filters are applied; then a local decision is made. A confirmed local decision is the final one. The classifier was trained on a dataset composed of 15,334 images of various modalities.
RESULTS: The false positive rates are below 4.00% in all cases, and 1.81% in the mission-critical problem of detecting protected health information. The classifier's weighted average recall was 94.85%, the weighted average inverse recall was 97.42%, and Cohen's Kappa coefficient was 0.920.
CONCLUSION: The proposed novel approach for classification of burned-in text is highly configurable and able to analyse images from different modalities with a noisy background. The solution was validated and is intended to identify DICOM files that need to have restricted access or be thoroughly de-identified due to privacy issues. Unlike with existing tools, the recognised text, including its coordinates, can be further used for de-identification.
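The metrics named in the RESULTS section can all be derived from a multi-class confusion matrix. The sketch below is only an illustration of those definitions, not the authors' evaluation code: the function names are hypothetical, the 3x3 matrix is made up and not taken from the paper, and weighting each class by its number of true instances is an assumption about how the "weighted average" was formed.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # cm[i][j] = items of true class i predicted as class j


def class_counts(cm: Matrix, k: int) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn) for class k, treated one-vs-rest."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # true class k, predicted otherwise
    fp = sum(row[k] for row in cm) - tp     # predicted k, true class differs
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


def false_positive_rate(cm: Matrix, k: int) -> float:
    _, _, fp, tn = class_counts(cm, k)
    return fp / (fp + tn)


def weighted_recall(cm: Matrix) -> float:
    """Per-class recall tp / (tp + fn), weighted by class support (assumption)."""
    total = sum(sum(row) for row in cm)
    acc = 0.0
    for k in range(len(cm)):
        tp, fn, _, _ = class_counts(cm, k)
        if tp + fn:
            acc += (tp + fn) * (tp / (tp + fn))
    return acc / total


def weighted_inverse_recall(cm: Matrix) -> float:
    """Per-class inverse recall tn / (tn + fp), weighted by class support (assumption)."""
    total = sum(sum(row) for row in cm)
    acc = 0.0
    for k in range(len(cm)):
        tp, fn, fp, tn = class_counts(cm, k)
        acc += (tp + fn) * (tn / (tn + fp))
    return acc / total


def cohen_kappa(cm: Matrix) -> float:
    """Cohen's Kappa: agreement between true and predicted labels beyond chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[k][k] for k in range(n)) / total
    expected = sum(
        sum(cm[k]) * sum(row[k] for row in cm) for k in range(n)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Made-up three-class confusion matrix, for illustration only.
example = [
    [50, 2, 3],
    [4, 40, 1],
    [1, 2, 47],
]

if __name__ == "__main__":
    print("weighted recall:", weighted_recall(example))
    print("weighted inverse recall:", weighted_inverse_recall(example))
    print("FPR of class 0:", false_positive_rate(example, 0))
    print("Cohen's Kappa:", cohen_kappa(example))
```

With support weighting, the weighted average recall equals overall accuracy, which is one reason Cohen's Kappa is usually reported alongside it: Kappa discounts the agreement expected by chance from the class distribution.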
- 000
- 00000naa a2200000 a 4500
- 001
- bmc19044860
- 003
- CZ-PrNML
- 005
- 20200114150836.0
- 007
- ta
- 008
- 200109s2019 ie f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1016/j.ijmedinf.2019.02.011 $2 doi
- 035 __
- $a (PubMed)31029254
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a ie
- 100 1_
- $a Vcelak, Petr $u NTIS - New Technologies for the Information Society, University of West Bohemia, Univerzitni 8, 30614 Plzen, Czech Republic. Electronic address: vcelak@kiv.zcu.cz.
- 245 10
- $a Identification and classification of DICOM files with burned-in text content / $c P. Vcelak, M. Kryl, M. Kratochvil, J. Kleckova
- 650 _2
- $a Algorithms $7 D000465
- 650 12
- $a Computer Security $7 D016494
- 650 _2
- $a Confidentiality $7 D003219
- 650 _2
- $a Datasets as Topic $7 D066264
- 650 _2
- $a Electronic Health Records $7 D057286
- 650 _2
- $a Health Insurance Portability and Accountability Act (USA) $7 D020408
- 650 _2
- $a Humans $7 D006801
- 650 12
- $a Privacy $7 D018907
- 651 _2
- $a United States $7 D014481
- 655 _2
- $a Journal Article $7 D016428
- 655 _2
- $a Research Support, Non-U.S. Gov't $7 D013485
- 700 1_
- $a Kryl, Martin $u NTIS - New Technologies for the Information Society, University of West Bohemia, Univerzitni 8, 30614 Plzen, Czech Republic. Electronic address: kryl@kiv.zcu.cz.
- 700 1_
- $a Kratochvil, Michal $u NTIS - New Technologies for the Information Society, University of West Bohemia, Univerzitni 8, 30614 Plzen, Czech Republic. Electronic address: zmk@kiv.zcu.cz.
- 700 1_
- $a Kleckova, Jana $u NTIS - New Technologies for the Information Society, University of West Bohemia, Univerzitni 8, 30614 Plzen, Czech Republic. Electronic address: kleckova@kiv.zcu.cz.
- 773 0_
- $w MED00002340 $t International journal of medical informatics $x 1386-5056 $g Vol. 126, No. - (2019), pp. 128-137
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/31029254 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y a $z 0
- 990 __
- $a 20200109 $b ABA008
- 991 __
- $a 20200114151209 $b ABA008
- 999 __
- $a ok $b bmc $g 1483129 $s 1083533
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2019 $b 126 $c - $d 128-137 $e 20190301 $i 1386-5056 $m International journal of medical informatics $n Int J Med Inform $x MED00002340
- LZP __
- $a Pubmed-20200109