Identification and classification of DICOM files with burned-in text content

P. Vcelak, M. Kryl, M. Kratochvil, J. Kleckova

International journal of medical informatics. 2019 ; 126 (-) : 128-137. [pub] 20190301

Language English Country Ireland

Document type Journal Article, Research Support, Non-U.S. Gov't

BACKGROUND: Protected health information burned into pixel data is, for various reasons, not indicated in DICOM. It complicates the secondary use of such data. In recent years, there have been several attempts to anonymize or de-identify DICOM files. Existing approaches have different constraints. No completely reliable solution exists. Especially for large datasets, it is necessary to quickly analyse and identify files potentially violating privacy. METHODS: Classification is based on an adaptive-iterative algorithm designed to identify one of three classes. Several image transformations, optical character recognition, and filters are applied; then a local decision is made. A confirmed local decision is the final one. The classifier was trained on a dataset composed of 15,334 images of various modalities. RESULTS: The false positive rates are below 4.00% in all cases, and 1.81% in the mission-critical problem of detecting protected health information. The classifier's weighted average recall was 94.85%, the weighted average inverse recall was 97.42%, and Cohen's Kappa coefficient was 0.920. CONCLUSION: The proposed novel approach for classification of burned-in text is highly configurable and able to analyse images from different modalities with a noisy background. The solution was validated and is intended to identify DICOM files that need to have restricted access or be thoroughly de-identified due to privacy issues. Unlike with existing tools, the recognised text, including its coordinates, can be further used for de-identification.
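
The metrics named in the RESULTS can be computed from a multi-class confusion matrix. The sketch below is illustrative only and is not the authors' evaluation code. It assumes that "inverse recall" means the true negative rate (specificity) and that the weighted averages are weighted by each class's number of true instances. The function names and the toy matrix are hypothetical, and the counts are not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # cm[i][j] = items of true class i predicted as class j


def one_vs_rest(cm: Matrix, k: int) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn) for class k treated as the positive class."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


def weighted_recall(cm: Matrix) -> float:
    """Per-class recall tp/(tp+fn), averaged with class support as weight."""
    total = sum(sum(row) for row in cm)
    acc = 0.0
    for k in range(len(cm)):
        tp, fn, _, _ = one_vs_rest(cm, k)
        support = tp + fn
        if support:
            acc += support * (tp / support)
    return acc / total


def weighted_inverse_recall(cm: Matrix) -> float:
    """Per-class specificity tn/(tn+fp), averaged with class support as weight.

    The per-class false positive rate is 1 - specificity.
    """
    total = sum(sum(row) for row in cm)
    acc = 0.0
    for k in range(len(cm)):
        tp, fn, fp, tn = one_vs_rest(cm, k)
        if tn + fp:
            acc += (tp + fn) * (tn / (tn + fp))
    return acc / total


def cohen_kappa(cm: Matrix) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[k][k] for k in range(n)) / total
    p_e = sum(sum(cm[k]) * sum(row[k] for row in cm) for k in range(n)) / total**2
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical three-class counts, for illustration only.
    toy = [[50, 2, 1], [3, 40, 2], [1, 1, 30]]
    print(weighted_recall(toy), weighted_inverse_recall(toy), cohen_kappa(toy))
```

With support weighting, the weighted average recall equals overall accuracy, so kappa is the more informative figure when the classes are imbalanced.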


000      
00000naa a2200000 a 4500
001      
bmc19044860
003      
CZ-PrNML
005      
20200114150836.0
007      
ta
008      
200109s2019 ie f 000 0|eng||
009      
AR
024    7_
$a 10.1016/j.ijmedinf.2019.02.011 $2 doi
035    __
$a (PubMed)31029254
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a ie
100    1_
$a Vcelak, Petr $u NTIS - New Technologies for the Information Society, University of West Bohemia, Univerzitni 8, 30614 Plzen, Czech Republic. Electronic address: vcelak@kiv.zcu.cz.
245    10
$a Identification and classification of DICOM files with burned-in text content / $c P. Vcelak, M. Kryl, M. Kratochvil, J. Kleckova,
520    9_
$a BACKGROUND: Protected health information burned in pixel data is not indicated for various reasons in DICOM. It complicates the secondary use of such data. In recent years, there have been several attempts to anonymize or de-identify DICOM files. Existing approaches have different constraints. No completely reliable solution exists. Especially for large datasets, it is necessary to quickly analyse and identify files potentially violating privacy. METHODS: Classification is based on adaptive-iterative algorithm designed to identify one of three classes. There are several image transformations, optical character recognition, and filters; then a local decision is made. A confirmed local decision is the final one. The classifier was trained on a dataset composed of 15,334 images of various modalities. RESULTS: The false positive rates are in all cases below 4.00%, and 1.81% in the mission-critical problem of detecting protected health information. The classifier's weighted average recall was 94.85%, the weighted average inverse recall was 97.42% and Cohen's Kappa coefficient was 0.920. CONCLUSION: The proposed novel approach for classification of burned-in text is highly configurable and able to analyse images from different modalities with a noisy background. The solution was validated and is intended to identify DICOM files that need to have restricted access or be thoroughly de-identified due to privacy issues. Unlike with existing tools, the recognised text, including its coordinates, can be further used for de-identification.
650    _2
$a Algorithms $7 D000465
650    12
$a Computer Security $7 D016494
650    _2
$a Confidentiality $7 D003219
650    _2
$a Datasets as Topic $7 D066264
650    _2
$a Electronic Health Records $7 D057286
650    _2
$a Health Insurance Portability and Accountability Act $7 D020408
650    _2
$a Humans $7 D006801
650    12
$a Privacy $7 D018907
651    _2
$a United States $7 D014481
655    _2
$a Journal Article $7 D016428
655    _2
$a Research Support, Non-U.S. Gov't $7 D013485
700    1_
$a Kryl, Martin $u NTIS - New Technologies for the Information Society, University of West Bohemia, Univerzitni 8, 30614 Plzen, Czech Republic. Electronic address: kryl@kiv.zcu.cz.
700    1_
$a Kratochvil, Michal $u NTIS - New Technologies for the Information Society, University of West Bohemia, Univerzitni 8, 30614 Plzen, Czech Republic. Electronic address: zmk@kiv.zcu.cz.
700    1_
$a Kleckova, Jana $u NTIS - New Technologies for the Information Society, University of West Bohemia, Univerzitni 8, 30614 Plzen, Czech Republic. Electronic address: kleckova@kiv.zcu.cz.
773    0_
$w MED00002340 $t International journal of medical informatics $x 1386-5056 $g Vol. 126, No. - (2019), pp. 128-137
856    41
$u https://pubmed.ncbi.nlm.nih.gov/31029254 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y a $z 0
990    __
$a 20200109 $b ABA008
991    __
$a 20200114151209 $b ABA008
999    __
$a ok $b bmc $g 1483129 $s 1083533
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2019 $b 126 $c - $d 128-137 $e 20190301 $i 1386-5056 $m International journal of medical informatics $n Int J Med Inform $x MED00002340
LZP    __
$a Pubmed-20200109
