• This record comes from PubMed

Measuring diversity in medical reports based on categorized attributes and international classification systems

2012 Apr 12; 12: 31. [Epub 2012 Apr 12]

Language: English; Country: Great Britain, England; Media: electronic

Document type: Journal Article, Research Support, Non-U.S. Gov't

BACKGROUND: Narrative medical reports do not use standardized terminology and often provide insufficient information for statistical processing and medical decision making. The objectives of this paper are to propose a method for measuring diversity in medical reports written in any language, to compare the diversity of narrative and structured medical reports, and to map attributes and terms to selected classification systems.

METHODS: A new method based on a general concept of f-diversity is proposed for measuring the diversity of medical reports in any language. The method is based on categorized attributes recorded in narrative or structured medical reports and on international classification systems. Values of categories are expressed by terms. Using SNOMED CT and ICD-10, we map attributes and terms to predefined codes. We use f-diversities of the Gini-Simpson and Number of Categories types to compare the diversity of narrative and structured medical reports. The comparison is based on attributes selected from the Minimal Data Model for Cardiology (MDMC).

RESULTS: We compared the diversity of 110 Czech narrative medical reports and 1119 Czech structured medical reports. The selected categorized attributes of MDMC mostly had different numbers of categories and used different terms in narrative and structured reports. We found more than 60% of MDMC attributes in SNOMED CT. We showed that attributes in narrative medical reports had greater diversity than the same attributes in structured medical reports. Further, we replaced each value of a category (term) used for attributes in narrative medical reports with the closest term and category used in MDMC for structured medical reports. We found that relative Gini-Simpson diversities in structured medical reports were significantly smaller than those in narrative medical reports, except for the "Allergy" attribute.

CONCLUSIONS: Terminology in narrative medical reports is not standardized. It is therefore nearly impossible to map values of attributes (terms) to codes of known classification systems. High diversity in the terminology of narrative medical reports makes computer processing more difficult than for structured medical reports, and some information may be lost in the process. Establishing a standardized terminology would give healthcare providers complete and easily accessible information about patients, which would result in better healthcare.
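The two diversity measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard Gini-Simpson form, 1 minus the sum of squared category shares, and treats the Number of Categories measure as a plain count of observed categories. The abstract does not define the "relative" Gini-Simpson diversity, so the division by the maximum 1 - 1/k used here is an assumption. The function names, the smoking-status terms, and the term-to-category mapping are all hypothetical and are not data or code from the study.

```python
from collections import Counter


def number_of_categories(values):
    """Count the distinct categories that actually occur in the sample."""
    return len(set(values))


def gini_simpson(values):
    """Gini-Simpson diversity: 1 - sum(p_i ** 2) over observed category shares."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    counts = Counter(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def relative_gini_simpson(values, k=None):
    """Gini-Simpson divided by its maximum 1 - 1/k.

    This normalisation is an assumption; k defaults to the number of
    observed categories.
    """
    k = number_of_categories(values) if k is None else k
    if k < 2:
        return 0.0
    return gini_simpson(values) / (1.0 - 1.0 / k)


# Hypothetical free-text values of one attribute, as they might appear
# in narrative reports (illustrative only).
narrative = [
    "smoker", "smokes", "non-smoker",
    "never smoked", "ex-smoker", "quit smoking",
]

# Hypothetical mapping of each free-text term to the closest structured category.
to_structured = {
    "smoker": "smoker",
    "smokes": "smoker",
    "non-smoker": "non-smoker",
    "never smoked": "non-smoker",
    "ex-smoker": "ex-smoker",
    "quit smoking": "ex-smoker",
}
mapped = [to_structured[term] for term in narrative]

print(number_of_categories(narrative), gini_simpson(narrative))
print(number_of_categories(mapped), gini_simpson(mapped))
```

In this toy sample, mapping synonymous terms onto shared categories lowers both measures, which is the direction of the effect the abstract reports for structured versus narrative reports.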

