Anomaly Detection Algorithm for Real-World Data and Evidence in Clinical Research: Implementation, Evaluation, and Validation Study

2021 May 07; 9(5): e27172. [epub] 20210507

Status PubMed-not-MEDLINE Language English Country Canada Medium electronic

Document type journal articles

Persistent link https://www.medvik.cz/link/pmid33851576
Links

PubMed 33851576
PubMed Central PMC8140384
DOI 10.2196/27172
PII: v9i5e27172
Knihovny.cz E-resources

BACKGROUND: Statistical analysis, which has become an integral part of evidence-based medicine, relies heavily on data quality, which is of critical importance in modern clinical research. Input data are at risk not only of being falsified or fabricated but also of being mishandled by investigators.
OBJECTIVE: The urgent need to assure the highest possible data quality has led to the implementation of various auditing strategies designed to monitor clinical trials and detect the errors of different origins that frequently occur in the field. The objective of this study was to describe a machine learning-based algorithm that detects anomalous patterns in data arising from carelessness, systematic error, or the intentional entry of fabricated values.
METHODS: An electronic data capture (EDC) system used for data management in clinical registries is presented, including its architecture and data structure. This EDC system features a machine learning-based algorithm designed to detect anomalous patterns in quantitative data. The detection algorithm combines clustering with a series of 7 distance metrics that serve to determine the strength of an anomaly. Thresholds and combinations of these metrics were used for detection, and detection performance was evaluated and validated in experiments involving simulated anomalous data and real-world data.
RESULTS: Five clinical registries related to neuroscience were presented, all of them running in the given EDC system. Two of the registries were selected for the evaluation experiments and also served to validate the detection performance on an independent data set. The best-performing combination of distance metrics was Canberra, Manhattan, and Mahalanobis, whereas the Cosine and Chebyshev metrics were excluded from further analysis because they showed the lowest performance when used as single distance metric-based classifiers.
CONCLUSIONS: The experimental results demonstrate that the algorithm is universal in nature and, as such, may be implemented in other EDC systems, and that it is capable of detecting anomalous data with a sensitivity exceeding 85%.
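
The idea of scoring records by their distance from a cluster and flagging those beyond a threshold can be illustrated with a minimal sketch. This is not the study's implementation. It assumes a single, already known cluster of plausible records, uses only two of the named metrics (Manhattan and Canberra; Mahalanobis is left out because it needs a covariance matrix inverse), and flags a record only when every metric exceeds its threshold. The function names, the threshold values, and the example records are all hypothetical.

```python
from statistics import mean


def manhattan(x, y):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    """Canberra distance: sum of |a - b| / (|a| + |b|), skipping 0/0 terms."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


# Metrics available to the sketch, keyed by the names used in `thresholds`.
METRICS = {"manhattan": manhattan, "canberra": canberra}


def centroid(records):
    """Component-wise mean of a cluster of records."""
    return [mean(column) for column in zip(*records)]


def flag_anomalies(records, center, thresholds):
    """Flag a record when all listed metrics exceed their thresholds.

    The "all metrics must agree" rule is an assumption made for this sketch.
    """
    return [
        all(METRICS[name](rec, center) > limit for name, limit in thresholds.items())
        for rec in records
    ]


def sensitivity(flags, truth):
    """True positives divided by all truly anomalous records."""
    true_pos = sum(1 for f, t in zip(flags, truth) if f and t)
    false_neg = sum(1 for f, t in zip(flags, truth) if not f and t)
    return true_pos / (true_pos + false_neg)


# Hypothetical two-variable records forming one cluster of plausible values.
reference = [[120, 80], [118, 79], [122, 81], [121, 80]]
center = centroid(reference)

# Hypothetical new records; the second is a simulated anomaly.
candidates = [[119, 80], [200, 40]]
truth = [False, True]

# Illustrative thresholds, not values from the study.
thresholds = {"manhattan": 10.0, "canberra": 0.1}

flags = flag_anomalies(candidates, center, thresholds)
print(flags)
print(sensitivity(flags, truth))
```

In practice the thresholds would be tuned on labeled or simulated anomalies, and the combination rule (all metrics, any metric, or a majority) is itself a design choice that changes the balance between sensitivity and false alarms.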

Latest 20 citations...


3D printing traceability in healthcare using 3Diamond software

2024 Jun 30; 10(12): e32664. [epub] 20240611
