Machine learning techniques are methods that, given a training set of examples, infer a model for the categories of the data, so that new (unknown) examples can be assigned to one or more categories by pattern matching within the model. Data from follow-up studies with repeated collection of the same type of data are well suited to this kind of analysis. Machine learning algorithms from a variety of paradigms, all of them supervised, were applied to knowledge discovery on medical data; several algorithms were tested in order to cover most kinds of supervised learning. Two kinds of experiments were carried out: the first was intended to discover associations between attributes, the second to test the prediction of future disorders. The experiments in this paper used data from a twenty-year longitudinal primary prevention study of the risk factors (RF) of atherosclerosis in middle-aged men. The study is named STULONG (LONGitudinal STUdy). The results show that some methods predict some disorders better than others, so it is worthwhile to apply all the algorithms together and to judge the confidence of the result from the known tendency of each method. The machine learning algorithms were also used to predict the cause of death; results in this case were poor, possibly because the dataset contains little information (few entries) of this type.
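The idea of running all algorithms together and weighing each answer by the known tendency of the method can be sketched as a reliability-weighted vote. This is only an illustration under stated assumptions: the method names, labels and reliability values below are hypothetical and are not taken from the STULONG experiments, and the abstract does not say how the confidence was actually computed.

```python
from collections import defaultdict


def weighted_vote(predictions, reliability):
    """Combine per-method predictions into one label plus a confidence score.

    predictions: {method_name: predicted_label}
    reliability: {method_name: {label: weight in [0, 1]}}, for example each
                 method's validation precision for that label (assumed input).
    """
    scores = defaultdict(float)
    for method, label in predictions.items():
        # A vote counts as much as the method has been right for this label.
        scores[label] += reliability[method].get(label, 0.0)
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


# Hypothetical reliabilities for three unnamed supervised methods.
reliability = {
    "method_a": {"disorder": 0.80, "no_disorder": 0.60},
    "method_b": {"disorder": 0.55, "no_disorder": 0.75},
    "method_c": {"disorder": 0.65, "no_disorder": 0.70},
}
predictions = {
    "method_a": "disorder",
    "method_b": "no_disorder",
    "method_c": "disorder",
}
label, confidence = weighted_vote(predictions, reliability)
print(label, round(confidence, 3))
```

With these made-up numbers, two methods that are comparatively reliable for "disorder" outvote one method that is reliable for "no_disorder", and the confidence is the winning share of the total weight.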
- Keywords
- knowledge discovery, supervised machine learning, biomedical data mining, atherosclerosis risk factors
- MeSH
- Algorithms MeSH
- Atherosclerosis diagnosis MeSH
- Databases, Factual MeSH
- Financing, Organized MeSH
- Middle Aged MeSH
- Humans MeSH
- Decision Support Techniques MeSH
- Prognosis MeSH
- Risk Factors MeSH
- Decision Support Systems, Clinical MeSH
- Information Storage and Retrieval MeSH
- Knowledge Bases MeSH
- Check Tag
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
Digitalization has gradually made its way into many areas of medicine, including pathology. Along with digital data processing comes the application of artificial intelligence methods to simplify routine processes, enhance safety, and so on. Although general awareness of artificial intelligence methods is increasing, it is still uncommon for professionals from non-technical fields to have a detailed understanding of how such systems work and learn. This text aims to explain the basics of machine learning in an accessible way, using examples and illustrations from digital pathology. It is not intended to be a comprehensive overview or an introduction to cutting-edge methods. Instead, we keep to the very basics and use the simplest models to present the fundamental ideas behind most learning systems. The text concentrates on decision trees, whose functioning is easy to explain, and on elementary neural networks, the main model used in today’s artificial intelligence. We also attempt to describe the collaborative process between physicians, who provide the data, and computer scientists, who use this data to develop learning systems. We believe that this text will help bridge the knowledge gap between medical professionals and computer scientists and thereby contribute to more effective interdisciplinary collaboration.
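To show why decision trees are easy to explain, a single tree node can be sketched as a search for the threshold that best separates two classes. This is a minimal sketch, not the article's own material: it assumes Gini impurity as the split criterion, and the feature values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'.

    Returns (None, impurity of the unsplit data) if no split improves on it.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between two equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((lo + hi) / 2, impurity)
    return best


# Hypothetical measurements (e.g. a cell count per image tile) and labels
# (0 = benign, 1 = suspicious); invented for illustration only.
values = [12, 15, 18, 30, 34, 41]
labels = [0, 0, 0, 1, 1, 1]
print(best_threshold(values, labels))
```

The resulting rule reads as a plain sentence ("if the value is at most the threshold, predict class 0"), and a full tree is built by repeating the same search on each side of the split.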
- MeSH
- Humans MeSH
- Pathology * trends MeSH
- Machine Learning * trends MeSH
- Artificial Intelligence trends MeSH
- Check Tag
- Humans MeSH
BACKGROUND AND OBJECTIVES: Research in Multiple Sclerosis (MS) has recently focused on extracting knowledge from real-world clinical data sources. This type of data is more abundant than data produced during clinical trials and potentially more informative about real-world clinical practice. However, this comes at the cost of less curated and less controlled data sets. In this work we aim to predict disability progression by optimally extracting information from longitudinal patient data in the real-world setting, with a special focus on the sporadic sampling problem. METHODS: We use machine learning methods suited to modeling patient trajectories, such as recurrent neural networks and tensor factorization. A subset of 6682 patients from the MSBase registry is used. RESULTS: We can predict disability progression of patients over a two-year horizon with an ROC-AUC of 0.85, which represents a 32% decrease in the ranking pair error (1-AUC) compared to reference methods using static clinical features. CONCLUSIONS: Compared to the models available in the literature, this work uses the most complete patient history for MS disease progression prediction and represents a step towards AI-assisted precision medicine in MS.
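The "ranking pair error" in the results can be made concrete: ROC-AUC equals the fraction of (progressing, non-progressing) patient pairs that the model ranks in the right order, so 1-AUC is the fraction of mis-ranked pairs. The sketch below computes AUC this way on hypothetical scores and then, assuming the 32% figure is a relative reduction of 1-AUC, derives the reference AUC it would imply. That derived value is not stated in the abstract.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return correct / (len(pos) * len(neg))


def implied_reference_auc(auc, relative_error_decrease):
    """Reference AUC implied by a relative decrease of the pair error 1 - AUC."""
    reference_error = (1.0 - auc) / (1.0 - relative_error_decrease)
    return 1.0 - reference_error


# Hypothetical model scores; label 1 = progression, 0 = no progression.
scores = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
print(round(roc_auc(scores, labels), 3))                # 5 of 6 pairs correct
print(round(implied_reference_auc(0.85, 0.32), 3))      # about 0.78
```

Under that reading, an error of 0.15 that is 32% smaller than the reference error corresponds to a reference AUC of roughly 0.78.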
- MeSH
- Humans MeSH
- Neural Networks, Computer MeSH
- Multiple Sclerosis * MeSH
- Machine Learning * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
Pathophysiological recordings obtained from patients by various testing methods are frequently used in medicine to identify symptoms and to estimate the probability of selected diseases. People with Parkinson's disease (PD) present numerous symptoms; however, changes in speech and articulation are potentially the most significant biomarker. This article focuses on classifying PD diagnosis from speech signals using pattern recognition methods (AdaBoost, bagged trees, quadratic SVM and k-NN). The dataset consists of voice measurements from 30 individuals with PD and 30 healthy controls (HC), each individual being represented by 2 recordings. Relatively well-discriminating features relating to energy and spectral speech properties were extracted from the PD and HC training signals. The models were evaluated with 5-fold cross-validation, and their accuracy was calculated from the confusion matrix. On the available data, the average overall accuracy was 82.3 % and the average AUC was 0.88 (min. AUC = 0.86).
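The evaluation procedure named here, 5-fold cross-validation with accuracy read off the confusion matrix, can be sketched as follows. The sample count of 120 follows from 60 individuals with 2 recordings each; the confusion matrix is hypothetical. The sketch neither shuffles the data nor keeps both recordings of one speaker in the same fold; the abstract does not state how the original folds were formed.

```python
def kfold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def accuracy(confusion):
    """Overall accuracy: correctly classified (diagonal) over all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


folds = list(kfold_indices(120, k=5))   # 60 individuals x 2 recordings

# Hypothetical confusion matrix for one test fold of 24 recordings:
# rows = true class (PD, HC), columns = predicted class (PD, HC).
fold_confusion = [[10, 2],
                  [3, 9]]
print(len(folds), len(folds[0][1]), round(accuracy(fold_confusion), 3))
```

The reported overall accuracy would then be the average of such per-fold values across the five folds.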
With the advancing digitalization of pathology, applications of machine learning and artificial intelligence methods are attracting growing attention. Research and development in this field are progressing rapidly, but the clinical implementation of learning systems still lags behind. The aim of this text is to give an overview of the process of developing and deploying learning systems in digital pathology. We begin by describing the fundamental characteristics of data produced in digital pathology. Specifically, we discuss scanners and sample scanning, data storage and transmission, quality control, and the preparation of data for processing by learning systems, with a particular focus on annotations. Our goal is to present current approaches to technical problems while also pointing out the pitfalls that can be encountered when processing digital pathology data. In the first part, we also outline current software solutions for viewing scanned samples and for implementing diagnostic procedures that incorporate learning systems. In the second part, we describe common tasks in digital pathology and outline typical approaches to solving them. Here, we explain how standard machine learning methods must be modified to process large scans and discuss specific diagnostic applications. Finally, we give a brief outlook on the possible future development of learning systems in digital pathology. In particular, we illustrate the transition to large foundation models and introduce the topic of virtual staining of samples. We hope that this text will contribute to a better orientation in the rapidly evolving field of machine learning in digital pathology and, in turn, to the faster adoption of learning-based methods in this domain.
TransCelerate reports on the results of 2019, 2020, and 2021 member company (MC) surveys on the use of intelligent automation in pharmacovigilance processes. Over the 3 survey years, MCs increased the number and extent of intelligent automation solutions implemented throughout Individual Case Safety Report (ICSR) processing, moving from planning to piloting to implementation, especially with rule-based automations such as robotic process automation, lookups, and workflows. Companies remain highly interested in other technologies such as machine learning (ML) and artificial intelligence, which can deliver a human-like interpretation of data and decision making rather than just automating tasks. Intelligent automation solutions are usually used in combination, with more than one technology applied simultaneously to the same ICSR process step. Challenges to implementing intelligent automation solutions include obtaining appropriate training data for ML models and the need for harmonized regulatory guidance.
- MeSH
- Automation MeSH
- Pharmacovigilance * MeSH
- Humans MeSH
- Machine Learning MeSH
- Technology MeSH
- Artificial Intelligence * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Breast cancer survival prediction can strongly influence the selection of the best treatment protocol. Many approaches, such as statistical or machine learning models, have been employed to predict the survival prospects of patients, but newer algorithms such as deep learning can be tested with the aim of improving the models and their prediction accuracy. In this study, we used machine learning and deep learning approaches to predict breast cancer survival in 4,902 patient records from the University of Malaya Medical Centre Breast Cancer Registry. The results indicated that the multilayer perceptron (MLP), random forest (RF) and decision tree (DT) classifiers could predict survivorship with 88.2 %, 83.3 % and 82.5 % accuracy, respectively, in the tested samples. The support vector machine (SVM) performed worse, with 80.5 %. Tumour size turned out to be the most important feature for predicting breast cancer survivability. Both deep learning and machine learning methods produce desirable prediction accuracy, but other factors, such as parameter configurations and data transformations, affect the accuracy of the predictive model.
- MeSH
- Survival Analysis MeSH
- Deep Learning * MeSH
- Demography MeSH
- Adult MeSH
- Calibration MeSH
- Middle Aged MeSH
- Humans MeSH
- Young Adult MeSH
- Breast Neoplasms mortality MeSH
- Neural Networks, Computer MeSH
- Decision Trees MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Support Vector Machine MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Young Adult MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
Assay interference caused by small molecules continues to pose a significant challenge for early drug discovery. A number of rule-based and similarity-based approaches have been derived that allow the flagging of potentially "badly behaving compounds", "bad actors", or "nuisance compounds". These compounds are typically aggregators, reactive compounds, and/or pan-assay interference compounds (PAINS), and many of them are frequent hitters. Hit Dexter is a recently introduced machine learning approach that predicts frequent hitters independent of the underlying physicochemical mechanisms (including also the binding of compounds based on "privileged scaffolds" to multiple binding sites). Here we report on the development of a second generation of machine learning models which now covers both primary screening assays and confirmatory dose-response assays. Protein sequence clustering was newly introduced to minimize the overrepresentation of structurally and functionally related proteins. The models correctly classified compounds of large independent test sets as (highly) promiscuous or nonpromiscuous with Matthews correlation coefficient (MCC) values of up to 0.64 and area under the receiver operating characteristic curve (AUC) values of up to 0.96. The models were also utilized to characterize sets of compounds with specific biological and physicochemical properties, such as dark chemical matter, aggregators, compounds from a high-throughput screening library, drug-like compounds, approved drugs, potential PAINS, and natural products. Among the most interesting outcomes is that the new Hit Dexter models predict the presence of large fractions of (highly) promiscuous compounds among approved drugs. Importantly, predictions of the individual Hit Dexter models are generally in good agreement and consistent with those of Badapple, an established statistical model for the prediction of frequent hitters. 
The new Hit Dexter 2.0 web service, available at http://hitdexter2.zbh.uni-hamburg.de , not only provides user-friendly access to all machine learning models presented in this work but also to similarity-based methods for the prediction of aggregators and dark chemical matter as well as a comprehensive collection of available rule sets for flagging frequent hitters and compounds including undesired substructures.
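The Matthews correlation coefficient (MCC) reported above can be computed directly from the four cells of a binary confusion matrix. The sketch below uses the standard definition; the counts are hypothetical and are not taken from the Hit Dexter test sets.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when a row or column of the matrix is empty, the usual
    convention for the otherwise undefined 0/0 case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical, imbalanced test set: 100 promiscuous compounds (positives)
# and 900 non-promiscuous compounds (negatives).
tp, fp, tn, fn = 80, 20, 880, 20
overall_accuracy = (tp + tn) / (tp + fp + tn + fn)
print(round(overall_accuracy, 3), round(mcc(tp, fp, tn, fn), 3))
```

On imbalanced data like this, accuracy is dominated by the large negative class, whereas MCC uses all four cells, which is one reason it is commonly reported alongside AUC for frequent-hitter prediction.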
- MeSH
- Databases, Pharmaceutical MeSH
- Small Molecule Libraries chemistry MeSH
- Pharmaceutical Preparations chemistry MeSH
- Models, Molecular MeSH
- Proteins chemistry MeSH
- ROC Curve MeSH
- High-Throughput Screening Assays methods MeSH
- Machine Learning * MeSH
- Protein Binding MeSH
- Binding Sites MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH