Data mining methods
- Keywords
- statistics, multivariate analysis, large datasets
- MeSH
- databases, genetic trends utilization MeSH
- education, distance methods trends MeSH
- financing, organized MeSH
- genetic techniques trends utilization MeSH
- medical informatics MeSH
- humans MeSH
- computer-assisted instruction instrumentation utilization MeSH
- data collection methods trends MeSH
- statistics as topic MeSH
- models, theoretical MeSH
- data display trends MeSH
- Check Tag
- humans MeSH
- Publication Type
- database MeSH
1 online resource
- MeSH
- data mining MeSH
- data collection methods MeSH
- information storage and retrieval * MeSH
- Publication Type
- dataset MeSH
- periodicals MeSH
- Conspectus
- Science. Generalities. Fundamentals of science and culture. Scientific work
- NLK Subject Fields
- science and research
As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies (e.g., next-generation sequencing) are used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools have been developed to characterize the function of gene lists. Such systems typically rely on enrichment analysis and aim to give quick insight into the underlying biology by presenting it in the form of a summary report. While such computational approaches may alleviate the annotation burden, the main challenge in modern annotation remains to develop a systems-level form of analysis in which a pipeline can analyze gene lists quickly and effectively and identify aggregated annotations through computerized resources. In this article we survey some of the many tools and methods that have been developed to automatically interpret the biological functions underlying gene lists. We review current functional annotation from the perspective of its epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known', formally structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries) or, perhaps more adventurously, on annotations inferred from the literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural language). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are needed.
In particular, future methods need to (1) provide more holistic insight into the underlying molecular systems, (2) provide better options for follow-up experimental testing and treatment, and (3) better handle gene lists derived from organisms that are not well studied. We discuss some promising approaches that may help achieve these advances, especially the use of extended dictionaries of biomedical concepts and molecular mechanisms, as well as greater use of annotation benchmarks.
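Enrichment analysis, which the abstract names as the typical basis of gene-list tools, commonly asks whether a term (e.g. a GO term) is over-represented in a gene list, using a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below is a minimal illustration under that assumption; the function name and the example counts are hypothetical, and real tools additionally correct for testing many terms at once and differ in how they define the background gene population.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, list_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value for term over-representation.

    population: genes in the background (e.g. all annotated genes)
    annotated:  background genes that carry the term
    list_size:  genes in the submitted gene list
    overlap:    genes in the list that carry the term
    Returns P(X >= overlap) for X ~ Hypergeometric(population, annotated, list_size).
    """
    if not (0 <= annotated <= population and 0 <= list_size <= population):
        raise ValueError("counts must satisfy 0 <= annotated, list_size <= population")
    upper = min(annotated, list_size)
    # math.comb returns 0 when k > n, so impossible configurations contribute nothing.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, list_size - k)
        for k in range(max(overlap, 0), upper + 1)
    )
    # The tail sum stays an exact integer; only the final division is floating point.
    return tail / comb(population, list_size)


# Hypothetical numbers: 20,000 background genes, 200 carry the term,
# and a 100-gene list contains 10 of them (about 1 would be expected by chance).
p = enrichment_p_value(population=20_000, annotated=200, list_size=100, overlap=10)
print(f"p = {p:.3g}")
```

The same function is applied once per term, so a list screened against thousands of GO terms needs a multiple-testing correction before any term is reported as enriched.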
Data mining (DM) is a widely adopted methodology for the analysis of large datasets, which, however, is often overestimated or incorrectly regarded as a universal solution. This also holds for clinical research, in which large and heterogeneous datasets are often processed. In general, DM uses standard methods available in common statistical software and combines them into a complex workflow covering all steps of data analysis, from data acquisition through pre-processing and analysis to interpretation of the results. The whole workflow is aimed at one final goal: to find interesting, non-trivial, hidden and potentially useful information. This innovative concept of data mining was adopted in our educational course at the Faculty of Medicine of Masaryk University, accessible from its e-learning portal http://portal.med.muni.cz/clanek-318-zavedeni-technologie-data-miningu-a-analyzy-dat--genovych-expresnich-map-do-vyuky.html.
- MeSH
- biostatistics methods MeSH
- data mining * methods trends MeSH
- humans MeSH
- multifactor dimensionality reduction methods MeSH
- computer-assisted instruction * methods trends MeSH
- Check Tag
- humans MeSH
- Publication Type
- grant-supported research MeSH
Dental development is frequently used to estimate age in many anthropological specializations. The aim of this study was to develop an accurate age prediction system for the Czech population and to identify differences in the predictive ability of various tooth types and in their ontogenetic stability during childhood and adolescence. This cross-sectional panoramic X-ray study was based on the assessment of developmental stages of mandibular teeth (Moorrees et al. 1963) in 1393 individuals aged 3 to 17 years. Data mining methods were used for dental age estimation; these methods are based on nonlinear relationships between the predicted age and the input data. Compared with the other tested predictive models, the GAME method predicted age with the highest accuracy. Age-interval estimates between the 10th and 90th percentiles ranged from -1.06 to +1.01 years in girls and from -1.13 to +1.20 years in boys. Accuracy was expressed as the root mean square (RMS) error, i.e., the square root of the mean squared difference between estimated and chronological age. The predictive value of individual teeth changed over the investigated period from 3 to 17 years. When the whole period was evaluated, the second molars exhibited the best predictive ability. When partial age periods were evaluated, we found that the accuracy of biological age prediction declines with increasing age (from 0.52 to 1.20 years in girls and from 0.62 to 1.22 years in boys) and that the predictive importance of tooth types changes, depending on the variability and number of developmental stages in the age interval. GAME is a promising tool for age-interval estimation studies, as it can provide reliable predictive models.
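The two accuracy summaries in this abstract, the RMS error and the interval between the 10th and 90th percentiles, can be computed from paired estimated and chronological ages. The abstract does not say how the percentiles were obtained, so the sketch below assumes they are percentiles of the signed error (estimated minus chronological age) and uses Python's `statistics.quantiles`, whose default interpolation rule may differ from the one the authors used. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import quantiles


def _differences(estimated, chronological):
    if len(estimated) != len(chronological) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    return [e - c for e, c in zip(estimated, chronological)]


def rms_error(estimated, chronological):
    """Root mean square error: sqrt(mean((estimated - chronological) ** 2))."""
    diffs = _differences(estimated, chronological)
    return sqrt(sum(d * d for d in diffs) / len(diffs))


def percentile_interval(estimated, chronological):
    """10th and 90th percentiles of the signed error (estimated - chronological).

    Assumption: this is how the age interval is defined; needs >= 2 cases.
    """
    diffs = _differences(estimated, chronological)
    deciles = quantiles(diffs, n=10)  # nine cut points: 10th, 20th, ..., 90th
    return deciles[0], deciles[-1]


# Hypothetical dental age estimates (years) against chronological age.
estimated = [6.2, 7.9, 9.4, 11.8, 12.5, 14.9, 15.1, 16.8]
chronological = [6.0, 8.3, 9.0, 12.0, 13.1, 14.2, 15.5, 16.0]
print(rms_error(estimated, chronological))
print(percentile_interval(estimated, chronological))
```

Because the errors are squared before averaging, the RMS error weights occasional large misestimates more heavily than a plain average deviation would.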
- MeSH
- data mining methods MeSH
- child MeSH
- humans MeSH
- adolescent MeSH
- child, preschool MeSH
- cross-sectional studies MeSH
- regression analysis MeSH
- radiography, panoramic MeSH
- models, statistical MeSH
- support vector machine MeSH
- age determination by teeth methods MeSH
- tooth anatomy & histology radiography MeSH
- Check Tag
- child MeSH
- humans MeSH
- adolescent MeSH
- male MeSH
- child, preschool MeSH
- female MeSH
- Publication Type
- journal article MeSH
- grant-supported research MeSH
Age-at-death estimation of adult skeletal remains is a key part of biological profile estimation, yet it remains problematic for several reasons. One may be the subjective nature of evaluating age-related changes; another is that the human eye is unable to detect all relevant surface changes. We had three aims: (1) to validate existing computer models for age estimation; (2) to propose our own expert system, based on computational approaches, that eliminates the factor of subjectivity and uses the full potential of surface changes on an articulation area; and (3) to determine for what age range the pubic symphysis is useful for age estimation. We used a sample of 483 three-dimensional (3D) representations of pubic symphyseal surfaces from the ossa coxae of adult individuals from five identified skeletal collections: four European (two from Portugal, one from Switzerland and one from Greece) and one Asian (Thailand). Validation of the published algorithms showed very high error on our dataset: the mean absolute error (MAE) ranged from 16.2 to 25.1 years. Two completely new approaches are proposed in this paper, SASS (Simple Automated Symphyseal Surface-based) and AANNESS (Advanced Automated Neural Network-grounded Extended Symphyseal Surface-based), with MAE values of 11.7 and 10.6 years, respectively. Lastly, we demonstrated that our models can estimate age-at-death from the pubic symphysis over the entire adult age range. The proposed models offer objective age estimates with a lower estimation error than traditional visual methods.
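The models in this abstract are compared by mean absolute error (MAE), the average of the absolute differences between estimated and known age. A minimal sketch follows; the function name and the two "models" are hypothetical stand-ins for illustration, not the published algorithms, SASS or AANNESS.

```python
def mean_absolute_error(estimated, actual):
    """Mean of |estimated - actual|, in the same unit as the ages (years)."""
    if len(estimated) != len(actual) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


# Hypothetical age-at-death estimates from two models for the same five individuals.
known_age = [24, 37, 52, 68, 81]
model_a = [35, 40, 45, 55, 60]
model_b = [28, 45, 50, 60, 70]
for name, predicted in (("model A", model_a), ("model B", model_b)):
    print(name, mean_absolute_error(predicted, known_age))
```

MAE is expressed in years and weights every error linearly, so values such as 11.7 and 10.6 years are directly comparable when the models are evaluated on the same individuals. They are not directly comparable with an RMS error, which penalizes large errors more.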
- MeSH
- data mining MeSH
- adult MeSH
- humans MeSH
- forensic anthropology methods MeSH
- pubic symphysis * MeSH
- age determination by skeleton methods MeSH
- imaging, three-dimensional MeSH
- Check Tag
- adult MeSH
- humans MeSH
- Publication Type
- journal article MeSH
Recently published studies have shown that age assessment methods are population-specific. The authors analyse senescent changes in the pubic symphysis and the sacro-pelvic surface of the pelvic bone using data mining methods. The multi-ethnic dataset consists of 956 adult individuals of known age and sex, aged 19 to 100 years and derived from 9 different populations. The results show that accurate and reliable age assessment is possible for three age classes (less than 30, 30-60, and 60 and more). The study confirms that the methods are population-specific and that the variable "sex" is not important in age classification.
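Framing age assessment as classification into three classes changes how accuracy is measured: a prediction is either in the right class or not. The sketch below shows only the labelling and evaluation step, not the data mining classifier itself. The abstract does not state where individuals aged exactly 30 or 60 fall, so the boundaries below (under 30, 30 to under 60, 60 and over) are an assumption, and all function names are hypothetical.

```python
from collections import Counter


def age_class(age: float) -> str:
    """Assign one of three classes; boundaries assumed as <30, 30 to <60, >=60."""
    if age < 30:
        return "<30"
    if age < 60:
        return "30-60"
    return "60+"


def class_accuracy(predicted_classes, true_ages):
    """Share of individuals whose predicted class matches the class of their known age."""
    if len(predicted_classes) != len(true_ages) or not true_ages:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == age_class(a) for p, a in zip(predicted_classes, true_ages))
    return hits / len(true_ages)


def confusion_counts(predicted_classes, true_ages):
    """Counter of (true class, predicted class) pairs, i.e. a sparse confusion matrix."""
    return Counter((age_class(a), p) for p, a in zip(predicted_classes, true_ages))


# Hypothetical predictions for four individuals of known age.
predicted = ["<30", "30-60", "60+", "30-60"]
known_ages = [22, 47, 58, 71]
print(class_accuracy(predicted, known_ages))
print(confusion_counts(predicted, known_ages))
```

The confusion counts show which neighbouring classes are mixed up, which a single accuracy figure hides.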
- MeSH
- algorithms MeSH
- data mining methods MeSH
- adult MeSH
- ethnicity MeSH
- middle aged MeSH
- humans MeSH
- ilium anatomy & histology MeSH
- racial groups MeSH
- ROC curve MeSH
- aged, 80 and over MeSH
- aged MeSH
- forensic anthropology MeSH
- pubic symphysis anatomy & histology MeSH
- age determination by skeleton methods MeSH
- Check Tag
- adult MeSH
- middle aged MeSH
- humans MeSH
- male MeSH
- aged, 80 and over MeSH
- aged MeSH
- female MeSH
- Publication Type
- journal article MeSH
In order to analyze and improve dental age estimation in children and adolescents for forensic purposes, 22 age estimation methods were compared on a sample of 976 orthopantomographs (662 males, 314 females) of healthy Czech children and adolescents aged between 2.7 and 20.5 years. The methods, which are based either on various data mining techniques or on simple mathematical operations, were compared in terms of accuracy and complexity. The winning method is presented in detail. The comparison showed that only three methods combine the best accuracy with user-friendliness. These methods were used to build a tabular multiple linear regression model, an M5P tree model and a support vector machine model with a first-order polynomial kernel. All of them have a mean absolute error (MAE) under 0.7 years for both males and females. The other well-performing data mining methods (RBF neural network, K-nearest neighbors, KStar, etc.) have similar or slightly better accuracy, but they are not user-friendly, as they must be implemented as a computer program and require computing equipment. The traditional model based on age averages provides the lowest estimation accuracy (MAE under 0.96 years). Individual teeth were found to differ in their relevance for age estimation, which also explains why the traditional averages-based model is the least accurate. The paper also presents in detail a technique for replacing missing data in cases with missing teeth, as well as the constrained tabular multiple regression model. Finally, we provide free age prediction software based on this winning model.
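K-nearest neighbors is one of the methods the abstract describes as accurate but not user-friendly. The sketch below illustrates why: every prediction needs the whole training set and a distance search, whereas a tabular regression model reduces to printed coefficients. It is a plain K-nearest-neighbors regressor, not the paper's winning model. Coding developmental stages as integers, using Euclidean distance, and the choice of `k` are all assumptions, missing teeth are not handled, and the names and data are hypothetical.

```python
from math import dist


def knn_predict_age(train_stages, train_ages, query_stages, k=3):
    """Predict age as the mean age of the k training cases whose tooth-stage
    vectors are closest (Euclidean distance) to the query vector."""
    if len(train_stages) != len(train_ages):
        raise ValueError("train_stages and train_ages must have equal length")
    if not 1 <= k <= len(train_stages):
        raise ValueError("k must be between 1 and the number of training cases")
    # Rank every training case by its distance to the query (ties keep input order).
    order = sorted(range(len(train_stages)),
                   key=lambda i: dist(train_stages[i], query_stages))
    nearest = order[:k]
    return sum(train_ages[i] for i in nearest) / k


# Hypothetical data: each row holds integer-coded developmental stages of three teeth.
train_stages = [[3, 2, 1], [4, 3, 2], [6, 5, 3], [8, 7, 5], [9, 8, 6]]
train_ages = [5.1, 6.4, 8.9, 12.7, 14.2]
print(knn_predict_age(train_stages, train_ages, [5, 4, 3], k=2))
```

Treating ordinal stage codes as Euclidean coordinates is a simplification, and it is one reason distance-based methods need care with teeth whose stages carry different amounts of age information.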
- MeSH
- data mining MeSH
- dentition, permanent * MeSH
- child MeSH
- humans MeSH
- linear models MeSH
- adolescent MeSH
- young adult MeSH
- neural networks, computer MeSH
- child, preschool MeSH
- radiography, panoramic MeSH
- decision trees MeSH
- software MeSH
- support vector machine MeSH
- age determination by teeth methods MeSH
- tooth growth & development MeSH
- Check Tag
- child MeSH
- humans MeSH
- adolescent MeSH
- young adult MeSH
- male MeSH
- child, preschool MeSH
- female MeSH
- Publication Type
- journal article MeSH
- comparative study MeSH
Background: Many previous studies on mining prescription sequences rely only on frequency information, such as the number of prescriptions and the total number of patients who were issued the prescription. However, when a very small number of doctors issue a prescription representative of a certain medication pattern to many patients many times, the prescribing intentions of these few doctors strongly influence pattern extraction, which biases the final extracted frequent prescription sequence patterns. Objectives: We attempt to extract frequent prescription sequences from more diverse perspectives, by considering factors other than frequency information, to ensure highly reliable medication patterns. Methods: In addition to frequency information, we propose the concept of unbiased frequent use by doctors, based on the hypothesis that a prescription used by many doctors without bias is highly reliable. We propose a medication pattern mining method that takes unbiased frequent use by doctors into account. We conducted an evaluation experiment, using indicators based on clinical laboratory test results, to compare the existing method, which relies only on frequency, with the proposed method, which also considers unbiased frequent use by doctors. Results: The weighted average of the top k was obtained for two different evaluation methods. Conclusions: The study suggests that our medication pattern mining method, which considers unbiased frequent use by doctors, is useful in certain situations, such as when the clinical laboratory test value is outside the normal range.
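The abstract does not define how "unbiased frequent use by doctors" is measured. As a hypothetical illustration only, and not the authors' measure, the sketch below adds to the usual frequency counts a doctor-evenness score: the Shannon entropy of the share of a pattern's prescriptions issued by each doctor, normalised to the range 0 to 1. The record layout, function name and data are assumptions.

```python
from collections import Counter, defaultdict
from math import log


def pattern_statistics(records):
    """Summarise each prescription pattern.

    records: iterable of (doctor_id, patient_id, pattern) tuples, where pattern
    is a hashable prescription sequence such as a tuple of drug codes.
    """
    uses_by_pattern = defaultdict(list)
    for doctor, patient, pattern in records:
        uses_by_pattern[pattern].append((doctor, patient))

    stats = {}
    for pattern, uses in uses_by_pattern.items():
        total = len(uses)
        per_doctor = Counter(doctor for doctor, _ in uses)
        # Shannon entropy of the share of prescriptions issued by each doctor.
        entropy = -sum((n / total) * log(n / total) for n in per_doctor.values())
        # Normalised to [0, 1]: 1 = spread evenly over doctors, 0 = a single doctor.
        evenness = entropy / log(len(per_doctor)) if len(per_doctor) > 1 else 0.0
        stats[pattern] = {
            "prescriptions": total,
            "patients": len({patient for _, patient in uses}),
            "doctors": len(per_doctor),
            "doctor_evenness": evenness,
        }
    return stats


# Hypothetical data: both patterns have four prescriptions to four patients,
# but pattern A comes from a single doctor and pattern B from four doctors.
records = [("d1", f"p{i}", ("drugX", "drugY")) for i in range(4)] + [
    (f"d{i}", f"q{i}", ("drugX", "drugZ")) for i in range(1, 5)
]
for pattern, s in pattern_statistics(records).items():
    print(pattern, s)
```

In this example, frequency alone ranks the two patterns equally. Only the doctor-level columns separate the pattern driven by one prescriber from the one used across several, which is the kind of bias the abstract describes.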
Objectives: The goals of this study were to examine the feasibility of using ontology-based text mining with CaringBridge social media journal entries in order to understand journal content from a whole-person perspective. The specific aims were to describe the frequencies of Omaha System problem concepts in the journal entries over a four-step process, overall and by Omaha System domain, and to examine the four-step method, including its use of standardized terms and related words. Design: Ontology-based retrospective observational feasibility study using text mining methods. Sample: A corpus of social media text consisting of 13,757,900 CaringBridge journal entries from June 2006 to June 2016. Measures: The Omaha System terms, including problems and signs/symptoms, were used as the foundational lexicon for this study. Developing an extended lexicon with related words for each problem concept expanded the semantics-powered data analytics approach to reflect consumer word choices. Results: All Omaha System problem concepts were identified in the journal entries, with consistent representation across domains. The approach was most successful when common words were used to represent clinical terms. Preliminary validation of journal examples showed appropriate representation of the problem concepts. Conclusions: This is the first study to evaluate the feasibility of using an interface terminology and ontology (the Omaha System) as a text mining information model. Further research is needed to systematically validate these findings, refine the process as needed to advance the study of CaringBridge content, and extend the use of this method to other consumer-generated journal entries and terminologies.
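The core of the lexicon-based approach described above can be sketched in a few lines. Each problem concept maps to its domain and to a word set that combines standardized terms with consumer "related words", and concept frequencies are counted per entry. The three concepts and their domains are used here for illustration; the word lists, function name and journal texts are hypothetical, and the sketch matches single words only, whereas the study's four-step process is not detailed in the abstract.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon: three Omaha System problem concepts, each with its
# domain and a small, made-up set of standardized terms plus consumer "related words".
LEXICON = {
    "Pain": ("Physiological", {"pain", "ache", "aching", "hurts"}),
    "Sleep and rest patterns": ("Health-related behaviors", {"sleep", "insomnia", "nap", "exhausted"}),
    "Grief": ("Psychosocial", {"grief", "grieving", "mourning"}),
}


def concept_frequencies(entries, lexicon):
    """Count entries mentioning each concept (single-word matching only).

    by_domain sums concept-level hits, so an entry that mentions two concepts
    from the same domain adds 2 to that domain.
    """
    by_concept, by_domain = Counter(), Counter()
    for text in entries:
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        for concept, (domain, words) in lexicon.items():
            if tokens & words:  # any lexicon word present in this entry
                by_concept[concept] += 1
                by_domain[domain] += 1
    return by_concept, by_domain


# Hypothetical journal entries.
entries = [
    "My back hurts and I could not sleep",
    "Still grieving but hopeful",
    "Good day today",
]
print(concept_frequencies(entries, LEXICON))
```

Matching "hurts" or "grieving" rather than only clinical terms reflects the abstract's finding that the approach worked best when common words represented clinical terms.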
- Keywords
- Omaha System
- MeSH
- biological ontologies MeSH
- data mining * methods MeSH
- humans MeSH
- vocabulary, controlled MeSH
- Check Tag
- humans MeSH