Decision support systems are complex systems that assist with the decision-making process. Learning the classification rule of a decision support system requires solving a complex statistical task, most commonly by means of classification analysis. However, regression methodology may be useful in this context as well. This paper aims to give an overview of various regression methods, discuss their properties, and show examples from clinical decision-making.
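To illustrate how a regression method can yield a classification rule, the following minimal Python sketch fits a logistic regression model by batch gradient descent and turns the fitted probability into a decision using a 0.5 threshold. The single "biomarker" feature, the toy data, the learning rate, and the threshold are all hypothetical; a real decision support system would need validated data and a tuned cut-off.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(b + w.x) by batch gradient descent on the mean log-loss."""
    d, m = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y)
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / m for wj, gj in zip(w, gw)]
        b -= lr * gb / m
    return w, b

def decide(w, b, x, threshold=0.5):
    """Classification rule derived from the regression: 1 if the fitted probability exceeds the threshold."""
    p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
    return p, int(p > threshold)

# Hypothetical toy data: one biomarker value per patient, label 1 = "intervene"
X = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
```

The regression output is a probability, so the same fitted model supports different decision thresholds without refitting.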
- MeSH
- Data Interpretation, Statistical MeSH
- Clinical Decision-Making methods MeSH
- Linear Models MeSH
- Logistic Models MeSH
- Least-Squares Analysis MeSH
- Neural Networks, Computer MeSH
- Regression Analysis * MeSH
- Models, Statistical * MeSH
- Statistics as Topic MeSH
- Support Vector Machine MeSH
- Decision Support Systems, Clinical MeSH
Of the wide range of techniques suitable for relating spectral measurements to soil properties, partial least squares (PLS) regression and support vector machines (SVM) are the most commonly used, owing to their predictive power and the availability of software tools. Both are purely statistical approaches and, as such, benefit from the multiple responses of soil material across the spectrum. However, physically based approaches that focus on a single spectral feature, such as simple linear regression using selected continuum-removed spectral values as the predictor variable, often provide accurate estimates. Furthermore, if this approach is extended to the multivariate case by taking the three basic absorption feature parameters (area, width, and depth) of all occurring features as predictors and subjecting them to best subset selection, even higher prediction accuracy than with PLS regression can be achieved. Here, we attempt to extend this approach further by adding two more absorption feature parameters (left and right side area), as they can also be important diagnostic markers. As a result, we achieved higher prediction accuracy than PLS regression and SVM for exchangeable soil pH, slightly higher or comparable accuracy for dithionite-citrate and ammonium oxalate extractable Fe and Mn forms, but slightly lower accuracy for oxidizable carbon content. Therefore, we suggest incorporating the multiple linear regression approach based on absorption feature parameters into existing working practices.
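The absorption feature parameters described above can be sketched as follows, assuming the continuum is the upper convex hull of the spectrum and each feature spans two adjacent hull points. The function names are hypothetical, and the width definition (span of samples at or above half the maximum band depth, without interpolation) is one simple choice among several; the authors' exact definitions may differ. The resulting parameters would then serve as candidate predictors for best subset selection in a multiple linear regression.

```python
def upper_hull(wl, refl):
    """Indices of the upper convex hull (the continuum) of a spectrum sorted by wavelength."""
    hull = []
    for i in range(len(wl)):
        # Drop the last hull point while it lies on or below the line from hull[-2] to point i
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (wl[b] - wl[a]) * (refl[i] - refl[a]) - (refl[b] - refl[a]) * (wl[i] - wl[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

def continuum_removed(wl, refl, hull):
    """Divide reflectance by the linearly interpolated hull, giving values <= 1."""
    cr = [1.0] * len(wl)
    for a, b in zip(hull, hull[1:]):
        for k in range(a, b + 1):
            t = (wl[k] - wl[a]) / (wl[b] - wl[a])
            cr[k] = refl[k] / (refl[a] + t * (refl[b] - refl[a]))
    return cr

def _area(wl, cr, lo, hi):
    # Trapezoidal integral of the band depth (1 - CR) between indices lo and hi
    return sum(0.5 * ((1 - cr[k]) + (1 - cr[k + 1])) * (wl[k + 1] - wl[k]) for k in range(lo, hi))

def feature_parameters(wl, cr, a, b):
    m = min(range(a, b + 1), key=lambda k: cr[k])  # band minimum
    depth = 1.0 - cr[m]
    # Width approximated as the span of samples at or above half the maximum depth
    half = [k for k in range(a, b + 1) if 1.0 - cr[k] >= depth / 2]
    return {
        "area": _area(wl, cr, a, b),
        "width": wl[max(half)] - wl[min(half)],
        "depth": depth,
        "left_area": _area(wl, cr, a, m),
        "right_area": _area(wl, cr, m, b),
    }

def absorption_features(wl, refl, min_depth=1e-6):
    """Parameters of every absorption feature between consecutive hull points."""
    hull = upper_hull(wl, refl)
    cr = continuum_removed(wl, refl, hull)
    feats = []
    for a, b in zip(hull, hull[1:]):
        p = feature_parameters(wl, cr, a, b)
        if p["depth"] > min_depth:
            feats.append(p)
    return feats

# Hypothetical spectrum: flat reflectance of 0.5 with one symmetric absorption band
wl = list(range(400, 510, 10))
refl = [0.5, 0.5, 0.5, 0.45, 0.38, 0.3, 0.38, 0.45, 0.5, 0.5, 0.5]
feats = absorption_features(wl, refl)
```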
Here, we report the visualization, analysis, and modeling of a large set of 4830 SN2 reactions whose rate constants (log k) were measured under different experimental conditions (solvent, temperature). Each reaction was encoded by a single molecular graph, the Condensed Graph of Reactions, which allowed us to apply conventional chemoinformatics techniques developed for individual molecules. On this basis, a Matched Reaction Pairs approach was proposed and used to analyze substituent effects on the reactivity of substrates and nucleophiles. The data were visualized using Generative Topographic Mapping. A consensus Support Vector Regression (SVR) model for the rate constant was built. An unbiased estimate of the model's performance was obtained by cross-validation on reactions grouped by unique structural transformation. The model's performance in cross-validation (RMSE = 0.61 log k units) and on the external test set (RMSE = 0.80) is close to the noise in the data. Local models built for selected subsets of reactions proceeding in particular solvents or with particular types of nucleophiles performed similarly to the model built on the entire set. Finally, four different definitions of the applicability domain of reaction models were examined.
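The idea of estimating performance on unseen structural transformations can be sketched as a grouped cross-validation in which all reactions sharing a transformation are held out together, with the consensus prediction taken as the mean of several member models. In this minimal sketch, the transformation labels, the round-robin assignment of groups to folds, and the two trivial member models (`fit_mean`, `fit_1nn`) are hypothetical placeholders for the SVR models used in the study.

```python
import math

def group_kfold(groups, n_splits):
    """Yield (train, test) index lists; each group (e.g. a structural transformation) is held out whole.
    Requires n_splits >= 2 so that every training set is non-empty."""
    unique = sorted(set(groups))
    for f in range(n_splits):
        held_out = set(unique[f::n_splits])
        test = [i for i, g in enumerate(groups) if g in held_out]
        train = [i for i, g in enumerate(groups) if g not in held_out]
        yield train, test

def consensus(predictors, x):
    """Consensus prediction: arithmetic mean of the member models' predictions."""
    return sum(p(x) for p in predictors) / len(predictors)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def cross_validate(X, y, groups, fitters, n_splits=5):
    """Out-of-fold consensus predictions; each fitter(X_train, y_train) returns a predict(x) callable."""
    oof = [None] * len(y)
    for train, test in group_kfold(groups, n_splits):
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        members = [fit(Xtr, ytr) for fit in fitters]
        for i in test:
            oof[i] = consensus(members, X[i])
    return rmse(y, oof), oof

# Hypothetical stand-ins for SVR members
def fit_mean(Xtr, ytr):
    mean = sum(ytr) / len(ytr)
    return lambda x: mean

def fit_1nn(Xtr, ytr):
    def predict(x):
        k = min(range(len(Xtr)), key=lambda i: sum((a - b) ** 2 for a, b in zip(Xtr[i], x)))
        return ytr[k]
    return predict
```

Because a transformation never appears in both the training and the test part of a fold, the out-of-fold RMSE reflects performance on transformations the models have not seen.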
Monitoring Potentially Toxic Elements (PTEs) in anthropogenic soils on brown coal mining dumpsites requires a large number of samples and cumbersome, time-consuming laboratory measurements. Owing to its rapidity, convenience, and accuracy, reflectance spectroscopy in the Visible-Near Infrared (Vis-NIR) region has been used to predict soil constituents. This study evaluated the suitability of Vis-NIR (350-2500 nm) reflectance spectroscopy for predicting PTE concentrations, using samples collected on large brown coal mining dumpsites in the Czech Republic. Partial Least Squares Regression (PLSR) and Support Vector Machine Regression (SVMR) with cross-validation were used to relate PTE data to the reflectance spectra under different preprocessing strategies. According to the criteria of minimal Root Mean Square Error of Prediction in Cross-Validation (RMSEPcv), maximal coefficient of determination (R2cv), and maximal Residual Prediction Deviation (RPD), the SVMR models with first-derivative pretreatment provided the most accurate prediction for As (R2cv = 0.89, RMSEPcv = 1.89, RPD = 2.63). Predictions for Cd and Cu were less accurate but acceptable for screening purposes (0.66 < R2cv < 0.81, RMSEPcv = 0.0.8 and 4.08, respectively, 2.0 < RPD < 2.5). The PLSR model for predicting Mn (R2cv = 0.44, RMSEPcv = 116.43, RPD = 1.45) was inadequate. Overall, SVMR models for Vis-NIR spectra could be used indirectly for an accurate assessment of PTE concentrations.
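The three validation statistics can be computed as below. This sketch assumes the common definitions R² = 1 - SS_res/SS_tot, RMSEP as the root mean squared difference between predicted and observed values, and RPD as the sample standard deviation of the observed values divided by RMSEP; the study may use slightly different conventions (for example, R² computed as a squared correlation). The sample values are hypothetical.

```python
import math
import statistics

def rmsep(observed, predicted):
    """Root mean square error of prediction."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def r2(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_o = statistics.mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rpd(observed, predicted):
    """Residual prediction deviation: sample SD of the observed values divided by RMSEP."""
    return statistics.stdev(observed) / rmsep(observed, predicted)

# Hypothetical cross-validated predictions for four samples
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
```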
Dental development is frequently used to estimate age in many anthropological specializations. The aim of this study was to develop an accurate age prediction system for the Czech population and to determine whether tooth types differ in predictive ability and ontogenetic stability during childhood and adolescence. This cross-sectional panoramic X-ray study was based on the assessment of developmental stages of mandibular teeth (Moorrees et al. 1963) in 1393 individuals aged 3 to 17 years. Data mining methods, which model nonlinear relationships between the predicted age and the data, were used for dental age estimation. Of the tested predictive models, the GAME method predicted age with the highest accuracy. Age-interval estimates between the 10th and 90th percentiles ranged from -1.06 to +1.01 years in girls and from -1.13 to +1.20 years in boys. Accuracy was expressed as the root mean square (RMS) error, i.e. the square root of the mean squared deviation between estimated and chronological age. The predictive value of individual teeth changed over the investigated period from 3 to 17 years. Over the whole period, the second molars exhibited the best predictive ability. When evaluating partial age periods, we found that the accuracy of biological age prediction declines with increasing age (from 0.52 to 1.20 years in girls and from 0.62 to 1.22 years in boys) and that the predictive importance of tooth types changes, depending on the variability and the number of developmental stages in the age interval. GAME is a promising tool for age-interval estimation studies, as it can provide reliable predictive models.
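The accuracy measures above can be sketched as follows: the RMS error is the square root of the mean squared difference between estimated and chronological age, and an age interval is taken from the 10th and 90th percentiles of the estimation errors. This is a minimal sketch; the percentile method (Python's `statistics.quantiles` with its default exclusive method) is an assumption, the GAME models themselves are not reproduced, and the sample ages are hypothetical.

```python
import math
import statistics

def age_errors(estimated, chronological):
    """Signed estimation errors in years (estimated minus chronological age)."""
    return [e - c for e, c in zip(estimated, chronological)]

def rms_error(errors):
    """Square root of the mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def percentile_interval(errors):
    """10th and 90th percentiles of the errors; needs at least two values."""
    deciles = statistics.quantiles(errors, n=10)
    return deciles[0], deciles[-1]

# Hypothetical estimated and chronological ages (years)
est = [5.0, 6.5, 8.0, 9.5, 11.0]
chron = [6.0, 7.0, 8.0, 9.0, 10.0]
errs = age_errors(est, chron)
```

Adding the two percentiles to an individual's point estimate gives the kind of age interval reported above.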
- MeSH
- Data Mining methods MeSH
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Child, Preschool MeSH
- Cross-Sectional Studies MeSH
- Regression Analysis MeSH
- Radiography, Panoramic MeSH
- Models, Statistical MeSH
- Support Vector Machine MeSH
- Age Determination by Teeth methods MeSH
- Tooth anatomy & histology radiography MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Male MeSH
- Child, Preschool MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
To analyze and improve dental age estimation in children and adolescents for forensic purposes, 22 age estimation methods were compared on a sample of 976 orthopantomographs (662 males, 314 females) of healthy Czech children and adolescents aged between 2.7 and 20.5 years. The methods, which are based either on various data mining methods or on simple mathematical operations, were compared in terms of accuracy and complexity. The winning method is presented in detail. The comparison showed that only three methods provide the best accuracy while remaining user-friendly: a tabular multiple linear regression model, an M5P tree model, and a support vector machine model with a first-order polynomial kernel. All of them have a mean absolute error (MAE) under 0.7 years for both males and females. The other well-performing data mining methods (RBF neural network, K-nearest neighbors, Kstar, etc.) have similar or slightly better accuracy, but they are not user-friendly, as they require computing equipment and implementation as a computer program. The traditional model based on age averages provides the lowest estimation accuracy (MAE under 0.96 years). Teeth were found to differ in their relevance for age estimation, which also explains the low accuracy of the traditional averages-based model. This paper also presents in detail a technique for replacing missing data in cases with missing teeth, as well as the constrained tabular multiple regression model. We also provide free age prediction software based on this winning model.
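A tabular multiple linear regression model can be read as a dummy-coded linear regression: each tooth contributes a fixed number of years per developmental stage, and the estimate is the intercept plus the corresponding table entries. The sketch below assumes that structure; the table values, intercept, tooth labels, and integer stage codes are hypothetical, and neither the constraints of the authors' model nor their missing-tooth replacement technique is reproduced.

```python
# Hypothetical lookup table: years contributed by each tooth at each stage code.
# Values are invented for illustration only.
TABLE = {
    "M1": {1: 0.0, 2: 0.5, 3: 1.0},
    "M2": {1: 0.0, 2: 1.0, 3: 2.0},
}
INTERCEPT = 6.0  # hypothetical

def tabular_predict(stages, table=TABLE, intercept=INTERCEPT):
    """Additive model: intercept plus the table entry for each (tooth, stage) pair."""
    return intercept + sum(table[tooth][stage] for tooth, stage in stages.items())

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Usage: one hypothetical child with tooth M1 at stage 2 and M2 at stage 3
age = tabular_predict({"M1": 2, "M2": 3})
```

Because the prediction is only a sum of looked-up numbers, such a model can be applied with a printed table and no computer, which matches the user-friendliness argument above.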
- MeSH
- Data Mining MeSH
- Dentition, Permanent * MeSH
- Child MeSH
- Humans MeSH
- Linear Models MeSH
- Adolescent MeSH
- Young Adult MeSH
- Neural Networks, Computer MeSH
- Child, Preschool MeSH
- Radiography, Panoramic MeSH
- Decision Trees MeSH
- Software MeSH
- Support Vector Machine MeSH
- Age Determination by Teeth methods MeSH
- Tooth growth & development MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Male MeSH
- Child, Preschool MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Comparative Study MeSH
Automatic voice assessment is often performed using sustained vowels. Alternatively, analysis of read-out text can be applied to both voice and speech assessment. Automatic speech recognition and prosodic analysis were used to derive regression formulae relating automatic measures to the perceptual assessment of four voice and four speech criteria. The regression was trained on 21 men and 62 women (average age 49.2 years) and tested on another set of 24 men and 49 women (average age 48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists rated the recordings on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified that describe all the examined criteria. Inter-rater correlation within the expert group ranged from r = 0.63 for the criterion 'match of breath and sense units' to r = 0.87 for overall voice quality. Human-machine correlation ranged from r = 0.40 for the match of breath and sense units to r = 0.82 for intelligibility. The perceptual ratings of the different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for the subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
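The two statistics behind the regression formulae and the human-machine agreement can be sketched with a least-squares line and Pearson's correlation coefficient. For brevity, this sketch uses a single automatic feature, whereas the study combined several prosodic and recognition accuracy features; the feature and rating values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares fit y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

# Hypothetical: one automatic feature per speaker vs. the mean expert rating
feature = [1.0, 2.0, 3.0, 4.0]
rating = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(feature, rating)
machine = [intercept + slope * f for f in feature]
r_human_machine = pearson_r(machine, rating)
```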
- MeSH
- Speech Acoustics * MeSH
- Acoustics MeSH
- Hoarseness diagnosis physiopathology MeSH
- Chronic Disease MeSH
- Reading MeSH
- Adult MeSH
- Voice Quality * MeSH
- Middle Aged MeSH
- Humans MeSH
- Speech Production Measurement methods MeSH
- Adolescent MeSH
- Young Adult MeSH
- Signal Processing, Computer-Assisted * MeSH
- Predictive Value of Tests MeSH
- Regression Analysis MeSH
- Reproducibility of Results MeSH
- Pattern Recognition, Automated * MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Support Vector Machine MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Male MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
BACKGROUND: Abiotic conditions provide cues that drive tick questing activity. Defining these cues is critical for predicting biting risk and for forecasting the impacts of climate change on tick populations. This is particularly important for Ixodes ricinus nymphs, the vector of numerous pathogens affecting humans. METHODS: A 6-year study of the questing activity of I. ricinus was conducted in Central Bohemia, Czech Republic, from 2001 to 2006. Tick numbers were determined by flagging the vegetation weekly in a defined 600 m² field site. After capture, ticks were released back where they were found. Temperature and relative humidity were recorded concurrently in the microhabitat and at a nearby meteorological station. Data were analysed by regression methods. RESULTS: During 208 monitoring visits, a total of 21,623 ticks were recorded. Larvae, nymphs, and adults showed typical bimodal questing activity curves with major spring peaks and minor late summer or autumn peaks (mid-summer for males). Questing activity of nymphs and adults began at ~12 h of daylight and ceased at ~9 h of daylight, at limiting temperatures close to freezing (in early spring and late autumn); questing occurred during ~70 % of the calendar year, without cessation in summer. The co-occurrence of larvae and nymphs varied annually, ranging from 31 to 80 % of monitoring visits, and depended on the questing activity of larvae. Near-ground temperature, day length, and relative air humidity were all significant predictors of nymphal activity. For 70 % of records, near-ground temperatures measured in the microhabitat were 4-5 °C lower than those recorded by the nearby meteorological observatory, although the two were strongly dependent. Inter-annual differences in seasonal numbers of nymphs reflected extreme weather events. CONCLUSIONS: Weather predictions (particularly of temperature), combined with day length, are good predictors of the initiation and cessation of I. ricinus nymph questing activity, and hence of the risk period to humans, in Central Europe. Co-occurrence data for larvae and nymphs support the notion of intrastadial rather than interstadial co-feeding pathogen transmission. Annual questing tick numbers recover quickly from the impact of extreme weather events.
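Because day length is one of the predictors, it can help to compute it directly. The following sketch approximates astronomical day length from latitude and day of year using Cooper's declination formula (ignoring refraction and twilight) and applies the ~12 h start and ~9 h stop thresholds reported above. The latitude of 50°N for Central Bohemia, the Northern Hemisphere assumption, and the midsummer cut at day 172 are simplifying assumptions, not part of the study.

```python
import math

def day_length_hours(latitude_deg, day_of_year):
    """Approximate astronomical day length (no refraction) using Cooper's declination formula."""
    decl = math.radians(23.45) * math.sin(2 * math.pi * (284 + day_of_year) / 365)
    x = -math.tan(math.radians(latitude_deg)) * math.tan(decl)
    x = max(-1.0, min(1.0, x))  # clamp for polar day / polar night
    return 2 * math.degrees(math.acos(x)) / 15.0

def questing_window(latitude_deg, start_hours=12.0, stop_hours=9.0, midsummer=172):
    """First spring day with >= start_hours of daylight and first day after midsummer below stop_hours.
    Northern Hemisphere only."""
    days = range(1, 366)
    start = next(d for d in days if d < midsummer and day_length_hours(latitude_deg, d) >= start_hours)
    stop = next(d for d in days if d > midsummer and day_length_hours(latitude_deg, d) < stop_hours)
    return start, stop

# Usage: hypothetical site at 50 degrees N
start, stop = questing_window(50.0)
```

In a regression model, `day_length_hours` would simply supply the day-length covariate alongside near-ground temperature and relative humidity.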
- MeSH
- Time Factors MeSH
- Ixodes physiology MeSH
- Larva physiology MeSH
- Nymph physiology MeSH
- Population Dynamics MeSH
- Seasons MeSH
- Temperature MeSH
- Humidity MeSH
- Environment * MeSH
- Animals MeSH
- Check Tag
- Male MeSH
- Female MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
A ranked model in the form of a linear transformation of multivariate feature vectors onto a line can reflect a causal order among liver diseases. A priori medical knowledge about the order of liver diseases, together with clinical data sets, was used to define the convex and piecewise linear (CPL) criterion function. The linear ranked transformations were designed here by minimizing such CPL criterion functions.
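A minimal sketch of such a ranked linear transformation, assuming the CPL criterion is the mean of the hinge penalties max(0, 1 - w·(x_j - x_i)) over all pairs in which x_j should rank above x_i, plus an L1 penalty λ‖w‖₁. Both terms are convex and piecewise linear. The sketch minimizes the criterion by plain subgradient descent with a decreasing step size, which is not necessarily the optimization method used by the authors; the toy feature vectors and disease ranks are hypothetical.

```python
import math

def cpl_criterion(w, pairs, X, lam):
    """Mean hinge penalty over ordered pairs (i below j) plus lam * ||w||_1."""
    total = 0.0
    for i, j in pairs:
        margin = sum(wk * (xj - xi) for wk, xi, xj in zip(w, X[i], X[j]))
        total += max(0.0, 1.0 - margin)
    return total / len(pairs) + lam * sum(abs(wk) for wk in w)

def fit_ranked(X, pairs, lam=0.01, lr=0.5, iters=500):
    """Subgradient descent on the CPL criterion; returns the best weight vector found."""
    d = len(X[0])
    w = [0.0] * d
    best_w, best_val = list(w), cpl_criterion(w, pairs, X, lam)
    for t in range(iters):
        g = [0.0] * d
        for i, j in pairs:
            diff = [xj - xi for xi, xj in zip(X[i], X[j])]
            if sum(wk * dk for wk, dk in zip(w, diff)) < 1.0:  # active hinge
                for k in range(d):
                    g[k] -= diff[k] / len(pairs)
        for k in range(d):  # subgradient of the L1 term
            if w[k] > 0:
                g[k] += lam
            elif w[k] < 0:
                g[k] -= lam
        step = lr / math.sqrt(t + 1)
        w = [wk - step * gk for wk, gk in zip(w, g)]
        val = cpl_criterion(w, pairs, X, lam)
        if val < best_val:
            best_w, best_val = list(w), val
    return best_w

# Hypothetical patients: two features each, disease ranks 0 < 1 < 2
X = [[0.0, 1.0], [0.5, 0.0], [2.0, 0.5], [2.5, 1.0], [4.0, 0.0], [4.5, 1.0]]
rank = [0, 0, 1, 1, 2, 2]
pairs = [(i, j) for i in range(len(X)) for j in range(len(X)) if rank[i] < rank[j]]
w = fit_ranked(X, pairs)
proj = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]  # positions on the line
```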
- Keywords
- hazard function, FLIPI, Follicular Lymphoma International Prognostic Index, overfitting
- MeSH
- Algorithms MeSH
- Survival Analysis MeSH
- Bayes Theorem MeSH
- Confounding Factors, Epidemiologic MeSH
- Lymphoma, Follicular * MeSH
- Humans MeSH
- Logistic Models MeSH
- Decision Support Techniques MeSH
- Neural Networks, Computer MeSH
- Odds Ratio MeSH
- Probability MeSH
- Prognosis * MeSH
- Models, Statistical MeSH
- Statistics as Topic MeSH
- Support Vector Machine MeSH
- Check Tag
- Humans MeSH