Support vector regression
The accuracy of identification tools in forensic anthropology relies primarily on the variation inherent in the data on which they are built. Sex determination methods based on craniometrics are widely used and known to be specific to several factors (e.g. sample distribution, population, age, secular trends, measurement technique, etc.). The goal of this study is to discuss the potential variations linked to the statistical treatment of the data. Traditional craniometrics of four samples extracted from documented osteological collections (from Portugal, France, the U.S.A., and Thailand) were used to test three different classification methods: linear discriminant analysis (LDA), logistic regression (LR), and support vector machines (SVM). The Portuguese sample served as the training set, and the resulting models were applied to the other samples in order to assess the validity and reliability of the different models. The tests were performed using different parameters: some included the selection of the best predictors; some included a strict decision threshold (sex assessed only if the related posterior probability was high, otherwise the result was reported as indeterminate); and some used an unbalanced sex ratio. Results indicated that LR tends to perform slightly better than the other techniques and offers a better selection of predictors. Also, the use of a decision threshold (i.e. p>0.95) is essential to ensure acceptable reliability of sex determination methods based on craniometrics. Although the Portuguese, French, and American samples share a similar sexual dimorphism, application of Western models to the Thai sample (which displayed a lower degree of dimorphism) was unsuccessful.
- Keywords
- Accuracy, Forensic anthropology population data, Population, Reliability, Sex estimation, Statistics
- MeSH
- Discriminant Analysis MeSH
- Cephalometry * MeSH
- Humans MeSH
- Logistic Models MeSH
- Racial Groups MeSH
- Reproducibility of Results MeSH
- Forensic Anthropology MeSH
- Support Vector Machine MeSH
- Sex Determination by Skeleton methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Comparative Study MeSH
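The strict decision threshold described in the abstract above (sex assessed only when the posterior probability exceeds 0.95, otherwise indeterminate) can be illustrated with a minimal sketch. The function names `assign_sex` and `summarize` and the example probabilities are hypothetical and are not taken from the study; the posterior probability of "male" is assumed to come from any fitted classifier (LDA, LR, or SVM with probability outputs).

```python
def assign_sex(p_male, threshold=0.95):
    """Return 'male', 'female', or 'indeterminate' from a posterior probability.

    p_male is the posterior probability of the class 'male'; the posterior
    for 'female' is assumed to be 1 - p_male (two-class problem).
    """
    if not 0.0 <= p_male <= 1.0:
        raise ValueError("p_male must be a probability in [0, 1]")
    if p_male > threshold:
        return "male"
    if (1.0 - p_male) > threshold:
        return "female"
    return "indeterminate"


def summarize(posteriors, true_labels, threshold=0.95):
    """Accuracy among classified cases and the share left indeterminate."""
    if not posteriors or len(posteriors) != len(true_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    calls = [assign_sex(p, threshold) for p in posteriors]
    decided = [(c, t) for c, t in zip(calls, true_labels) if c != "indeterminate"]
    correct = sum(1 for c, t in decided if c == t)
    accuracy = correct / len(decided) if decided else float("nan")
    indeterminate_rate = 1.0 - len(decided) / len(calls)
    return accuracy, indeterminate_rate


# Hypothetical posteriors and known sexes, for illustration only.
posteriors = [0.99, 0.97, 0.60, 0.02, 0.04, 0.50]
truth = ["male", "female", "male", "female", "female", "female"]
accuracy, indeterminate_rate = summarize(posteriors, truth)
```

The trade-off this makes visible is that a stricter threshold tends to raise accuracy among the cases that are classified while increasing the proportion of individuals left without an assessment.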
The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on live fish skin using image-based features. In total, one hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using a consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets: Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with a radial basis kernel provided the best classifier, with a correct classification rate (CCR) of 82% and a Kappa coefficient of 0.65. Although both the LR and RF methods were less accurate than SVM, they achieved good classification, with CCRs of 75% and 70% respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as a fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, there was a close association between image-based features and the diet the fish received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during cultivation by evaluating the diet's effects on fish skin.
- Keywords
- image colour properties, image processing, image texture properties, machine vision system, supervised classification
- MeSH
- Diet MeSH
- Logistic Models MeSH
- Oncorhynchus mykiss MeSH
- Support Vector Machine * MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Comparative Study MeSH
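The two figures of merit quoted in the rainbow trout abstract above, correct classification rate (CCR) and Cohen's Kappa coefficient, can both be derived from a confusion matrix. The sketch below is a minimal illustration; the function name `ccr_and_kappa` and the example matrix are hypothetical and do not reproduce the study's data.

```python
def ccr_and_kappa(confusion):
    """Compute CCR and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of class-i samples predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement: share of samples on the diagonal (this is the CCR).
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement expected from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (rows: true diet, columns: predicted diet).
example = [[40, 10],
           [10, 40]]
ccr, kappa = ccr_and_kappa(example)
```

Kappa discounts the agreement that would be expected by chance alone, which is why it is reported alongside the raw classification rate.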
Here, we report the data visualization, analysis and modeling for a large set of 4830 SN2 reactions whose rate constants (logk) were measured under different experimental conditions (solvent, temperature). Each reaction was encoded as a single molecular graph, the Condensed Graph of Reaction, which allowed us to use conventional chemoinformatics techniques developed for individual molecules. Thus, the Matched Reaction Pairs approach was proposed and used to analyze substituent effects on substrate and nucleophile reactivity. The data were visualized with the help of the Generative Topographic Mapping approach. A consensus Support Vector Regression (SVR) model for the rate constant was prepared. An unbiased estimate of the model's performance was obtained in cross-validation on reactions measured on unique structural transformations. The model's performance in cross-validation (RMSE=0.61 logk units) and on the external test set (RMSE=0.80) is close to the noise in the data. The performance of local models obtained for selected subsets of reactions proceeding in particular solvents or with particular types of nucleophiles was similar to that of the model built on the entire set. Finally, four different definitions of the model's applicability domain for reactions were examined.
- Keywords
- Condensed Graph of Reaction, Generative Topographic Mapping, Matched Reaction Pairs, Support Vector Regression, bimolecular nucleophilic substitution reactions, models applicability domain
- MeSH
- Models, Chemical * MeSH
- Hydrocarbons, Cyclic chemistry MeSH
- Kinetics MeSH
- Oxidation-Reduction MeSH
- Support Vector Machine * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Hydrocarbons, Cyclic MeSH
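The SN2 abstract above evaluates its model by cross-validation on unique structural transformations, meaning that the same transformation measured under different conditions should not appear in both training and validation data. One common way to achieve this is group-wise fold assignment, sketched below. This is an assumption about the general technique, not the authors' procedure; the function name `grouped_folds` and the transformation labels are hypothetical.

```python
from collections import defaultdict


def grouped_folds(group_ids, n_folds=5):
    """Assign each sample a fold index so that no group is split across folds.

    group_ids[i] identifies the group (here: structural transformation)
    of sample i. Returns a list of fold indices, one per sample.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    members = defaultdict(list)
    for index, group in enumerate(group_ids):
        members[group].append(index)
    fold_sizes = [0] * n_folds
    fold_of_sample = [None] * len(group_ids)
    # Place the largest groups first, each into the currently smallest fold,
    # to keep the folds roughly balanced in size.
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        target = fold_sizes.index(min(fold_sizes))
        for index in members[group]:
            fold_of_sample[index] = target
        fold_sizes[target] += len(members[group])
    return fold_of_sample


# Hypothetical labels: several measurements share one transformation.
transformations = ["A", "A", "B", "C", "C", "C", "D"]
folds = grouped_folds(transformations, n_folds=2)
```

Keeping every measurement of a transformation in one fold prevents the validation error from being flattered by near-duplicates of training reactions.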
Soil pollution is a major issue caused by anthropogenic activities. The spatial distribution of potentially toxic elements (PTEs) varies in most urban and peri-urban areas. As a result, spatially predicting the PTE content of such soil is difficult. A total of 115 samples were obtained from Frydek Mistek in the Czech Republic. Calcium (Ca), magnesium (Mg), potassium (K), and nickel (Ni) concentrations were determined using Inductively Coupled Plasma Optical Emission Spectroscopy. The response variable was Ni, while the predictors were Ca, Mg, and K. The correlation matrix between the response variable and the predictors revealed a satisfactory correlation between the elements. The prediction results indicated that support vector machine regression (SVMR) performed well, although its estimated root mean square error (RMSE) (235.974 mg/kg) and mean absolute error (MAE) (166.946 mg/kg) were higher than those of the other methods applied. The hybridized model of empirical Bayesian kriging-multiple linear regression (EBK-MLR) performed poorly, as evidenced by a coefficient of determination of less than 0.1. The empirical Bayesian kriging-support vector machine regression (EBK-SVMR) model was the optimal model, with low RMSE (95.479 mg/kg) and MAE (77.368 mg/kg) values and a high coefficient of determination (R2 = 0.637). The output of the EBK-SVMR modelling technique was visualized using a self-organizing map. The clustered neurons of the hybridized model CakMg-EBK-SVMR component plane showed a diverse colour pattern predicting the concentration of Ni in the urban and peri-urban soil. The results showed that combining EBK and SVMR is an effective technique for predicting Ni concentrations in urban and peri-urban soil.
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
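The soil nickel abstract above compares models by root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). A minimal sketch of these three metrics follows. The function name `regression_metrics` and the example values are hypothetical, and R2 is computed here as 1 minus the ratio of residual to total sum of squares, which is an assumption, since some studies report the squared correlation instead.

```python
import math


def regression_metrics(observed, predicted):
    """Return (RMSE, MAE, R2) for paired observed and predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical concentrations in mg/kg, for illustration only.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]
rmse, mae, r2 = regression_metrics(observed, predicted)
```

RMSE and MAE are expressed in the units of the response (mg/kg in the abstract), whereas R2 is unitless, so the three are usually read together rather than in isolation.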
Of the wide range of techniques suitable for relating spectral measurements to soil properties, partial least squares (PLS) regression and support vector machines (SVM) are the most commonly used. This is due to their predictive power and the availability of software tools. Both are purely statistical approaches and, as such, benefit from the multiple responses of soil material in the spectrum. However, physically based approaches that focus on only a single spectral feature, such as simple linear regression using selected continuum-removed spectra values as a predictor variable, often provide accurate estimates. Furthermore, if this approach is extended to multiple features by taking into account three basic absorption feature parameters (area, width, and depth) of all occurring features as predictors and subjecting them to best subset selection, one can achieve even higher prediction accuracy than with PLS regression. Here, we attempt to further extend this approach by adding two additional absorption feature parameters (left and right side area), as they can also be important diagnostic markers. As a result, we achieved higher prediction accuracy compared with PLS regression and SVM for exchangeable soil pH, slightly higher or comparable accuracy for dithionite-citrate and ammonium oxalate extractable Fe and Mn forms, but slightly worse accuracy for oxidizable carbon content. Therefore, we suggest incorporating the multiple linear regression approach based on absorption feature parameters into existing working practices.
In order to monitor Potentially Toxic Elements (PTEs) in anthropogenic soils on brown coal mining dumpsites, a large number of samples and cumbersome, time-consuming laboratory measurements are required. Due to its rapidity, convenience and accuracy, reflectance spectroscopy within the Visible-Near Infrared (Vis-NIR) region has been used to predict soil constituents. This study evaluated the suitability of Vis-NIR (350-2500 nm) reflectance spectroscopy for predicting PTE concentrations, using samples collected on large brown coal mining dumpsites in the Czech Republic. Partial Least Square Regression (PLSR) and Support Vector Machine Regression (SVMR) with cross-validation were used to relate PTE data to the reflectance spectral data by applying different preprocessing strategies. According to the criteria of minimal Root Mean Square Error of Prediction of Cross Validation (RMSEPcv), maximal coefficient of determination (R2cv) and maximal Residual Prediction Deviation (RPD), the SVMR models with the first derivative pretreatment provided the most accurate prediction for As (R2cv = 0.89, RMSEPcv = 1.89, RPD = 2.63). Less accurate, but acceptable predictions for screening purposes were obtained for Cd and Cu (0.66 < R2cv < 0.81, RMSEPcv = 0.0.8 and 4.08 respectively, 2.0 < RPD < 2.5). The PLSR model for predicting Mn (R2cv = 0.44, RMSEPcv = 116.43, RPD = 1.45) was inadequate. Overall, SVMR models for the Vis-NIR spectra could be used indirectly for an accurate assessment of PTE concentrations.
- MeSH
- Environmental Monitoring MeSH
- Soil chemistry MeSH
- Support Vector Machine MeSH
- Coal Mining * MeSH
- Environmental Pollution analysis MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Soil MeSH
In situ visible and near-infrared (Vis-NIR) spectroscopy has proven to be a reliable tool for determining soil organic carbon (SOC) content with a small loss of precision as compared to laboratory measurements. The loss of precision is a result of external environmental factors that disrupt spectral measurements, for example surface roughness, changes in weather conditions, humidity, temperature, human factors, spectral noise and especially soil water. It has been assumed that in situ predictive capability could be improved if some of these factors are either minimized or eliminated during the in situ measurement. For this study, the prediction of SOC was carried out under two different in situ measurement conditions: less favourable environmental conditions (with disturbances) and more favourable site-specific conditions (disturbance-reduced conditions). The primary goal is to determine whether the estimate of SOC can be improved under more favourable site-specific conditions, as well as the impact of pre-treatment algorithms under both less and more favourable conditions. The study employed a large range of pre-treatment algorithms and their combinations. Three separate multivariate models were used to predict SOC, namely Cubist, support vector machine regression (SVMR), and partial least squares regression (PLSR). The results clearly show that reducing disturbing factors (i.e., drier and unploughed soil as well as noise reduction) improves SOC prediction with in situ Vis-NIR spectroscopy. The best overall result was achieved with SVMR (R2CV = 0.72, RMSEPcv = 0.21, RPIQ = 2.34). Although the combination of pre-treatment algorithms resulted in an improvement, overall, these pre-treatment algorithms could not compensate for the factors affecting the spectra measured with disturbance. Though the obtained result is promising, further study is still needed to disentangle the impacts and interactions of various disturbing factors for different soil types.
- Keywords
- Agricultural soil, In situ spectroscopy, Machine learning algorithms, Pre-treatment algorithms, SOC
- MeSH
- Spectroscopy, Near-Infrared methods MeSH
- Humans MeSH
- Least-Squares Analysis MeSH
- Soil * chemistry MeSH
- Support Vector Machine MeSH
- Carbon * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Soil * MeSH
- Carbon * MeSH
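The two Vis-NIR abstracts above report Residual Prediction Deviation (RPD) and the ratio of performance to interquartile range (RPIQ). Both scale the spread of the observed values by the prediction error: RPD divides the standard deviation by the RMSE, and RPIQ divides the interquartile range by the RMSE. The sketch below is a minimal illustration with hypothetical data and a hypothetical function name `rpd_rpiq`. It assumes the sample standard deviation and the default quartile method of Python's `statistics.quantiles`; published values may rest on different conventions.

```python
import math
import statistics


def rpd_rpiq(observed, predicted):
    """Return (RPD, RPIQ) for paired observed and predicted values."""
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need at least two paired values of equal length")
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    if rmse == 0:
        raise ValueError("RPD and RPIQ are undefined for a perfect prediction")
    # Spread of the reference values: sample SD and interquartile range.
    sd = statistics.stdev(observed)
    q1, _, q3 = statistics.quantiles(observed, n=4)
    return sd / rmse, (q3 - q1) / rmse


# Hypothetical reference values and predictions, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5, 7.5]
rpd, rpiq = rpd_rpiq(observed, predicted)
```

RPIQ is often preferred for skewed soil data because the interquartile range is less sensitive to extreme values than the standard deviation.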
Dental development is frequently used to estimate age in many anthropological specializations. The aim of this study was to derive an accurate age prediction system for the Czech population and to determine whether tooth types differ in their predictive ability and ontogenetic stability during infancy and adolescence. A cross-sectional panoramic X-ray study was based on the assessment of developmental stages of mandibular teeth (Moorrees et al. 1963) in 1393 individuals aged from 3 to 17 years. Data mining methods were used for dental age estimation. These are based on nonlinear relationships between the predicted age and data sets. Compared with other tested predictive models, the GAME method predicted age with the highest accuracy. Age-interval estimations between the 10th and 90th percentiles ranged from -1.06 to +1.01 years in girls and from -1.13 to +1.20 in boys. Accuracy was expressed by RMS error, which is the average deviation between estimated and chronological age. The predictive value of individual teeth changed during the investigated period from 3 to 17 years. When we evaluated the whole period, the second molars exhibited the best predictive ability. When evaluating partial age periods, we found that the accuracy of biological age prediction declines with increasing age (from 0.52 to 1.20 years in girls and from 0.62 to 1.22 years in boys) and that the predictive importance of tooth types changes, depending on variability and the number of developmental stages in the age interval. GAME is a promising tool for age-interval estimation studies as it can provide reliable predictive models.
- MeSH
- Data Mining methods MeSH
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Child, Preschool MeSH
- Cross-Sectional Studies MeSH
- Regression Analysis MeSH
- Radiography, Panoramic MeSH
- Models, Statistical MeSH
- Support Vector Machine MeSH
- Age Determination by Teeth methods MeSH
- Tooth anatomy & histology diagnostic imaging MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Male MeSH
- Child, Preschool MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Cyanobacteria blooms in fishponds, driven by climate change and anthropogenic activities, have become a critical concern for aquatic ecosystems worldwide. The diversity in fishpond sizes and fish densities further complicates their monitoring. This study addresses the challenge of accurately predicting cyanobacteria concentrations in turbid waters via remote sensing, hindered by optical complexities and diminished light signals. A comprehensive dataset of 740 sampling points was compiled, encompassing water quality metrics (cyanobacteria levels, total chlorophyll, turbidity, total cell count) and spectral data obtained through AlgaeTorch, alongside Sentinel-2 reflectance data from three Třeboň fishponds (UNESCO Man and Biosphere Reserve) in the Czech Republic over 2022-2023. Partial Least Squares Regression (PLSR) and three machine learning algorithms, Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost), were developed based on seasonal and annual data volumes. The SVM algorithm demonstrated commendable performance on the one-year data validation dataset from the Svět fishpond for the prediction of cyanobacteria, reflected by the key performance indicators: R2 = 0.88, RMSE = 15.07 μg Chl-a/L, and RPD = 2.82. Meanwhile, SVM displayed steady results in the unified one-year validation dataset from Naděje, Svět, and Vizír fishponds, with metrics showing R2 = 0.56, RMSE = 39.03 μg Chl-a/L, RPD = 1.50. Thus, Sentinel data proved viable for seasonal cyanobacteria monitoring across different fishponds. Overall, this study presents a novel approach for enhancing the precision of cyanobacteria predictions and long-term ecological monitoring in fishponds, contributing significantly to the water quality management strategies in the Třeboň region.
- Keywords
- Cyanobacteria, Fishponds, Machine learning, Remote sensing, Water quality inversion
- MeSH
- Eutrophication MeSH
- Water Quality MeSH
- Environmental Monitoring * methods MeSH
- Cyanobacteria * MeSH
- Machine Learning * MeSH
- Support Vector Machine MeSH
- Remote Sensing Technology * MeSH
- Publication type
- Journal Article MeSH
- Geographic names
- Czech Republic MeSH
In order to analyze and improve dental age estimation in children and adolescents for forensic purposes, 22 age estimation methods were compared on a sample of 976 orthopantomographs (662 males, 314 females) of healthy Czech children and adolescents aged between 2.7 and 20.5 years. All methods are compared in terms of accuracy and complexity and are based on various data mining methods or on simple mathematical operations. The winning method is presented in detail. The comparison showed that only three methods provide the best accuracy while remaining user-friendly. These methods were used to build a tabular multiple linear regression model, an M5P tree model and a support vector machine model with a first-order polynomial kernel. All of them have a mean absolute error (MAE) under 0.7 years for both males and females. The other well-performing data mining methods (RBF neural network, K-nearest neighbors, Kstar, etc.) have similar or slightly better accuracy, but they are not user-friendly, as they require computing equipment and implementation as a computer program. The traditional model based on age averages provides the lowest estimation accuracy (MAE under 0.96 years). Individual teeth were found to differ in their relevance for age estimation. This finding also explains the lowest accuracy of the traditional averages-based model. In this paper, a technique for replacing missing data in cases with missing teeth is presented in detail, as well as the constrained tabular multiple regression model. We also provide free age prediction software based on this winning model.
- Keywords
- Age estimation, Data mining, Model, Population-specific standards
- MeSH
- Data Mining MeSH
- Dentition, Permanent * MeSH
- Child MeSH
- Humans MeSH
- Linear Models MeSH
- Adolescent MeSH
- Young Adult MeSH
- Neural Networks, Computer MeSH
- Child, Preschool MeSH
- Radiography, Panoramic MeSH
- Decision Trees MeSH
- Software MeSH
- Support Vector Machine MeSH
- Age Determination by Teeth methods MeSH
- Tooth growth & development MeSH
- Check Tag
- Child MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Male MeSH
- Child, Preschool MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Comparative Study MeSH