Random Forest
Random Forest is an ensemble of decision trees based on the bagging and random subspace concepts. As suggested by Breiman, the strength of unstable learners and the diversity among them are the core strengths of ensemble models. In this paper, we propose two approaches: oblique and rotation double random forests. In the first approach, we propose a rotation-based double random forest, in which a transformation (rotation) of the feature space is generated at each node. Because a different random feature subspace is chosen for evaluation at each node, the transformation at each node is also different. Different transformations increase the diversity among the base learners and hence improve generalization performance. With the double random forest as the base learner, the data at each node are transformed via two different transformations, namely principal component analysis and linear discriminant analysis. In the second approach, we propose an oblique double random forest. Decision trees in the random forest and the double random forest are univariate, which results in axis-parallel splits that fail to capture the geometric structure of the data. In addition, the standard random forest may not grow sufficiently large decision trees, resulting in suboptimal performance. To capture the geometric properties and to grow decision trees of sufficient depth, we propose the oblique double random forest, whose models are multivariate decision trees. At each non-leaf node, a multisurface proximal support vector machine generates the optimal plane for better generalization performance. Different regularization techniques (Tikhonov regularization, axis-parallel split regularization, and null space regularization) are employed to tackle the small sample size problem in the decision trees of the oblique double random forest.
The proposed ensembles produce larger trees than standard ensembles of decision trees because bagging is used at each non-leaf node, which results in improved performance. The baseline models and the proposed oblique and rotation double random forest models are evaluated on 121 benchmark UCI datasets and on real-world fisheries datasets. Both statistical analysis and the experimental results demonstrate the efficacy of the proposed oblique and rotation double random forest models compared with the baseline models on the benchmark datasets.
- Keywords
- Bootstrap, Decision tree, Double random forest, Ensemble learning, Oblique random forest, classification,
- MeSH
- Algorithms * MeSH
- Principal Component Analysis MeSH
- Rotation MeSH
- Support Vector Machine * MeSH
- Publication type
- Journal Article MeSH
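The first abstract attributes the larger trees of the proposed ensembles to bagging at each non-leaf node. The Python sketch below illustrates that node-level step under simplifying assumptions: pure Python, Gini impurity, and axis-parallel splits only. It omits the PCA/LDA rotation and the MPSVM oblique split that the paper adds on top, and the function names (`gini`, `best_axis_split`, `double_rf_node_split`) are hypothetical, not taken from the authors' code.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_axis_split(X, y, features):
    """Return (weighted_gini, feature, threshold) of the best axis-parallel split, or None."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def double_rf_node_split(X, y, n_sub, rng):
    """Pick the split for one non-leaf node with node-level bagging (hypothetical sketch)."""
    n, d = len(X), len(X[0])
    # 1. Bootstrap sample drawn from the samples that reached this node.
    idx = [rng.randrange(n) for _ in range(n)]
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    # 2. Random subspace of n_sub candidate features.
    features = rng.sample(range(d), n_sub)
    # 3. The split is fitted on the bootstrap sample only; the caller then
    #    routes *all* node samples with it, so children keep the full node data.
    return best_axis_split(Xb, yb, features)


# Toy data: feature 0 separates the classes, feature 1 is constant.
X = [[float(i), 1.0] for i in range(10)]
y = [0] * 5 + [1] * 5
split = double_rf_node_split(X, y, n_sub=2, rng=random.Random(0))
print(split)
```

In this reading, each split is learned on a node-level bootstrap sample while the full set of node samples is passed on to the children, which is what allows trees to grow deeper than with a single bootstrap drawn once per tree.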
Bearing degradation is the primary cause of electrical machine failures, making reliable condition monitoring essential to prevent breakdowns. This paper presents a novel hybrid model for detecting multiple bearing faults that combines Long Short-Term Memory (LSTM) networks with random forest (RF) classifiers, further enhanced by the Grey Wolf Optimization (GWO) algorithm. The proposed approach has three stages: first, time- and frequency-domain features are manually extracted from vibration signals; second, these features are processed by a dual-layer LSTM network designed to capture complex temporal relationships within the data; finally, the GWO algorithm selects the most relevant features from the LSTM outputs, which are fed into the RF classifier for fault classification. The model was evaluated on a dataset comprising six distinct bearing health conditions: healthy, outer race fault, ball fault, inner race fault, compound fault, and generalized degradation. The hybrid LSTM-RF-GWO model achieved a classification accuracy of 98.97%, outperforming the standalone LSTM (93.56%) and RF (98.44%) models. Furthermore, including GWO yielded an additional accuracy gain of 0.39% over the hybrid LSTM-RF model without optimization. Other performance metrics, including precision, kappa coefficient, false negative rate (FNR) and false positive rate (FPR), also improved, with precision reaching 99.28% and the kappa coefficient 99.13%. The FNR and FPR were reduced to 0.0071 and 0.0015, respectively, underscoring the model's effectiveness in minimizing misclassifications.
The experimental results demonstrate that the proposed hybrid LSTM-RF-GWO framework not only enhances fault detection accuracy but also provides a robust solution for distinguishing between closely related fault conditions, making it a valuable tool for predictive maintenance in industrial applications.
- Keywords
- Bearing fault detection, Feature selection, Grey wolf optimization, Hybrid model, LSTM, Machine learning, Random forest, Vibration signals,
- Publication type
- Journal Article MeSH
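GWO is only named in the abstract above. As a rough illustration, the sketch below implements the standard continuous Grey Wolf Optimizer update (the three best wolves, alpha, beta and delta, guide the pack while the coefficient `a` decays linearly from 2 towards 0) on a toy sphere function. It is a stand-in under stated assumptions, not the authors' feature-selection variant; `gwo_minimize` and its parameters are hypothetical.

```python
import random


def gwo_minimize(f, dim, lo, hi, n_wolves=15, n_iter=100, seed=0):
    """Minimal continuous Grey Wolf Optimizer; returns (best_fitness, best_position)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # best-so-far (fitness, position) triples: alpha, beta, delta
    for t in range(n_iter):
        # Evaluate the pack and keep the three best positions seen so far.
        scored = [(f(w), list(w)) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda s: s[0])[:3]
        a = 2.0 - 2.0 * t / n_iter  # exploration coefficient, decays linearly
        for w in wolves:
            for j in range(dim):
                pull = 0.0
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    pull += leader[j] - A * D
                # Move to the mean of the three leader-guided positions, clipped to bounds.
                w[j] = min(hi, max(lo, pull / 3.0))
    scored = [(f(w), list(w)) for w in wolves]
    return min(leaders + scored, key=lambda s: s[0])


def sphere(x):
    return sum(v * v for v in x)


best_fit, best_pos = gwo_minimize(sphere, dim=2, lo=-5.0, hi=5.0)
print(best_fit, best_pos)
```

For feature selection as described in the abstract, wolf positions would typically be mapped to binary feature masks and scored by classifier accuracy; that binary wrapper is not shown here.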
To enhance our understanding of forest carbon sequestration, climate change mitigation and drought impacts on forest ecosystems, high-resolution annual forest growth maps based on tree-ring width (TRW) would be a significant advance for the field. Site-specific characteristics, which can be approximated by high-resolution Earth observation by satellites (EOS), emerge as crucial drivers of forest growth, influencing how climate translates into tree growth. EOS provides information on surface reflectance related to forest characteristics and can thus potentially improve the accuracy of TRW-based forest growth models. By modelling TRW with EOS, climate and topography data, we showed that species-specific models can explain up to 52% of the variance (Quercus petraea), whereas combining different species results in relatively poor model performance (R² = 13%). Adding EOS to models based solely on climate and elevation data increased the explained variance by 6% on average. Leveraging these insights, we generated a map of annual TRW for the year 2021. We used the area of applicability (AOA) approach to delineate the range within which our models are deemed valid. The calculated AOA for the forest-type models covered 73% of the study region, indicating robust spatial applicability. Unreliable predictions occurred predominantly at the climatic margins of our dataset. In conclusion, our large-scale assessment underscores the efficacy of combining climate, EOS and topographic data to develop robust models for mapping annual TRW. This research not only fills a critical gap in the current understanding of forest growth dynamics but also highlights the potential of integrated data sources for comprehensive ecosystem assessments.
- Keywords
- NDMI, NDRE, Random forest, Sentinel-1, Sentinel-2, Tree rings,
- MeSH
- Ecosystem * MeSH
- Climate Change MeSH
- Forests MeSH
- Trees MeSH
- Remote Sensing Technology * MeSH
- Publication type
- Journal Article MeSH
- Geographic names
- Europe MeSH
- Europe, Eastern MeSH
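The AOA step mentioned above flags predictions whose predictor values lie far from the training data. The sketch below is a simplified, hypothetical version of the dissimilarity-index (DI) idea: predictors are standardized, the DI is the distance to the nearest training point divided by the mean pairwise training distance, and the threshold is the largest leave-one-out training DI that is not a boxplot outlier. The published AOA method also weights predictors by variable importance and works with cross-validation folds, both omitted here; the threshold rule and all names are our assumptions.

```python
import math
import statistics


def _standardize(rows, means, sds):
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def area_of_applicability(train, new):
    """Return (DI of new points, inside-AOA mask, threshold). Simplified sketch."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    tr = _standardize(train, means, sds)
    nw = _standardize(new, means, sds)

    # Normalizer: mean pairwise distance between training points.
    pairs = [math.dist(a, b) for i, a in enumerate(tr) for b in tr[i + 1:]]
    mean_dist = statistics.fmean(pairs)

    # Training DI: leave-one-out nearest-neighbour distance.
    train_di = [
        min(math.dist(a, b) for j, b in enumerate(tr) if j != i) / mean_dist
        for i, a in enumerate(tr)
    ]
    q1, _, q3 = statistics.quantiles(train_di, n=4)
    whisker = q3 + 1.5 * (q3 - q1)
    # Threshold: largest training DI that is not a boxplot outlier.
    threshold = max(di for di in train_di if di <= whisker)

    new_di = [min(math.dist(p, b) for b in tr) / mean_dist for p in nw]
    return new_di, [di <= threshold for di in new_di], threshold


# Hypothetical predictors, e.g. (temperature, reflectance index).
train = [[10.0, 1.0], [11.0, 1.2], [10.5, 0.9], [9.8, 1.1], [10.2, 1.05]]
new = [[10.5, 0.9], [50.0, 10.0]]
di, inside, thr = area_of_applicability(train, new)
print(di, inside, thr)
```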
Present-day disturbances are transforming European forest landscapes, and their legacies determine the vulnerability and resilience of the emerging forest generation. To understand these legacy effects, we investigated the resilience of aboveground forest biomass (Babg) to a sequence of disturbances affecting the forest in different phases of recovery from the initial large-scale impact. We used the model iLand to simulate windthrows that affected 13-24% of Babg in a Central European forest landscape. An additional wind event was simulated 20, 40, 60, or 80 years after the initial impact (i.e., sequences of two windthrows were defined). Each windthrow triggered a bark beetle outbreak that interacted with the recovery processes. We evaluated the resistance of Babg to each impact and its recovery afterwards. Random Forest models were used to identify factors influencing resilience. We found that Babg resistance was lowest 20 years after the initial impact, when the increased proportion of newly formed wind-exposed forest edges outweighed the disturbance-dampening effect of reduced biomass levels and increased landscape heterogeneity. This forest had a remarkably high recovery rate and regained the pre-disturbance Babg within 28 years. In the more advanced recovery phases, the forest exhibited higher resistance and a slower recovery rate, regaining the pre-disturbance Babg within 60-80 years. Recovery was enhanced by higher levels of alpha and beta diversity. Under elevated air temperature, the bark beetle outbreak triggered by windthrow delayed the recovery. However, the positive effect of increased temperature on forest productivity caused the recovery rate to be higher under the warming scenario than under the reference climate. We conclude that resilience is not a static property; its magnitude and drivers vary over time, depending on vegetation feedbacks, interactions between disturbances, and climate.
Understanding these mechanisms is an essential step towards the operationalization of resilience-oriented stewardship.
- Keywords
- Central Europe, Climate change, Compound disturbance impacts, Engineering resilience, Forest aboveground biomass,
- MeSH
- Biomass MeSH
- Coleoptera * growth & development MeSH
- Climate Change * MeSH
- Forests * MeSH
- Wind MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Geographic names
- Europe MeSH
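The resilience abstract reports resistance of Babg and the time needed to regain pre-disturbance Babg. The exact indicator definitions are not given in the abstract, so the sketch below assumes common engineering-resilience definitions: resistance as post-impact over pre-impact biomass, and recovery time as the number of years until biomass first returns to the pre-impact level. The function name and the toy series are hypothetical.

```python
def resistance_and_recovery(biomass, pre_index, impact_index):
    """Resistance and recovery time for an annual biomass series (assumed definitions).

    resistance    = biomass in the impact year / pre-disturbance biomass
    recovery_time = years from the impact until biomass first reaches the
                    pre-disturbance level again (None if it never does)
    """
    pre = biomass[pre_index]
    resistance = biomass[impact_index] / pre
    recovery_time = None
    for year in range(impact_index + 1, len(biomass)):
        if biomass[year] >= pre:
            recovery_time = year - impact_index
            break
    return resistance, recovery_time


# Hypothetical annual Babg values (t/ha); the windthrow happens in year 2.
series = [200, 200, 160, 170, 185, 199, 205]
print(resistance_and_recovery(series, pre_index=1, impact_index=2))
```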
Large-scale untargeted lipidomics experiments involve measuring hundreds to thousands of samples. Such data sets are usually acquired on one instrument over days or weeks of analysis time. Such extensive data acquisition introduces a variety of systematic errors, including batch differences, longitudinal drifts, and even instrument-to-instrument variation. Technical variance can obscure the true biological signal and hinder biological discoveries. To combat this issue, we present a novel normalization approach based on quality control (QC) pool samples. This method, called systematic error removal using random forest (SERRF), eliminates unwanted systematic variation in large sample sets. We compared SERRF with 15 other commonly used normalization methods using six lipidomics data sets from three large cohort studies (832, 1162, and 2696 samples). SERRF reduced the average technical error for these data sets to 5% relative standard deviation. We conclude that SERRF outperforms other existing methods and can significantly reduce unwanted systematic variation, revealing the biological variance of interest.
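As we understand it, SERRF predicts the systematic error of each compound with a random forest trained on the QC samples; that model is not reproduced here. The sketch below swaps in a much simpler QC-based baseline (linear interpolation of QC intensities over injection order) to show the shared idea of dividing each measurement by a QC-estimated trend, together with the relative standard deviation (RSD) used as the error metric in the abstract. Function names and data are hypothetical.

```python
import statistics


def qc_interpolation_correct(order, values, is_qc):
    """QC-based drift correction by linear interpolation (baseline sketch, not SERRF).

    The trend is estimated from QC injections only, interpolated linearly in
    injection order, and each sample is rescaled to the median QC intensity.
    """
    qc = sorted((o, v) for o, v, q in zip(order, values, is_qc) if q)
    qc_median = statistics.median([v for _, v in qc])

    def trend(o):
        if o <= qc[0][0]:
            return qc[0][1]
        if o >= qc[-1][0]:
            return qc[-1][1]
        for (o0, v0), (o1, v1) in zip(qc, qc[1:]):
            if o0 <= o <= o1:
                return v0 + (v1 - v0) * (o - o0) / (o1 - o0)

    return [v / trend(o) * qc_median for o, v in zip(order, values)]


def rsd_percent(xs):
    """Relative standard deviation in percent."""
    return 100.0 * statistics.stdev(xs) / statistics.fmean(xs)


# Hypothetical compound with a 10% per-injection linear drift; QCs every 3rd injection.
order = list(range(10))
values = [100 * (1 + 0.1 * i) for i in order]
is_qc = [i % 3 == 0 for i in order]
corrected = qc_interpolation_correct(order, values, is_qc)
qc_before = [v for v, q in zip(values, is_qc) if q]
qc_after = [v for v, q in zip(corrected, is_qc) if q]
print(rsd_percent(qc_before), rsd_percent(qc_after))
```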
Simulating carbon-water fluxes at the more widely distributed meteorological stations, based on the sparse and unevenly distributed eddy covariance flux stations, is needed to accurately understand the carbon-water cycle of terrestrial ecosystems. We established a new framework combining machine learning, the coefficient of determination (R²), Euclidean distance and remote sensing (RS) to simulate the daily net ecosystem carbon dioxide exchange (NEE) and water flux (WF) at Eurasian meteorological stations using a random forest model and/or RS. Daily NEE and WF datasets with RS-based information (NEE-RS and WF-RS) were produced for 3774 and 4427 meteorological stations, respectively, for 2002-2020. Daily NEE and WF datasets without RS-based information (NEE-WRS and WF-WRS) were generated for 4667 and 6763 meteorological stations, respectively, for 1983-2018. For each meteorological station, the simulated carbon-water fluxes meet accuracy requirements and have quasi-observational properties. These four carbon-water flux datasets have great potential to improve assessments of ecosystem carbon-water dynamics.
- Publication type
- Journal Article MeSH
- Dataset MeSH
Contemporary molecular biology deals with wide and heterogeneous sets of measurements to model and understand underlying biological processes, including complex diseases. Machine learning is a common approach to building such models. However, models built solely from measured data often suffer from overfitting, as the sample size is typically much smaller than the number of measured features. In this paper, we propose a random forest-based classifier that reduces this overfitting with the aid of prior knowledge in the form of a feature interaction network. We illustrate the proposed method on disease classification based on measured mRNA and miRNA profiles, complemented by an interaction network composed of miRNA-mRNA target relations and mRNA-mRNA interactions corresponding to interactions between their encoded proteins. We demonstrate that the proposed network-constrained forest uses prior knowledge to increase learning bias and consequently improves the classification accuracy, stability and comprehensibility of the resulting model. The experiments are carried out in the domain of myelodysplastic syndrome, a long-term focus of our research. We validate our approach in the public domain of ovarian carcinoma, using data of the same form. We believe that the idea of a network-constrained forest can be straightforwardly generalized to arbitrary omics data with an available, non-trivial feature interaction network. The proposed method is publicly available in the miXGENE system (http://mixgene.felk.cvut.cz), and the workflow that implements the myelodysplastic syndrome experiments is presented as a dedicated case study.
- Keywords
- Domain knowledge, Machine learning, Omics data, Random forest, Regularization, microRNA,
- MeSH
- Gene Regulatory Networks MeSH
- Humans MeSH
- RNA, Messenger genetics MeSH
- MicroRNAs genetics MeSH
- Artificial Intelligence MeSH
- Computational Biology methods MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- RNA, Messenger MeSH
- MicroRNAs MeSH
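The abstract does not specify how the interaction network constrains tree growth. One plausible mechanism, sketched below purely as a hypothetical illustration, restricts the candidate features at a node to network neighbours of features already used on the path from the root, so that each branch follows connected miRNA-mRNA or mRNA-mRNA relations. The rule, the function name and the toy feature names are assumptions, not the miXGENE implementation.

```python
import random


def network_constrained_candidates(graph, path_features, all_features, k, rng):
    """Draw up to k candidate split features for a tree node (hypothetical rule).

    graph: dict mapping a feature to the features it interacts with
    path_features: features already used on the path from the root
    At the root (empty path) any feature may be drawn; deeper nodes may only use
    network neighbours of features on the path, falling back to all features
    when that neighbourhood is empty.
    """
    pool = sorted({nb for f in path_features for nb in graph.get(f, ())})
    if not pool:
        pool = sorted(all_features)
    return rng.sample(pool, min(k, len(pool)))


# Toy interaction network (hypothetical names).
graph = {
    "miR-1": ["GENE_A", "GENE_B"],
    "GENE_A": ["miR-1", "GENE_C"],
    "GENE_B": ["miR-1"],
    "GENE_C": ["GENE_A"],
    "GENE_D": [],
}
rng = random.Random(1)
print(network_constrained_candidates(graph, ["miR-1"], list(graph), 2, rng))
```

Restricting the pool this way narrows the hypothesis space, which matches the abstract's stated aim of increasing learning bias to curb overfitting.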
Three-dimensional facial images are becoming increasingly widespread. Because such images provide more information about facial morphology than 2D imagery, they show great promise for future forensic applications, including age estimation and verification. This paper proposes an approach using random forests, a machine learning method, to develop and test models that classify individuals relative to legal age thresholds (15 and 18 years) using 3D facial landmarks. Our approach was developed on a set of 3D facial scans of 394 Czech individuals (194 males and 200 females) aged 10 to 25 years. The dataset was retrieved from a sizable database of Central European faces, the FIDENTIS 3D Face Database. Three main types of input variables were processed using random forests: (I) shape (size-invariant) coordinates of 3D landmarks, (II) size and shape coordinates of 3D landmarks, and (III) inter-landmark distances, angles and indices. The performance for each combination of variables and age threshold was expressed in terms of sensitivity and specificity. Overall accuracy ranged from 71.4% to 91.5% (with the male and female samples pooled). In general, higher accuracy was achieved for the 18-year age limit than for the 15-year limit. Whereas size-variant variables performed better for the 15-year age limit, size-invariant variables (i.e., shape variables) were better for classifying individuals under 18 years. Verification models based on traditional variables (distances, angles, indices) yielded consistently higher performance for females than for males, whereas the inverse trend was observed for models built on 3D coordinates. The results indicate that age verification based on 3D facial data processed with the random forests method has high potential for further forensic or biometric applications.
- Keywords
- 3D facial models, Age estimation, Age verification, FIDENTIS database, Random forests,
- MeSH
- Anatomic Landmarks * MeSH
- Child MeSH
- Adult MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Face anatomy & histology MeSH
- Image Processing, Computer-Assisted MeSH
- Cross-Sectional Studies MeSH
- Machine Learning * MeSH
- Age Determination by Skeleton methods MeSH
- Imaging, Three-Dimensional * MeSH
- Check Tag
- Child MeSH
- Adult MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Male MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
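The age-threshold models are scored by sensitivity and specificity at each age limit. A minimal sketch of that computation follows; it assumes the positive class is "at or above the age limit", which the abstract does not fix, and the function name and toy ages are hypothetical.

```python
def threshold_metrics(true_ages, predicted_over, limit):
    """Sensitivity, specificity and accuracy for an age-threshold classifier.

    Positive class (assumption): true age >= limit.
    predicted_over[i] is True when the model predicts age >= limit.
    """
    tp = fn = tn = fp = 0
    for age, pred in zip(true_ages, predicted_over):
        actual = age >= limit
        if actual and pred:
            tp += 1
        elif actual:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(true_ages)
    return sensitivity, specificity, accuracy


ages = [12, 14, 16, 17, 18, 19, 20, 22]
preds = [False, False, False, True, True, True, False, True]
print(threshold_metrics(ages, preds, limit=18))
```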
Tropical canopies are known for their high abundance and diversity of ants. However, the factors that enable the coexistence of so many species in trees, and in particular the role of foragers in determining local diversity, are not well understood. We censused nesting and foraging arboreal ant communities in two 0.32 ha plots of primary and secondary lowland rainforest in New Guinea and explored their species diversity and composition. Null models were used to test whether records of species foraging (but not nesting) in a tree depended on the spatial distribution of nests in surrounding trees. In total, 102 ant species from 389 trees occurred in the primary plot, compared with only 50 species from 295 trees in the secondary forest plot. However, mean ant richness per tree differed only slightly between primary and secondary forest (3.8 and 3.3 species, respectively), and considerably lower richness per tree was found when only nests were considered (1.5 species in both forests). About half of the foraging individuals collected in a tree belonged to species that were not nesting in that tree. Null models showed that ants foraging but not nesting in a tree are more likely to nest in nearby trees than would be expected at random. The effects of forest stage and tree size traits were similar regardless of whether only foragers, only nests, or both datasets combined were considered. However, the relative abundance distributions of species differed between foraging and nesting communities. The primary forest plot was dominated by native ant species, whereas invasive species were common in the secondary forest. This study demonstrates the high contribution of foragers to arboreal ant diversity, indicating an important role of connectivity between trees, and highlights the importance of primary vegetation for the conservation of native ant communities.
- MeSH
- Biodiversity * MeSH
- Behavior, Animal MeSH
- Rainforest MeSH
- Ecosystem MeSH
- Formicidae * MeSH
- Forests * MeSH
- Trees * MeSH
- Tropical Climate * MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographic names
- New Guinea MeSH
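The ant study tests forager records against the spatial distribution of nests with null models whose details the abstract leaves out. The sketch below is one hypothetical permutation design: the statistic is the mean distance from each forager-only record to the nearest conspecific nest, and the null redraws each species' nest trees at random among all censused trees while keeping the number of nests fixed. Coordinates, species codes and function names are illustrative assumptions.

```python
import math
import random
import statistics


def mean_nearest_nest(tree_xy, forager_records, nests):
    """Mean distance from each forager-only record to the nearest conspecific nest."""
    return statistics.fmean([
        min(math.dist(tree_xy[tree], tree_xy[t]) for t in nests[species])
        for species, tree in forager_records
    ])


def nest_null_model(tree_xy, forager_records, nests, n_perm=999, seed=0):
    """Permutation test: are foragers closer to conspecific nests than by chance?

    Returns (observed statistic, one-sided p-value).
    """
    rng = random.Random(seed)
    trees = list(tree_xy)
    observed = mean_nearest_nest(tree_xy, forager_records, nests)
    hits = 0
    for _ in range(n_perm):
        # Null: same number of nests per species, nest trees drawn at random.
        shuffled = {sp: rng.sample(trees, len(ts)) for sp, ts in nests.items()}
        if mean_nearest_nest(tree_xy, forager_records, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy transect of 20 trees, 1 m apart; each species forages next to its nest tree.
tree_xy = {i: (float(i), 0.0) for i in range(20)}
nests = {"sp1": [4], "sp2": [9], "sp3": [14]}
foragers = [("sp1", 5), ("sp2", 10), ("sp3", 15)]
obs, p = nest_null_model(tree_xy, foragers, nests)
print(obs, p)
```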
Terrestrial laser scanning is a powerful technology for capturing the three-dimensional structure of forests with a high level of detail and accuracy. Over the last decade, many algorithms have been developed to extract various tree parameters from terrestrial laser scanning data. Here we present 3D Forest, an open-source, platform-independent software application with an easy-to-use graphical user interface that compiles algorithms focused on the forest environment and the extraction of tree parameters. The current version (0.42) extracts important forest structure parameters from terrestrial laser scanning data, such as stem positions (X, Y, Z), tree heights and diameters at breast height (DBH), as well as more advanced parameters such as tree planar projections, stem profiles and detailed crown parameters, including convex and concave crown surface and volume. Moreover, 3D Forest provides quantitative measures of between-crown interactions and their actual arrangement in 3D space. 3D Forest also includes an original algorithm for automatic tree segmentation and crown segmentation. Comparison with field measurements showed no significant difference in DBH or tree height measured with 3D Forest, although for DBH only the Randomized Hough Transform algorithm proved sufficiently robust to noise and provided results comparable to traditional field measurements.
- MeSH
- Algorithms MeSH
- Automation MeSH
- Forests * MeSH
- Imaging, Three-Dimensional * MeSH
- Publication type
- Journal Article MeSH
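For DBH, the abstract singles out the Randomized Hough Transform. The sketch below illustrates the general RHT idea on a horizontal slice of stem points at breast height: repeatedly fit a circle through three random points, vote for its quantized centre and radius, and take the most-voted cell. It is not 3D Forest's code; slice extraction is assumed to be done already, coordinates are assumed to be in metres, and the names `circle_from_3`, `rht_dbh`, `n_iter` and `res` are hypothetical.

```python
import math
import random
from collections import Counter


def circle_from_3(p, q, r):
    """Centre and radius of the circle through three points (None if collinear)."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.hypot(ax - ux, ay - uy)


def rht_dbh(points, n_iter=500, res=0.005, seed=0):
    """Estimate DBH from a breast-height slice with a Randomized Hough Transform."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_iter):
        c = circle_from_3(*rng.sample(points, 3))
        if c is not None:
            # Vote for the quantized (x, y, r) cell of this candidate circle.
            votes[tuple(round(v / res) for v in c)] += 1
    (_, _, r_cell), _ = votes.most_common(1)[0]
    return 2.0 * r_cell * res  # diameter, in the units of the input points


# Synthetic slice: 12 points on a stem of radius 0.15 m plus two noise points.
pts = [(1.0 + 0.15 * math.cos(2 * math.pi * k / 12),
        2.0 + 0.15 * math.sin(2 * math.pi * k / 12)) for k in range(12)]
pts += [(1.5, 2.5), (0.3, 1.9)]
print(rht_dbh(pts))
```

Because every noise-free triple of stem points votes for the same cell while triples containing noise points scatter their votes, isolated noise points rarely outvote the stem circle, which is one reason RHT tolerates noise.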