random forest
Random forest is an ensemble of decision trees based on the bagging and random subspace concepts. As suggested by Breiman, the strength of unstable learners and the diversity among them are the core strengths of ensemble models. In this paper, we propose two approaches, known as oblique and rotation double random forests. In the first approach, we propose the rotation-based double random forest, in which a transformation (rotation) of the feature space is generated at each node. Because a different random feature subspace is chosen for evaluation at each node, the transformation at each node is different. Different transformations result in greater diversity among the base learners and hence better generalization performance. With the double random forest as the base learner, the data at each node are transformed via two different transformations, namely principal component analysis and linear discriminant analysis. In the second approach, we propose the oblique double random forest. Decision trees in random forest and double random forest are univariate, which results in axis-parallel splits that fail to capture the geometric structure of the data. Also, the standard random forest may not grow sufficiently large decision trees, resulting in suboptimal performance. To capture the geometric properties and to grow decision trees of sufficient depth, we propose the oblique double random forest, whose models are multivariate decision trees. At each non-leaf node, a multisurface proximal support vector machine generates the optimal plane for better generalization performance. In addition, different regularization techniques (Tikhonov regularization, axis-parallel split regularization, null space regularization) are employed to tackle small-sample-size problems in the decision trees of the oblique double random forest.
The proposed ensembles produce larger trees than the standard ensembles of decision trees because bagging is used at each non-leaf node, which results in improved performance. The baseline models and the proposed oblique and rotation double random forest models are evaluated on 121 benchmark UCI datasets and real-world fisheries datasets. Both the statistical analysis and the experimental results demonstrate the efficacy of the proposed oblique and rotation double random forest models compared to the baseline models on the benchmark datasets.
- MeSH
- algorithms * MeSH
- principal component analysis MeSH
- rotation MeSH
- support vector machine * MeSH
- Publication type
- journal article MeSH
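The idea in the abstract above, rotating a random feature subspace at a node before searching for a split, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-feature subspace, uses only a closed-form 2-D PCA rotation (no linear discriminant analysis, no per-node bagging), and scores splits with Gini impurity. All function names are hypothetical.

```python
import math
import random

def pca_rotation_2d(points):
    """Angle (radians) of the first principal axis of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

def rotate(points, theta):
    """Express each point in the coordinate system of the principal axes."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in points]

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_axis_split(points, labels):
    """Exhaustive search for the (score, feature, threshold) with lowest weighted Gini."""
    best = (float("inf"), None, None)
    n = len(points)
    for f in range(2):
        values = sorted(set(p[f] for p in points))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for p, l in zip(points, labels) if p[f] <= t]
            right = [l for p, l in zip(points, labels) if p[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[0]:
                best = (score, f, t)
    return best

def rotated_node_split(rows, labels, n_features, rng):
    """Pick a random 2-feature subspace, rotate it by PCA, then split axis-parallel."""
    feats = rng.sample(range(n_features), 2)
    sub = [(r[feats[0]], r[feats[1]]) for r in rows]
    theta = pca_rotation_2d(sub)
    score, f, t = best_axis_split(rotate(sub, theta), labels)
    return feats, theta, f, t, score

# Toy data whose class boundary is the diagonal y = x + 0.5 (an oblique boundary).
rows = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 0), (2, 1), (3, 2), (4, 3)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
plain_score, _, _ = best_axis_split(rows, labels)
feats, theta, f, t, rotated_score = rotated_node_split(rows, labels, 2, random.Random(0))
print("axis-parallel impurity:", plain_score, "after rotation:", rotated_score)
```

On this toy data a single axis-parallel split cannot separate the classes, whereas the same split search in the rotated coordinates can, which is the geometric point the abstract makes about axis-parallel splits.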
Tropical canopies are known for their high abundance and diversity of ants. However, the factors which enable coexistence of so many species in trees, and in particular, the role of foragers in determining local diversity, are not well understood. We censused nesting and foraging arboreal ant communities in two 0.32 ha plots of primary and secondary lowland rainforest in New Guinea and explored their species diversity and composition. Null models were used to test if the records of species foraging (but not nesting) in a tree were dependent on the spatial distribution of nests in surrounding trees. In total, 102 ant species from 389 trees occurred in the primary plot compared with only 50 species from 295 trees in the secondary forest plot. However, there was only a small difference in mean ant richness per tree between primary and secondary forest (3.8 and 3.3 sp. respectively) and considerably lower richness per tree was found only when nests were considered (1.5 sp. in both forests). About half of foraging individuals collected in a tree belonged to species which were not nesting in that tree. Null models showed that the ants foraging but not nesting in a tree are more likely to nest in nearby trees than would be expected at random. The effects of both forest stage and tree size traits were similar regardless of whether only foragers, only nests, or both datasets combined were considered. However, relative abundance distributions of species differed between foraging and nesting communities. The primary forest plot was dominated by native ant species, whereas invasive species were common in secondary forest. This study demonstrates the high contribution of foragers to arboreal ant diversity, indicating an important role of connectivity between trees, and also highlights the importance of primary vegetation for the conservation of native ant communities.
- MeSH
- biodiversity * MeSH
- behavior, animal MeSH
- rainforest MeSH
- ecosystem MeSH
- Formicidae * MeSH
- forests * MeSH
- trees * MeSH
- tropical climate * MeSH
- animals MeSH
- Check Tag
- animals MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- Geographic names
- New Guinea MeSH
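The abstract above does not specify how its null models were built. As a rough illustration of the general idea only, the sketch below tests whether species that forage but do not nest in a tree have a nest in a nearby tree more often than expected by chance. The randomization scheme (each nest keeps its species but is reassigned to a random tree), the `radius` parameter, and all names are assumptions made for this example.

```python
import math
import random

def near_nest_fraction(trees, nests, foragers, radius):
    """Fraction of forager records (tree, species) with a nest of the same
    species in a different tree within `radius`."""
    hits = 0
    for tree, sp in foragers:
        here = trees[tree]
        if any(s == sp and t != tree and math.dist(here, trees[t]) <= radius
               for t, s in nests):
            hits += 1
    return hits / len(foragers)

def null_model_p(trees, nests, foragers, radius, n_perm=999, seed=0):
    """One-sided permutation p-value for the observed near-nest fraction."""
    rng = random.Random(seed)
    observed = near_nest_fraction(trees, nests, foragers, radius)
    tree_ids = list(trees)
    at_least = 0
    for _ in range(n_perm):
        # Assumed null: nest species are kept, nest locations are random trees.
        shuffled = [(rng.choice(tree_ids), s) for _, s in nests]
        if near_nest_fraction(trees, shuffled, foragers, radius) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)

# Hypothetical plot: ten trees in a row, 5 m apart.
trees = {i: (i * 5.0, 0.0) for i in range(10)}
nests = [(0, "A"), (9, "B")]          # (tree id, species)
foragers = [(1, "A"), (8, "B")]       # foraging-only records
observed, p_value = null_model_p(trees, nests, foragers, radius=6.0)
print(observed, p_value)
```

A small p-value here would indicate that foraging-only records sit closer to conspecific nests than random nest placement predicts, which is the kind of dependence the study reports.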
Terrestrial laser scanning is a powerful technology for capturing the three-dimensional structure of forests with a high level of detail and accuracy. Over the last decade, many algorithms have been developed to extract various tree parameters from terrestrial laser scanning data. Here we present 3D Forest, an open-source non-platform-specific software application with an easy-to-use graphical user interface with the compilation of algorithms focused on the forest environment and extraction of tree parameters. The current version (0.42) extracts important parameters of forest structure from the terrestrial laser scanning data, such as stem positions (X, Y, Z), tree heights, diameters at breast height (DBH), as well as more advanced parameters such as tree planar projections, stem profiles or detailed crown parameters including convex and concave crown surface and volume. Moreover, 3D Forest provides quantitative measures of between-crown interactions and their real arrangement in 3D space. 3D Forest also includes an original algorithm of automatic tree segmentation and crown segmentation. Comparison with field data measurements showed no significant difference in measuring DBH or tree height using 3D Forest, although for DBH only the Randomized Hough Transform algorithm proved to be sufficiently resistant to noise and provided results comparable to traditional field measurements.
- MeSH
- algorithms MeSH
- automation MeSH
- forests * MeSH
- imaging, three-dimensional * MeSH
- Publication type
- journal article MeSH
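The abstract above credits the Randomized Hough Transform with noise-resistant DBH estimates. The following is a minimal sketch of that general technique for fitting a circle to a horizontal stem slice, not the code used in 3D Forest: it repeatedly samples three points, computes the circle through them, and votes in a quantized (center x, center y, radius) space. The cell size, iteration count, and function names are assumptions.

```python
import math
import random
from collections import Counter

def circle_from_3(p1, p2, p3):
    """Circle (cx, cy, r) through three points, or None if they are collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)

def rht_circle(points, n_iter=2000, cell=0.01, seed=0):
    """Return the most-voted circle; parameters are quantized to `cell` (metres)."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_iter):
        circle = circle_from_3(*rng.sample(points, 3))
        if circle is not None:
            votes[tuple(round(v / cell) for v in circle)] += 1
    if not votes:
        raise ValueError("no non-collinear point triples found")
    (ix, iy, ir), _ = votes.most_common(1)[0]
    return ix * cell, iy * cell, ir * cell

# Synthetic stem slice: 30 points on a circle of radius 0.15 m plus 5 outliers.
stem = [(1.0 + 0.15 * math.cos(2 * math.pi * k / 30),
         2.0 + 0.15 * math.sin(2 * math.pi * k / 30)) for k in range(30)]
noise = [(1.3, 2.2), (0.7, 1.9), (1.1, 2.4), (0.9, 1.6), (1.25, 1.8)]
cx, cy, r = rht_circle(stem + noise)
print("estimated DBH (m):", 2 * r)
```

Because triples that include an outlier scatter their votes across many cells while all-inlier triples agree, the winning cell is largely unaffected by the noise points.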
Political action can reduce introductions of diseases caused by invasive forest pathogens (IPs) and public support is important for effective prevention. The public's awareness of IP problems and the acceptability of policies aiming to combat these pathogens were surveyed in nine European countries (N = 3469). Although awareness of specific diseases (e.g., ash dieback) varied, problem awareness and policy acceptability were similar across countries. The public was positive towards policies for informational measures and stricter standards for plant production, but less positive towards restricting public access to protected areas. Multilevel models, including individual and country level variables, revealed that media exposure was positively associated with awareness of IP problems, and strengthened the link between problem awareness and policy acceptability. Results suggest that learning about IPs through the media and recognizing the associated problems increase policy acceptability. Overall, the study elaborates on the anthropogenic dimension of diseases caused by IPs.
- MeSH
- forests * MeSH
- policy * MeSH
- surveys and questionnaires MeSH
- Publication type
- journal article MeSH
- Geographic names
- Europe MeSH
Three-dimensional facial images are becoming more and more widespread. As such images provide more information about facial morphology than 2D imagery, they show great promise for use in future forensic applications, including age estimation and verification. This paper proposes an approach using random forests, a machine learning method, to develop and test models for classification of legal age thresholds (15 years and 18 years) using 3D facial landmarks. Our approach was developed on a set of 3D facial scans from 394 Czech individuals (194 males and 200 females) aged between 10 and 25 years. The dataset was retrieved from a sizable database of Central European faces, the FIDENTIS 3D Face Database. Three main types of input variables were processed using random forests: I) shape (size-invariant) coordinates of 3D landmarks, II) size and shape coordinates of 3D landmarks, and III) inter-landmark distances, angles and indices. The performance rates for the combinations of variables and age threshold were expressed in terms of sensitivity and specificity. The overall accuracy rates varied from 71.4% to 91.5% (when the male and female samples were pooled). In general, higher accuracy was achieved for the age limit of 18 years than for 15 years. Whereas size-variant variables showed a better performance rate for the age limit of 15 years, the size-invariant variables (i.e., shape variables) were better for classifying individuals under 18 years. The verification models grounded on traditional variables (distances, angles, indices) yielded consistently higher performance rates on females than on males, whereas the inverse trend was observed for the models built on 3D coordinates. The results indicate that age verification based on 3D facial data with processing by the random forests method has high potential for further forensic or biometric applications.
- MeSH
- anatomic landmarks * MeSH
- child MeSH
- adult MeSH
- humans MeSH
- adolescent MeSH
- young adult MeSH
- face anatomy & histology MeSH
- image processing, computer-assisted MeSH
- cross-sectional studies MeSH
- machine learning * MeSH
- age determination by skeleton methods MeSH
- imaging, three-dimensional * MeSH
- Check Tag
- child MeSH
- adult MeSH
- humans MeSH
- adolescent MeSH
- young adult MeSH
- male MeSH
- female MeSH
- Publication type
- journal article MeSH
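The abstract above reports classifier performance as sensitivity, specificity, and overall accuracy for an age threshold. A minimal sketch of how these rates are computed from predictions follows. Treating "at or above the threshold" as the positive class is an assumption of this example, and the ages and predictions are invented for illustration.

```python
def threshold_metrics(ages, predicted_over, threshold):
    """Sensitivity, specificity and accuracy for an age-threshold classifier.

    `predicted_over[i]` is True when the model says person i is at or above
    `threshold`; the positive class is "at or above threshold" (assumed).
    """
    actual_over = [age >= threshold for age in ages]
    pairs = list(zip(actual_over, predicted_over))
    tp = sum(a and p for a, p in pairs)
    fn = sum(a and not p for a, p in pairs)
    tn = sum(not a and not p for a, p in pairs)
    fp = sum(not a and p for a, p in pairs)
    sensitivity = tp / (tp + fn)   # share of over-threshold people detected
    specificity = tn / (tn + fp)   # share of under-threshold people detected
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy

# Invented example: eight individuals and a model's over-18 predictions.
ages = [12, 16, 17, 18, 19, 22, 24, 15]
predicted_over = [False, False, True, True, True, False, True, False]
sens, spec, acc = threshold_metrics(ages, predicted_over, threshold=18)
print(sens, spec, acc)
```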
Contemporary molecular biology deals with wide and heterogeneous sets of measurements to model and understand underlying biological processes, including complex diseases. Machine learning is frequently used to build such models. However, models built solely from measured data often suffer from overfitting, as the sample size is typically much smaller than the number of measured features. In this paper, we propose a random forest-based classifier that reduces this overfitting with the aid of prior knowledge in the form of a feature interaction network. We illustrate the proposed method on the task of disease classification based on measured mRNA and miRNA profiles, complemented by an interaction network composed of miRNA-mRNA target relations and mRNA-mRNA interactions corresponding to the interactions between their encoded proteins. We demonstrate that the proposed network-constrained forest employs prior knowledge to increase learning bias and consequently to improve the classification accuracy, stability and comprehensibility of the resulting model. The experiments are carried out in the domain of myelodysplastic syndrome, which is our long-term research focus. We validate our approach in the public domain of ovarian carcinoma, with the same data form. We believe that the idea of a network-constrained forest can straightforwardly be generalized to arbitrary omics data with an available and non-trivial feature interaction network. The proposed method is publicly available as part of the miXGENE system (http://mixgene.felk.cvut.cz); the workflow that implements the myelodysplastic syndrome experiments is presented as a dedicated case study.
- MeSH
- gene regulatory networks MeSH
- humans MeSH
- RNA, messenger genetics MeSH
- microRNAs genetics MeSH
- artificial intelligence MeSH
- computational biology methods MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
The surface organic horizons in forest soils have been affected by air and soil pollutants, including potentially toxic elements (PTEs). Monitoring of PTEs requires a large number of samples and adequate analysis. Visible-near infrared (vis-NIR: 350-2500 nm) spectroscopy provides an alternative to conventional laboratory measurements, which are time-consuming and expensive. However, vis-NIR spectroscopy relies on an empirical calibration of the target attribute to the spectra. This study examined the capability of vis-NIR spectra coupled with machine learning (ML) techniques (partial least squares regression (PLSR), support vector machine regression (SVMR), and random forest (RF)) and a deep learning (DL) approach, a fully connected neural network (FNN), to assess selected PTEs (Cr, Cu, Pb, Zn, and Al) in forest organic horizons. The dataset consists of 2160 samples from 1080 forest sites across the Czech Republic. At each site, we collected two samples, one from the fragmented (F) and one from the humus (H) organic layer. The content of all PTEs was higher in the H horizon than in the F horizon. Our results indicate that the reflectance of samples tended to decrease with increasing PTE concentration. Cr was the most accurately predicted element, regardless of the algorithm used. SVMR provided the best results for assessing the H horizon (R² = 0.88 and RMSE = 3.01 mg/kg for Cr). FNN produced the best predictions of Cr in the combined F + H layers (R² = 0.89 and RMSE = 2.95 mg/kg), possibly due to the larger number of samples. In the F horizon, the PTEs were not predicted adequately. The study shows that PTEs in forest soils of the Czech Republic can be accurately estimated with vis-NIR spectra and ML approaches. The results suggest that, when a large sample size is available, FNN provides better results.
- MeSH
- algorithms MeSH
- soil pollutants * MeSH
- neural networks MeSH
- soil * MeSH
- Publication type
- journal article MeSH
- Geographic names
- Czech Republic MeSH
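The abstract above compares models by R² and RMSE. As a reminder of what these two figures measure, here is a minimal sketch using the coefficient-of-determination form R² = 1 - SS_res / SS_tot. The abstract does not say which definition of R² the authors used, so that choice is an assumption, and the concentrations below are invented.

```python
import math

def r2_and_rmse(measured, predicted):
    """Coefficient of determination and root mean squared error."""
    n = len(measured)
    mean = sum(measured) / n
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)   # same units as the measurements (e.g. mg/kg)
    return r2, rmse

# Invented laboratory values and model predictions, in mg/kg.
measured = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
r2, rmse = r2_and_rmse(measured, predicted)
print(round(r2, 3), round(rmse, 2))
```

RMSE is expressed in the units of the target, so it is only comparable between elements with similar concentration ranges, whereas R² is unitless.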
INTRODUCTION: This study explores the emotional impact of virtual forest therapy delivered through audio-visual recordings shown to patients in oncology waiting rooms, focusing on whether simulated forest walks can positively influence patients' emotional states compared to traditional waiting room stimuli. METHODS: The study involved 117 participants from a diverse group of oncology patients in the outpatient clinic waiting room at the Masaryk Memorial Cancer Institute. Using a partially randomized controlled trial design, the study assessed basic emotional dimensions (valence and arousal) as well as specific psychological states such as thought control, sadness, anxiety, and pain. This assessment used the Self-Assessment Manikin and the modified Emotional Thermometer before and after participants watched three video types (forest, sea, news). Baseline stress levels were measured using the Kessler Psychological Distress Scale (K6). RESULTS: Participants exposed to forest and sea videos reported significant improvements in emotional valence and reduced arousal, suggesting a calming and uplifting effect. No significant changes were observed in the control and news groups. Secondary outcomes related to anxiety, sadness, and pain showed no significant interaction effects, though small but significant main effects of time on these variables were noted. DISCUSSION: The findings suggest that videos of forest and sea can be a beneficial intervention in oncology waiting rooms by enhancing patients' emotional well-being. This pilot study underscores the potential for integrating virtual mental health support elements into healthcare settings to improve the patient care experience.
- Publication type
- journal article MeSH
MOTIVATION: Accurate genotyping of DNA from a single cell is required for applications such as de novo mutation detection, linkage analysis and lineage tracing. However, achieving high-precision genotyping in the single-cell environment is challenging due to the errors caused by whole-genome amplification. Two factors make genotyping from single cells using single nucleotide polymorphism (SNP) arrays challenging: the lack of a comprehensive single-cell dataset with a reference genotype, and the absence of genotyping tools specifically designed to detect noise from the whole-genome amplification step. Algorithms designed for bulk DNA genotyping cause significant data loss when used for single-cell applications. RESULTS: In this study, we have created a resource of 28.7 million SNPs, typed at high confidence from whole-genome amplified DNA from single cells using the Illumina SNP bead array technology. The resource is generated from 104 single cells from two cell lines that are available from the Coriell repository. We used mother-father-proband (trio) information from multiple technical replicates of bulk DNA to establish a high-quality reference genotype for the two cell lines on the SNP array. This enabled us to develop SureTypeSC, a two-stage machine learning algorithm that filters a substantial part of the noise while retaining the majority of the high-quality SNPs. SureTypeSC also provides a simple statistical output to show the confidence of a particular single-cell genotype using Bayesian statistics. AVAILABILITY AND IMPLEMENTATION: The implementation of SureTypeSC in Python and sample data are available in the GitHub repository: https://github.com/puko818/SureTypeSC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
A central challenge in ecology is to understand the relative importance of processes that shape diversity patterns. Compared with aboveground biota, little is known about spatial patterns and processes in soil organisms. Here we examine the spatial structure of communities of small soil eukaryotes to elucidate the underlying stochastic and deterministic processes in the absence of environmental gradients at a local scale. Specifically, we focus on the fine-scale spatial autocorrelation of prominent taxonomic and functional groups of eukaryotic microbes. We collected 123 soil samples in a nested design at distances ranging from 0.01 to 64 m from three boreal forest sites and used 454 pyrosequencing analysis of Internal Transcribed Spacer for detecting Operational Taxonomic Units of major eukaryotic groups simultaneously. Among the main taxonomic groups, we found significant but weak spatial variability only in the communities of Fungi and Rhizaria. Within Fungi, ectomycorrhizas and pathogens exhibited stronger spatial structure compared with saprotrophs and corresponded to vegetation. For the groups with significant spatial structure, autocorrelation occurred at a very fine scale (<2 m). Both dispersal limitation and environmental selection had a weak effect on communities as reflected in negative or null deviation of communities, which was also supported by multivariate analysis, that is, environment, spatial processes and their shared effects explained on average <10% of variance. Taken together, these results indicate a random distribution of soil eukaryotes with respect to space and environment in the absence of environmental gradients at the local scale, reflecting the dominant role of drift and homogenizing dispersal.
- MeSH
- ecology MeSH
- Eukaryota classification MeSH
- fungi classification genetics isolation & purification MeSH
- forests MeSH
- soil microbiology * MeSH
- biota * MeSH
- trees microbiology MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- Geographic names
- Estonia MeSH