Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.

Algorithms

Random Forest



Random Forest (RF) is an ensemble method that combines the predictions of many decision trees.
The main features of RF are the following:
- From the training set, new training subsets are drawn with replacement (bagging), and a tree is grown on each of them. At each split of each tree, a small group of input variables is randomly sampled as split candidates. The randomness used in tree construction aims to keep the correlation between the trees low.
- The generalization error of the forest converges to a limit as the number of trees in the RF becomes large.
- The generalization error of RF depends mainly on the strength of the individual trees in the forest and on the correlation between them. To improve accuracy, minimize the correlation while maintaining the accuracy of the single trees.
- The main hyper-parameter affecting RF performance is the number of variables randomly sampled as candidates at each split, which affects both the correlation between trees and the accuracy of each single tree. Common values are the square root, the logarithm, or one third of the number of input features.
- Other important parameters are the number of trees in the forest (this should not be too small, so that every input row gets predicted at least a few times) and the minimum size of terminal nodes (which controls how far each tree is grown).
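
The sampling steps listed above can be illustrated with a minimal sketch in plain Python (standard library only). It does not grow any trees; it only shows the bagging draw, the random choice of split candidates, and why a very small forest may leave some rows with few predictions. All function names here are hypothetical, the "log" rule is assumed to mean floor(log2(p)) + 1, and "predicted" is assumed to refer to out-of-bag predictions, i.e. predictions from trees whose bootstrap sample did not contain the row.

```python
import math
import random


def mtry_candidates(n_features):
    """Common choices for the number of variables tried at each split."""
    return {
        "sqrt": max(1, math.isqrt(n_features)),
        # Assumption: "log" is read as floor(log2(p)) + 1.
        "log2": int(math.log2(n_features)) + 1,
        "one_third": max(1, n_features // 3),
    }


def bootstrap_indices(n_rows, rng):
    """One bagging sample: n_rows row indices drawn with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def split_candidates(n_features, mtry, rng):
    """Feature indices considered at a single split (no repeats)."""
    return rng.sample(range(n_features), mtry)


def oob_counts(n_rows, n_trees, seed=0):
    """For each row, count the trees whose bootstrap sample left it out."""
    rng = random.Random(seed)
    counts = [0] * n_rows
    for _ in range(n_trees):
        in_bag = set(bootstrap_indices(n_rows, rng))
        for row in range(n_rows):
            if row not in in_bag:
                counts[row] += 1
    return counts


if __name__ == "__main__":
    print(mtry_candidates(16))
    rng = random.Random(0)
    print(split_candidates(16, mtry_candidates(16)["sqrt"], rng))
    for n_trees in (5, 50, 500):
        counts = oob_counts(n_rows=100, n_trees=n_trees)
        print(n_trees, "trees, fewest out-of-bag predictions for any row:",
              min(counts))
```

Each row is left out of a bootstrap sample with probability (1 - 1/n)^n, which is roughly 0.37 for large n, so on average a row is out-of-bag for a little over a third of the trees. With only a handful of trees some rows may never be out-of-bag, which is one way to read the advice not to set the number of trees too low.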

Scientific Area:
Learning

Language/Environments:
R

Target Group:
Basic


Cite as:
Breiman, L., Random forests, Machine Learning 45.1 (2001): 5-32.

Author of the review:
Giulia Cademartori
University of Genoa


Reviews


Pablo Guerrero-Garcia


It turns out that random forests can also be used to develop a recommender system (RS). It would therefore be nice to report a metric such as RMSE or MAE, so that their performance is directly comparable with that of collaborative filtering RSs. Finally, random forests are not exclusively tied to the R programming language: I think they are also included in a Matlab/Octave toolbox, cf. https://es.mathworks.com/help/stats/select-predictors-for-random-forests.html

Simone Minisi


RF is a widely used supervised Machine Learning algorithm which in most cases can be a good starting point to try out your data!