Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.

Algorithms

Random Forest

Random Forest (RF) is an ensemble of decision-tree predictors whose outputs are combined by majority vote (classification) or averaging (regression).
The main features of RF are the following:
- New training subsets are drawn from the training set with replacement (bagging), and a tree is grown on each of them. At each split of each tree, a small group of input variables is randomly sampled as split candidates. The randomness in tree construction aims to keep the correlation between the trees low.
- The generalization error for forests converges to a limit as the number of trees in the RF becomes large.
- The generalization error of RF depends mostly on the strength of the individual trees in the forest and on the correlation between them. To improve accuracy, minimize the correlation while maintaining the strength of the individual trees.
- The main hyperparameter affecting RF performance is the number of variables randomly sampled as candidates at each split, which affects both the correlation between trees and the accuracy of each tree. Common values are the square root, the logarithm, or one third of the number of input features.
- Other important parameters are the number of trees in the forest (this should not be too small, so that every input row gets predicted at least a few times) and the minimum size of terminal nodes (which controls how deep each tree is grown).
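The sampling steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the reference implementation: the function names are hypothetical, the logarithm is taken as base 2 plus one, and "predicted at least a few times" is read here as out-of-bag prediction, meaning a row is predicted only by trees whose bootstrap sample did not contain it.

```python
import math
import random


def candidate_mtry(n_features):
    """Common choices for the number of variables tried at each split."""
    return {
        "sqrt": max(1, math.isqrt(n_features)),
        # Assumption: "log" is read as int(log2(M)) + 1.
        "log2": max(1, int(math.log2(n_features)) + 1),
        "one_third": max(1, n_features // 3),
    }


def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bagging sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def split_candidates(n_features, mtry, rng):
    """Sample mtry distinct feature indices to consider at one split."""
    return rng.sample(range(n_features), mtry)


def out_of_bag_counts(n_rows, n_trees, seed=0):
    """For each row, count the trees whose bootstrap sample left it out."""
    rng = random.Random(seed)
    counts = [0] * n_rows
    for _ in range(n_trees):
        in_bag = set(bootstrap_indices(n_rows, rng))
        for row in range(n_rows):
            if row not in in_bag:
                counts[row] += 1
    return counts


if __name__ == "__main__":
    print(candidate_mtry(9))
    print(split_candidates(9, 3, random.Random(1)))
    for n_trees in (5, 500):
        counts = out_of_bag_counts(n_rows=200, n_trees=n_trees)
        print(n_trees, "trees: fewest out-of-bag predictions for a row =", min(counts))
```

Each row is left out of a given bootstrap sample with probability of roughly 0.37, so with only a handful of trees some rows may receive very few out-of-bag predictions, or none. With several hundred trees, every row is covered many times.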

Scientific Area:
Learning

Language/Environments:
R

Target Group:
Basic


Cite as:
Breiman, L., Random forests, Machine Learning 45.1 (2001): 5-32.

Author of the review:
Giulia Cademartori
University of Genoa

