The main features are the following:
• It is a tree ensemble model for supervised learning problems, consisting of a set of classification and regression trees (CART). A CART differs slightly from an ordinary decision tree, whose leaves contain only decision values: in a CART, a real-valued score is associated with each leaf, which gives us richer interpretations that go beyond classification.
• The main difference from random forest is that, instead of training all the trees independently of one another, we use an additive strategy: fix what we have learned and add one new tree at a time.
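The additive strategy can be illustrated with a minimal sketch. It assumes squared loss, a single numeric feature, and one-split regression stumps in place of full CARTs; the function names (`fit_stump`, `boost`, `predict`, `mse`) and the toy data are hypothetical and are not part of XGBoost. Each new stump is fitted to the residuals of the current ensemble, and its contribution is scaled by a step size `eta` before being added.

```python
# Sketch of additive training for squared loss on a single feature.
# Each "tree" is a one-split regression stump with real-valued leaf scores.

def fit_stump(xs, residuals):
    """Return (threshold, left_score, right_score) minimizing squared error."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), xs[0], mean, mean)  # fallback when no split is possible
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def boost(xs, ys, n_trees, eta):
    """Keep the earlier stumps fixed and add one new stump per round."""
    base = sum(ys) / len(ys)          # initial constant prediction
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        # Shrink the new stump's scores by eta before adding them.
        preds = [p + eta * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, eta, trees

def predict(model, x):
    base, eta, trees = model
    return base + eta * sum(predict_stump(s, x) for s in trees)

def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Toy data (made up for illustration): two noisy plateaus.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
for n in (0, 1, 10, 50):
    print(n, round(mse(boost(xs, ys, n, eta=0.3), xs, ys), 4))
```

The printed training error shrinks as stumps are added, because each round only corrects what the fixed, previously learned stumps still get wrong.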
Its main hyper-parameters are the following:
• Eta (η): step size shrinkage used in each update to prevent overfitting. After each boosting step, the newly added weights are scaled by a factor of η. Shrinkage reduces the influence of each individual tree and leaves space for future trees to improve the model.
• Gamma: minimum loss reduction required to make a further partition on a leaf node of the tree. The larger gamma is, the more conservative the algorithm will be.
• Max_depth: maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit, and it also increases memory consumption.
• Min_child_weight: minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node whose sum of instance weight is less than min_child_weight, the building process gives up further partitioning.
• Subsample: fraction of the training instances randomly sampled to train each tree. A value that is too low may lead to underfitting, while a value that is too high may lead to overfitting.
• Colsample: this family of parameters (colsample_bytree, colsample_bylevel, colsample_bynode) determines whether subsampling of columns, i.e., features, is applied when constructing a tree, and at which point: once at the beginning of each tree, at every new depth level reached in a tree, or every time a new split is evaluated.
• Lambda: L2 regularization term on weights.
• Alpha: L1 regularization term on weights (default 0).
• Tree_method: the tree construction algorithm used in XGBoost. The default value is auto, which chooses the exact greedy algorithm for small datasets and an approximate algorithm for larger datasets.
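• Scale_pos_weight: controls the balance between positive and negative instance weights. A value greater than 0 should be used in case of high class imbalance, as it helps the model converge faster; a typical choice is the ratio of negative to positive instances.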
• Objective: specifies the learning task and the corresponding learning objective function (e.g., regression with squared loss, multiclass classification using the softmax objective, logistic regression).
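To make the roles of these hyper-parameters more concrete, the following sketch shows how gamma, lambda, and min_child_weight enter the decision to split a leaf, and how the parameters can be collected in a configuration dictionary. The gain expression is the standard one derived from XGBoost's regularized objective with the L1 term (alpha) omitted for simplicity; the helper names (`split_gain`, `allow_split`), the toy labels, and every numeric value are assumptions chosen for illustration, not tuned recommendations.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Loss reduction of a candidate split (alpha is ignored in this sketch).

    g_* and h_* are the sums of first- and second-order gradients (hessians)
    of the instances falling into the left and right child.
    """
    def score(g, h):
        return g * g / (h + lam)  # lambda (L2) shrinks the score of every leaf

    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma

def allow_split(g_left, h_left, g_right, h_right, lam, gamma, min_child_weight):
    """A split is kept only if both children are heavy enough and the gain is positive."""
    if h_left < min_child_weight or h_right < min_child_weight:
        return False
    return split_gain(g_left, h_left, g_right, h_right, lam, gamma) > 0

# Hypothetical imbalanced labels: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
n_neg, n_pos = labels.count(0), labels.count(1)

# Parameter names follow the XGBoost documentation; values are illustrative only.
params = {
    "objective": "binary:logistic",
    "eta": 0.1,
    "gamma": 1.0,
    "max_depth": 6,
    "min_child_weight": 1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "lambda": 1.0,
    "alpha": 0.0,
    "tree_method": "auto",
    "scale_pos_weight": n_neg / n_pos,  # common heuristic: negatives / positives
}

# With the xgboost package installed, such a dictionary is typically passed as:
#   import xgboost as xgb
#   dtrain = xgb.DMatrix(X, label=y)        # X, y: your own training data
#   booster = xgb.train(params, dtrain, num_boost_round=100)

print(split_gain(-4.0, 4.0, 4.0, 4.0, lam=1.0, gamma=0.0))
print(allow_split(-4.0, 4.0, 4.0, 4.0, lam=1.0, gamma=1.0, min_child_weight=5))
```

A larger gamma subtracts more from every candidate gain, and a larger min_child_weight rejects splits that produce light children, which is why both make the algorithm more conservative.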