
The Boosting Algorithm

DS-Lee 2021. 1. 12. 11:05

In this post, we cover the basic idea behind the various boosting algorithms. The easiest to understand is AdaBoost, which proceeds as follows; a minimal code sketch of the loop appears after the list. (The commonly used boosting algorithms are AdaBoost, gradient boosting, and stochastic gradient boosting, with stochastic gradient boosting being the most common.)

  1. Initialize $M$, the maximum number of models to be fit, and set the iteration counter $m=1$. Initialize the observation weights $w_i = 1/N$ for $i = 1,2, \cdots, N$. Initialize the ensemble model $\hat{F}_0 = 0$.
  2. Train a model $\hat{f}_m$ using the observation weights $w_1, w_2, \cdots, w_N$ that minimizes the weighted error $e_m$, defined as the sum of the weights for the misclassified observations.
  3. Add the model to the ensemble: $\hat{F}_m = \hat{F}_{m-1} + \alpha_m \hat{f}_m$ where $\alpha_m = \log{\frac{1-e_m}{e_m}}$.
  4. Update the weights $w_1, w_2, \cdots, w_N$ so that the weights are increased for the observations that were misclassified. The size of the increase depends on $\alpha_m$, with larger values of $\alpha_m$ leading to bigger weights.
  5. Increment the model counter $m = m+1$. If $m \le M$, go to step 2.
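
To make these steps concrete, here is a minimal from-scratch sketch of the training loop, assuming binary labels coded as $-1/+1$ and using scikit-learn decision stumps as the weak learners (the data `X`, `y` and the choice of `M` are placeholders, not part of the original post):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Fit M weak learners following steps 1-5; y must be coded as -1/+1."""
    N = len(y)
    w = np.full(N, 1.0 / N)              # step 1: uniform observation weights
    models, alphas = [], []

    for m in range(M):
        # Step 2: train a stump on the weighted data
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        miss = (stump.predict(X) != y)
        e = np.sum(w[miss]) / np.sum(w)   # weighted error of this model

        # Step 3: model weight alpha_m = log((1 - e_m) / e_m); clip e to stay finite
        e = np.clip(e, 1e-10, 1 - 1e-10)
        alpha = np.log((1 - e) / e)
        models.append(stump)
        alphas.append(alpha)

        # Step 4: increase the weights of misclassified observations, then renormalize
        w = w * np.exp(alpha * miss)
        w = w / np.sum(w)

    return models, alphas
```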

The boosted estimate is given by:

$$ \hat{F} = \alpha_1 \hat{f}_1 + \alpha_2 \hat{f}_2 + \cdots + \alpha_M \hat{f}_M $$

By increasing the weights for the observations that were misclassified, the algorithm forces each subsequent model to train more heavily on the data for which the earlier models performed poorly. The factor $\alpha_m$ ensures that models with lower error receive a bigger weight in the ensemble.
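
Under the same $-1/+1$ coding assumed above, the boosted estimate is a weighted vote, and the sign of the weighted sum gives the predicted class. A minimal sketch continuing the example:

```python
import numpy as np

def adaboost_predict(models, alphas, X):
    """Combine the weak learners: F_hat = sum_m alpha_m * f_hat_m, then take the sign."""
    F = np.zeros(len(X))
    for alpha, model in zip(alphas, models):
        F += alpha * model.predict(X)   # each f_hat_m outputs -1 or +1
    return np.sign(F)

# Hypothetical usage:
#   models, alphas = adaboost_fit(X_train, y_train, M=50)
#   y_pred = adaboost_predict(models, alphas, X_test)
```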

 

Detailed information can be found on StatQuest.