Mathematics
Impurity Metric - Gini, Entropy
DS-Lee
2021. 1. 5. 15:40
The two most common impurity metrics are 1) Gini impurity and 2) entropy.
Gini Impurity
Binary case
$ I(D) = Gini(D) = 1 - p^2 - q^2 = 2pq = 2p(1-p) $
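$D$ denotes the dataset, $p$ the fraction of samples in one class, and $q = 1 - p$ the fraction in the other. In the general case, $c$ denotes the number of classes and $p_i$ the fraction of samples in class $i$.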
General case
$ I(D) = 1 - \sum_{i=1}^{c}{p_i}^2 $
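As a minimal sketch, the general formula $1 - \sum_i p_i^2$ can be computed directly from a list of class labels. The function name `gini_impurity` is hypothetical, and treating an empty dataset as pure (impurity 0) is an assumption made here for convenience.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_i^2) for a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty dataset is treated as pure
    counts = Counter(labels)
    # p_i is the fraction of samples that belong to class i
    return 1.0 - sum((k / n) ** 2 for k in counts.values())
```

For a balanced two-class dataset such as `["a", "a", "b", "b"]` this evaluates to 0.5, the largest value the formula can take with two classes, and a dataset with a single class evaluates to 0.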
Entropy
Binary case
$ I(D) = -p\log_2{p} -q\log_2{q} $
General case
$ I(D) = -\sum_{i=1}^{c}{p_i \log_2{p_i}} $
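A small sketch of the entropy measure, using the standard definition $-\sum_i p_i \log_2 p_i$ (which reduces to the binary case above when $c = 2$). The function name `entropy` is hypothetical, and an empty dataset is assumed to have entropy 0.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty dataset is treated as pure
    counts = Counter(labels)
    # Counter only holds classes that occur, so log2 never sees 0
    return sum(-(k / n) * math.log2(k / n) for k in counts.values())
```

A balanced two-class dataset gives 1 bit, four equally likely classes give 2 bits, and a single-class dataset gives 0.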
Gini Impurity vs. Entropy
The figure shows that Gini impurity (rescaled) and the entropy measures are similar, with entropy giving higher impurity scores for moderate and high misclassification error.
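The comparison can also be checked numerically for the binary case. This is only a sketch: it assumes "rescaled" means multiplying the Gini impurity $1 - p^2 - (1-p)^2$ by 2 so that both curves peak at 1 when $p = 0.5$, and the helper names are hypothetical.

```python
import math


def binary_gini(p):
    """Gini impurity 1 - p^2 - (1-p)^2 for class fraction p."""
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def binary_entropy(p):
    """Entropy in bits for class fraction p, with 0*log2(0) taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def compare(ps=(0.1, 0.25, 0.5)):
    """Return (p, rescaled Gini, entropy) for each class fraction in ps."""
    # assumption: "rescaled" = Gini multiplied by 2 so its maximum is 1
    return [(p, 2.0 * binary_gini(p), binary_entropy(p)) for p in ps]


for p, g, h in compare():
    print(f"p={p:.2f}  rescaled Gini={g:.3f}  entropy={h:.3f}")
```

Under this rescaling the two measures agree at $p = 0.5$, and the entropy is the larger of the two at the other sampled fractions.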