
Impurity Metrics - Gini, Entropy

DS-Lee 2021. 1. 5. 15:40

The two most common impurity metrics are 1) Gini impurity and 2) entropy.

Gini Impurity

Binary case

$ I(D) = Gini(D) = 1 - p^2 - q^2 = 2pq = 2p(1-p) $

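$D$ denotes the dataset, $p$ the proportion of samples in $D$ that belong to one class, and $q = 1 - p$ the proportion that belong to the other. In the general case, $c$ denotes the number of classes and $p_i$ the proportion of samples in $D$ that belong to class $i$.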
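As a quick numeric check, a two-class node with class proportions $p$ and $q = 1 - p$ has Gini impurity $1 - p^2 - q^2$, which expands to $2p(1-p)$. The sketch below computes both forms side by side. The function name `gini_binary` is a hypothetical helper introduced only for illustration.

```python
def gini_binary(p: float) -> float:
    """Gini impurity of a two-class node where one class has proportion p.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    # General formula 1 - sum_i p_i^2 specialised to two classes
    return 1.0 - p**2 - q**2


# Compare against the closed form 2p(1 - p)
for p in (0.0, 0.25, 0.5):
    print(p, gini_binary(p), 2 * p * (1 - p))
```

Both columns agree, and the impurity peaks at $p = 0.5$, where the two classes are evenly mixed.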

General case

$ I(D) = 1 - \sum_{i=1}^{c}{p_i}^2 $
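The general formula can be evaluated directly from a list of class labels: count each class, turn the counts into proportions $p_i$, and subtract the sum of their squares from 1. This is a minimal sketch assuming the dataset is given as a plain Python sequence of hashable labels. The function name `gini` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_i p_i^2 of a dataset D given its class labels.

    Hypothetical helper; assumes D is a non-empty sequence of labels.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("D must contain at least one sample")
    counts = Counter(labels)
    # p_i = fraction of samples in D that belong to class i
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


print(gini(["a", "a", "b", "b"]))  # two evenly mixed classes
print(gini(["a", "b", "c"]))       # three evenly mixed classes
print(gini(["a", "a", "a"]))       # pure node
```

For $c$ equally frequent classes the result is $1 - 1/c$, the largest value the metric can take for that many classes, while a pure node scores 0.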

Entropy

Binary case

$ I(D) = -p\log_2{p} -q\log_2{q} $
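Evaluating the binary entropy needs one convention: $\log_2 0$ is undefined, so terms with a zero proportion are taken as $0 \cdot \log_2 0 = 0$ (the limit as the proportion goes to 0). The sketch below applies that convention by skipping zero proportions. The name `entropy_binary` is a hypothetical helper.

```python
import math


def entropy_binary(p: float) -> float:
    """Entropy (in bits) of a two-class node where one class has proportion p.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    # Convention: 0 * log2(0) = 0, so zero proportions are skipped
    return sum(-x * math.log2(x) for x in (p, q) if x > 0)


for p in (0.0, 0.1, 0.5):
    print(p, round(entropy_binary(p), 4))
```

A pure node ($p = 0$) has entropy 0, and an evenly mixed node ($p = 0.5$) reaches the maximum of 1 bit.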

General case

$ I(D) = -\sum_{i=1}^{c}{p_i \log_2{p_i}} $
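The general entropy $-\sum_{i=1}^{c} p_i \log_2 p_i$ can likewise be computed from class labels, and for $c = 2$ it reduces to the binary formula. This is a minimal sketch assuming the dataset is a plain sequence of hashable labels; the function name `entropy` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Entropy -sum_i p_i log2 p_i (in bits) of a dataset D given its labels.

    Hypothetical helper; assumes D is a non-empty sequence of labels.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("D must contain at least one sample")
    counts = Counter(labels)
    # Counter only stores classes that occur, so every p_i > 0 and log2 is defined
    return sum(-(k / n) * math.log2(k / n) for k in counts.values())


print(entropy(["a", "a", "b", "b"]))       # two evenly mixed classes
print(entropy(["a", "b", "c", "d"]))       # four evenly mixed classes
print(entropy(["a", "a", "a"]))            # pure node
```

For $c$ equally frequent classes the entropy equals $\log_2 c$ bits, so it can exceed 1 when there are more than two classes, unlike the Gini impurity, which stays below 1.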

Gini Impurity vs. Entropy

The figure shows that Gini impurity (rescaled) and entropy behave similarly: both are zero for a pure node and largest when the classes are evenly mixed, with entropy giving higher impurity scores for moderate and high misclassification error.
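The comparison can be reproduced numerically for the two-class case. The exact rescaling used in the figure is not stated, so this sketch assumes Gini impurity is multiplied by 2, which makes its peak at $p = 0.5$ equal to 1, the same as binary entropy. Misclassification error is taken as $1 - \max(p, 1 - p)$. All function names are hypothetical helpers.

```python
import math


def entropy_binary(p: float) -> float:
    # Binary entropy in bits, with the convention 0 * log2(0) = 0
    return sum(-x * math.log2(x) for x in (p, 1.0 - p) if x > 0)


def gini_binary_rescaled(p: float) -> float:
    # ASSUMPTION: Gini impurity 2p(1 - p) multiplied by 2 so its maximum is 1
    return 2.0 * (2.0 * p * (1.0 - p))


def misclassification_error(p: float) -> float:
    # Error of predicting the majority class at this node
    return 1.0 - max(p, 1.0 - p)


print(f"{'p':>5} {'error':>6} {'entropy':>8} {'gini x2':>8}")
for p in (0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(
        f"{p:>5.2f} {misclassification_error(p):>6.2f} "
        f"{entropy_binary(p):>8.3f} {gini_binary_rescaled(p):>8.3f}"
    )
```

Under this assumed scaling, entropy is never below the rescaled Gini value; the two coincide only for a pure node and at $p = 0.5$.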