Cross Entropy
It measures the dissimilarity between an actual (true) probability distribution and a predicted probability distribution. Therefore, it is often used as the cost function for a classification neural network.
Decomposition
CrossEntropy = LogSoftmax + NLLLoss
Typically, the last layer of a classification model produces class scores, $y_{c}$. These scores are converted into (log-)probabilities with a (log-)softmax, and the log-probabilities are then fed into the NLLLoss (Negative Log-Likelihood Loss).
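As a sanity check, here is a minimal sketch (assuming PyTorch; the class scores and target below are made up for illustration) showing the decomposition: `nn.CrossEntropyLoss` applied to raw class scores gives the same value as `nn.LogSoftmax` followed by `nn.NLLLoss`.

```python
import torch
import torch.nn as nn

scores = torch.tensor([[2.0, 0.5, -1.0]])  # raw class scores (illustrative values)
target = torch.tensor([0])                 # index of the true class

# Cross entropy computed directly from the raw scores
ce = nn.CrossEntropyLoss()(scores, target)

# The same value via the decomposition: log-softmax followed by NLL loss
log_probs = nn.LogSoftmax(dim=1)(scores)
nll = nn.NLLLoss()(log_probs, target)

print(ce.item(), nll.item())  # both print the same number
```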
Equation
The cross entropy is defined as:
$$H(p,q) = - \sum_{c}{ q(y_c) \log(p(y_c)) } \tag{1}$$
where $p(y_c)$ is a predicted probability of $y_c$ obtained by a predictive model, $q(y_c)$ is an actual probability, and $c$ denotes a class.
Example
Let's say a predictive classification model is designed to classify among $\{cat, dog, bird\}$. Given a single cat image as input, the model outputs $\hat{y}=\{0.8, 0.1, 0.1\}$. Then,
$$q(cat, dog, bird) = \{1.0, 0.0, 0.0\}$$
$$p(cat, dog, bird) = \{0.8, 0.1, 0.1\}$$
Then, we can compute the cross entropy:
$$H(p,q) = -[ 1.0\log(0.8) + 0.0\log(0.1) + 0.0\log(0.1)] \approx 0.22$$
But what if $\hat{y}=\{0.4, 0.3, 0.3\}$? In this case, the prediction is worse, and the cross entropy increases:
$$H(p,q) = -[ 1.0\log(0.4) + 0.0\log(0.3) + 0.0\log(0.3)] \approx 0.92$$
Remember that the objective of a classification neural network is to minimize this cost function, i.e., the cross entropy.
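The numbers above can be verified with a quick computation (a plain-Python sketch; the `cross_entropy` helper below is just an illustrative implementation of Eq. (1)):

```python
import math

def cross_entropy(q, p):
    # H(p, q) = -sum_c q(y_c) * log(p(y_c))
    return -sum(qc * math.log(pc) for qc, pc in zip(q, p))

q = [1.0, 0.0, 0.0]                        # actual distribution (a cat image)
print(cross_entropy(q, [0.8, 0.1, 0.1]))   # ~0.22
print(cross_entropy(q, [0.4, 0.3, 0.3]))   # ~0.92
```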
Implementation Tip/Trick
In a training dataset, the true label for each sample is a one-hot vector $Y \in \mathbb{R}^{K}$, where $K$ denotes the number of classes: all the elements are zero except for the one corresponding to the true class $y_{c^+}$, which is one. Then, Eq. (1) reduces to:
$$ H(p,q) = - 1 \cdot \mathrm{log}( p(y_{c^+}) ) \tag{2}$$
Note that $p(y_{c^+})$ can be expressed in the log-softmax form, as shown below. This form is what PyTorch's negative log-likelihood loss (NLL loss) implements, and PyTorch's cross-entropy loss is the combination of the log-softmax and the NLL loss.
Since $p(y_{c^+})$ is a softmax output, it can be written as $\frac{ \exp\{y_{c^+}\} }{ \sum_c{ \exp\{y_c\}} }$, where $y_c$ here denotes the raw class score. Then Eq. (2) can be rewritten as:
$$ H(p,q) = - \log\left( \frac{ \exp\{y_{c^+}\} }{ \sum_c{ \exp\{y_c\}} } \right)$$
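Assuming PyTorch, the following sketch (with made-up class scores) checks that evaluating this expression by hand matches `torch.nn.functional.cross_entropy`, which fuses the log-softmax and the NLL loss:

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([[2.0, 0.5, -1.0]])  # raw class scores y_c (illustrative)
target = torch.tensor([0])                 # index of the true class c+

# -log( exp(y_{c+}) / sum_c exp(y_c) ), computed by hand
manual = -torch.log(torch.exp(scores[0, 0]) / torch.exp(scores[0]).sum())

# PyTorch's fused log-softmax + NLL loss
builtin = F.cross_entropy(scores, target)

print(manual.item(), builtin.item())  # both print the same number
```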