Cross Entropy
It measures the dissimilarity between an actual (true) probability distribution and a predicted probability distribution. Therefore, it is often used as the cost function for a classification neural network.
Decomposition
CrossEntropy = LogSoftmax + NLLLoss
Typically, the last layer of a classification model produces class scores, $y_{c}$. These scores are converted into (log-)probabilities with a (log-)softmax, and the log-probabilities are then fed into the NLLLoss (Negative Log-Likelihood Loss).
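As a sanity check, here is a minimal sketch (assuming PyTorch; the class scores and target below are made up for illustration) showing the decomposition: `nn.CrossEntropyLoss` applied to raw class scores gives the same value as `nn.LogSoftmax` followed by `nn.NLLLoss`.

```python
import torch
import torch.nn as nn

scores = torch.tensor([[2.0, 0.5, -1.0]])  # raw class scores (illustrative values)
target = torch.tensor([0])                 # index of the true class

# Cross entropy computed directly from the raw scores
ce = nn.CrossEntropyLoss()(scores, target)

# The same value via the decomposition: log-softmax followed by NLL loss
log_probs = nn.LogSoftmax(dim=1)(scores)
nll = nn.NLLLoss()(log_probs, target)

print(ce.item(), nll.item())  # both print the same number
```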
Equation
The cross entropy is defined as:
$$H(p,q) = - \sum_{c}{ q(y_c) \log(p(y_c)) } \tag{1}$$
where $p(y_c)$ is a predicted probability of $y_c$ obtained by a predictive model, $q(y_c)$ is an actual probability, and $c$ denotes a class.
Example
Let's say a predictive classification model is designed to classify among $\{cat, dog, bird\}$. Given a single cat image as input, the model outputs $\hat{y}=\{0.8, 0.1, 0.1\}$. Then,
$$q(cat, dog, bird) = \{1.0, 0.0, 0.0\}$$
$$p(cat, dog, bird) = \{0.8, 0.1, 0.1\}$$
Then, we can compute the cross entropy:
$$H(p,q) = -[ 1.0\log(0.8) + 0.0\log(0.1) + 0.0\log(0.1)] \approx 0.22$$
But what if $\hat{y}=\{0.4, 0.3, 0.3\}$? In this case, the prediction is worse, and the cross entropy increases:
$$H(p,q) = -[ 1.0\log(0.4) + 0.0\log(0.3) + 0.0\log(0.3)] \approx 0.92$$
Remember that the objective of a classification neural network is to minimize this cost function, i.e., the cross entropy.
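The numbers above can be verified with a quick computation (a plain-Python sketch; the `cross_entropy` helper below is just an illustrative implementation of Eq. (1)):

```python
import math

def cross_entropy(q, p):
    # H(p, q) = -sum_c q(y_c) * log(p(y_c))
    return -sum(qc * math.log(pc) for qc, pc in zip(q, p))

q = [1.0, 0.0, 0.0]                        # actual distribution (a cat image)
print(cross_entropy(q, [0.8, 0.1, 0.1]))   # ~0.22
print(cross_entropy(q, [0.4, 0.3, 0.3]))   # ~0.92
```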
Implementation Tip/Trick
In a training dataset, the true label for each sample is a one-hot vector $Y \in \mathbb{R}^{K}$, where $K$ denotes the number of classes: all the elements are zero except for the one corresponding to the true class $y_{c^+}$, which is one. Then, Eq. (1) reduces to:
$$ H(p,q) = - 1 \cdot \mathrm{log}( p(y_{c^+}) ) \tag{2}$$
Note that $p(y_{c^+})$ can be expressed in the log-softmax form, as shown below. This form is what PyTorch's negative log-likelihood loss (NLL loss) implements, and PyTorch's cross-entropy loss is the combination of the log-softmax and the NLL loss.
Since $p(y_{c^+})$ is a softmax output, it can be written as $\frac{ \exp\{y_{c^+}\} }{ \sum_c{ \exp\{y_c\}} }$, where $y_c$ here denotes the raw class score. Then Eq. (2) can be rewritten as:
$$ H(p,q) = - \log\left( \frac{ \exp\{y_{c^+}\} }{ \sum_c{ \exp\{y_c\}} } \right)$$
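Assuming PyTorch, the following sketch (with made-up class scores) checks that evaluating this expression by hand matches `torch.nn.functional.cross_entropy`, which fuses the log-softmax and the NLL loss:

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([[2.0, 0.5, -1.0]])  # raw class scores y_c (illustrative)
target = torch.tensor([0])                 # index of the true class c+

# -log( exp(y_{c+}) / sum_c exp(y_c) ), computed by hand
manual = -torch.log(torch.exp(scores[0, 0]) / torch.exp(scores[0]).sum())

# PyTorch's fused log-softmax + NLL loss
builtin = F.cross_entropy(scores, target)

print(manual.item(), builtin.item())  # both print the same number
```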