    Cross Entropy
    Mathematics · 2021. 1. 7. 11:36

    Cross entropy measures the dissimilarity between an actual probability distribution and a predicted probability distribution. Therefore, it is often used as the cost function for a classification neural network.

    Decomposition

    CrossEntropy = LogSoftmax + NLLLoss

    Typically, the last layer of a classification model produces class scores (logits), $y_{c}$. These scores are converted into (log-)probabilities with a (log-)softmax, and the log-probabilities are then fed into the NLLLoss (negative log-likelihood loss).
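
    This decomposition can be checked with a small PyTorch snippet (a minimal sketch; the tensor values and shapes below are made up purely for illustration):

    ```python
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    logits = torch.randn(4, 3)             # 4 samples, 3 classes (raw class scores)
    targets = torch.tensor([0, 2, 1, 0])   # true class indices

    # Cross entropy applied directly to the raw logits
    ce = nn.CrossEntropyLoss()(logits, targets)

    # The same value via an explicit LogSoftmax followed by NLLLoss
    log_probs = nn.LogSoftmax(dim=1)(logits)
    nll = nn.NLLLoss()(log_probs, targets)

    print(ce.item(), nll.item())  # the two values match
    ```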

    Equation

    The cross entropy is defined as follows:

    $$H(p,q) = - \sum_{c}{ q(y_c) \log(p(y_c)) }  \tag{1}$$

    where $p(y_c)$ is the predicted probability of class $c$ produced by the model, $q(y_c)$ is the actual (ground-truth) probability, and $c$ indexes the classes.

    Example

    Suppose a classification model is designed to classify among $\{cat, dog, bird\}$. Given one cat image as input, the model outputs $\hat{y}=\{0.8, 0.1, 0.1\}$. Then,

    $$q(cat, dog, bird) = \{1.0, 0.0, 0.0\}$$

    $$p(cat, dog, bird) = \{0.8, 0.1, 0.1\}$$

    Then, we can compute the cross entropy:

    $$H(p,q) = -[ 1.0\log(0.8)  + 0.0\log(0.1) + 0.0\log(0.1)] \approx 0.22$$

    But what if the model instead outputs $\hat{y}=\{0.4, 0.3, 0.3\}$ (i.e., the prediction gets worse)?

    $$H(p,q) = -[ 1.0\log(0.4)  + 0.0\log(0.3) + 0.0\log(0.3)] \approx 0.92$$
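
    The two values above can be reproduced with a direct implementation of $Eq.(1)$ (a minimal sketch in plain Python; the function name `cross_entropy` is just for illustration):

    ```python
    import math

    def cross_entropy(p, q):
        """H(p, q) = -sum_c q(y_c) * log(p(y_c)), skipping terms where q(y_c) = 0."""
        return -sum(qc * math.log(pc) for pc, qc in zip(p, q) if qc > 0)

    q = [1.0, 0.0, 0.0]                        # actual distribution (a cat image)
    print(cross_entropy([0.8, 0.1, 0.1], q))   # ~0.223
    print(cross_entropy([0.4, 0.3, 0.3], q))   # ~0.916
    ```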

    Remember that training a classification neural network amounts to minimizing this cost function (the cross entropy): the lower the cross entropy, the closer the predicted distribution is to the actual one.

    Implementation Tip/Trick

    In a training dataset, the true label for each sample is a one-hot vector $Y \in \mathbb{R}^{K}$, where $K$ denotes the number of classes; all of its elements are zero except the one corresponding to the true class $c^+$, which is one. Then, $Eq.(1)$ reduces to:

    $$ H(p,q) = - 1 \cdot \log( p(y_{c^+}) )  \tag{2}$$

    Note that $p(y_{c^+})$ can be expressed in log-softmax form, as shown below. The form above is what PyTorch's negative log-likelihood loss (NLLLoss) implements, and PyTorch's cross-entropy loss is the combination of the log-softmax and the NLL loss.
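
    The following sketch illustrates this in PyTorch, reusing the probabilities from the cat/dog/bird example above: NLLLoss simply picks out $-\log(p(y_{c^+}))$ for each sample and averages.

    ```python
    import torch

    log_probs = torch.log(torch.tensor([[0.8, 0.1, 0.1],
                                        [0.4, 0.3, 0.3]]))
    targets = torch.tensor([0, 0])          # both samples are 'cat' (class index 0)

    # NLLLoss takes log-probabilities and true class indices
    nll = torch.nn.NLLLoss()(log_probs, targets)
    manual = -log_probs[torch.arange(2), targets].mean()
    print(nll.item(), manual.item())        # both ~ (0.223 + 0.916) / 2
    ```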

    Since $p(y_{c^+})$ is a softmax output, it can be written as $\frac{ \exp\{y_{c^+}\} }{ \sum_c{ \exp\{y_c\}} }$, so $Eq.(2)$ can be rewritten as:

    $$ H(p,q) = - \log\left( \frac{ \exp\{y_{c^+}\} }{ \sum_c{ \exp\{y_c\}} } \right)$$
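
    In practice, this last form is evaluated directly from the raw class scores via the log-sum-exp identity, $-y_{c^+} + \log\sum_c \exp\{y_c\}$, which avoids exponentiating and then taking a log. Below is a minimal sketch (not PyTorch's actual internals; `cross_entropy_from_logits` is a hypothetical helper), checked against the built-in loss:

    ```python
    import torch
    import torch.nn.functional as F

    def cross_entropy_from_logits(logits, targets):
        """-log( exp(y_{c+}) / sum_c exp(y_c) ) = -y_{c+} + logsumexp(y), averaged over the batch."""
        true_scores = logits.gather(1, targets.unsqueeze(1)).squeeze(1)   # y_{c+} per sample
        return (-true_scores + torch.logsumexp(logits, dim=1)).mean()

    logits = torch.tensor([[2.0, 0.5, -1.0],
                           [0.1, 1.2,  0.3]])
    targets = torch.tensor([0, 1])
    print(cross_entropy_from_logits(logits, targets))
    print(F.cross_entropy(logits, targets))   # matches
    ```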
