  • Naive Bayes Explained
    Mathematics 2021. 1. 4. 19:39

    Theory

    Before we get started, take note of the notation used throughout this article:

    $X=(X_1, X_2, ..., X_k)$ represents $k$ features. $Y$ is the label, with $K$ possible values (classes).
    From a probabilistic perspective, $X_i \in X$ and $Y$ are random variables.
    The value of $X_i$ is $x$, and that of $Y$ is $y$.

    Basic Idea

    To make classifications, we need to use $X$ to predict $Y$. In other words, given a data point $X=(x_1, x_2, ..., x_k)$, what is the probability of $Y$ being $y$? This is written as the following conditional probability:

    $$ P(Y=y | X=(x_1,x_2, ..., x_k)) $$

    $$ \text{classification} = \arg\max_y P(Y=y|X=(x_1, x_2, ..., x_k)) $$

    This is the basic idea of Naive Bayes; the rest of the algorithm is really about how to calculate the conditional probability above.
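
    To make the decision rule concrete, here is a tiny Python sketch of the argmax step; the posterior values are made-up numbers standing in for whatever the model computes.

    ```python
    # Hypothetical posteriors P(Y=y | X=x) for a two-class problem.
    posteriors = {"yes": 0.71, "no": 0.29}

    # Classification = the class y that maximizes the posterior.
    prediction = max(posteriors, key=posteriors.get)
    print(prediction)  # -> yes
    ```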

    Bayes' Theorem

    Let's take a look at an example training dataset:

    Weather dataset

    We can formulate the classification for 'Play' given 'Outlook, Temperature, Humidity, and Windy' as follows:

    $$ P(Y|X) = \frac{P(X|Y) P(Y)}{P(X)} $$

    $$ Posterior = \frac{likelihood \times prior}{evidence} $$

    where $X = \{ X_1, X_2, X_3, X_4 \} =  \{$ Outlook, Temperature, Humidity, Windy $\}$, and $Y=y=$ Play. Then, it can be reformulated as:

    $$ P(Y|X) $$

    $$ = P(Y=y | X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4)$$

    $$ \propto P(X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4 | Y=y) P(Y=y) $$

    The evidence $P(X)$ is dropped here because it does not depend on $y$, so it has no effect on which class maximizes the posterior. $P(X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4 | Y=y)$ means the probability of $(X_1=x_1)$ and $(X_2=x_2)$ and $(X_3=x_3)$ and $(X_4=x_4)$ given $(Y=y)$, where 'and' is interpreted as the intersection $\cap$. If we assume that $X_1$, $X_2$, $X_3$, and $X_4$ are conditionally independent of each other given $Y$, we can write:

    $$ P(X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4 | Y=y) $$

    $$ = P(X_1=x_1 | Y=y) P(X_2=x_2 | Y=y) P(X_3=x_3 | Y=y) P(X_4=x_4 | Y=y) $$

    With the simplified formula, the calculation becomes much easier. This conditional independence assumption is what makes Naive Bayes 'naive'. The general form of Naive Bayes is written as:

    $$ P(Y|X) = \frac{P(Y) \prod_i P(X_i | Y)}{P(X)} $$
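
    As an illustration of this formula, below is a minimal categorical Naive Bayes sketch in Python. The toy rows are assumptions standing in for the weather dataset above, and Laplace smoothing is deliberately left out so the code matches the formula exactly (a zero count therefore zeroes out the whole product).

    ```python
    from collections import Counter, defaultdict

    # Toy rows standing in for the weather dataset:
    # (Outlook, Temperature, Humidity, Windy) -> Play
    data = [
        (("sunny", "hot", "high", False), "no"),
        (("sunny", "hot", "high", True), "no"),
        (("overcast", "hot", "high", False), "yes"),
        (("rainy", "mild", "high", False), "yes"),
        (("rainy", "cool", "normal", False), "yes"),
        (("rainy", "cool", "normal", True), "no"),
        (("overcast", "cool", "normal", True), "yes"),
    ]

    # Counts for the prior P(Y=y) and the conditionals P(X_i=x_i | Y=y).
    class_counts = Counter(y for _, y in data)
    feature_counts = defaultdict(Counter)  # keyed by (feature index, class)
    for x, y in data:
        for i, value in enumerate(x):
            feature_counts[(i, y)][value] += 1

    def score(x, y):
        """Unnormalized posterior: P(Y=y) * prod_i P(X_i=x_i | Y=y)."""
        p = class_counts[y] / len(data)
        for i, value in enumerate(x):
            p *= feature_counts[(i, y)][value] / class_counts[y]
        return p

    query = ("sunny", "cool", "high", True)
    print(max(class_counts, key=lambda y: score(query, y)))  # -> no
    ```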

    Continuous Data

    The Naive Bayes described so far works only with categorical predictors (e.g., spam classification, where the presence or absence of words, phrases, characters, and so on lies at the heart of the predictive task). For continuous features, there are essentially two choices: discretization and continuous Naive Bayes.

    Discretization works by breaking the data into categorical values. The simplest discretization is uniform binning, which creates bins of a fixed width. There are, of course, smarter and more complicated methods such as recursive minimal entropy partitioning or SOM-based partitioning.
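
    As a sketch of the simplest case, here is uniform binning with NumPy; the temperature values and the choice of three bins are illustrative assumptions.

    ```python
    import numpy as np

    # Hypothetical continuous temperatures to discretize.
    temperatures = np.array([64.0, 68.0, 71.0, 75.0, 80.0, 85.0])

    # Uniform binning: three equal-width bins spanning the observed range.
    edges = np.linspace(temperatures.min(), temperatures.max(), num=4)
    bins = np.digitize(temperatures, edges[1:-1])  # bin index 0, 1, or 2

    labels = np.array(["cool", "mild", "hot"])[bins]
    print(labels)  # ['cool' 'cool' 'mild' 'mild' 'hot' 'hot']
    ```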

    The continuous Naive Bayes utilizes known distributions such as a normal distribution. The continuous Naive Bayes can be written as:

    $$ P(Y|X) = \frac{P(Y) \prod_i f(X_i | Y)}{P(X)} $$

    $f$ is the probability density function. If a feature $X_i$ can reasonably be assumed to follow a normal distribution, we model its class-conditional density with a Gaussian.

    The first step is estimating the mean and variance of the feature for a given label $y$. Let $S$ be the set of data points with $Y=y$.

    $$ \mu^{\prime} = \frac{1}{|S|} \sum_{X \in S} X_i $$

    $$ {\sigma^{\prime}}^2 = \frac{1}{|S|-1} \sum_{X \in S} (X_i - \mu^{\prime})^2 $$

    Now we can calculate the probability density $f(X_i=x | Y=y)$:

    $$ f(X_i=x | Y=y) = \frac{1}{\sqrt{2 \pi {\sigma^{\prime}}^2}} e^{-\frac{(x-\mu^{\prime})^2}{2 {\sigma^{\prime}}^2}} $$
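
    Putting the two steps together, here is a minimal Python sketch for a single feature and a single class; the sample values in $S$ are made up for illustration.

    ```python
    import math

    # Hypothetical feature values taken from the rows where Y = y.
    S = [83.0, 70.0, 68.0, 64.0, 69.0, 75.0, 75.0, 72.0, 81.0]

    # Step 1: estimate the mean and (sample) variance for this class.
    mu = sum(S) / len(S)
    var = sum((x - mu) ** 2 for x in S) / (len(S) - 1)

    # Step 2: evaluate the normal density f(X_i = x | Y = y) at a query value.
    def normal_pdf(x, mu, var):
        return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

    print(normal_pdf(66.0, mu, var))
    ```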

     

    Source: towardsdatascience.com/naive-bayes-explained-9d2b96f4a9c0
