-
Cross Entropy | Mathematics | 2021. 1. 7. 11:36
It measures the dissimilarity between an actual probability distribution and a predicted probability distribution, so it is often used as a cost function for a classification neural network. Decomposition: CrossEntropy = LogSoftmax + NLLLoss. Typically, around the last layer of a classification model, class scores $y_{c}$ are obtained. The class scores are converted into (log)-probabil..
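The decomposition above can be sketched in NumPy (a minimal illustration of the idea, not the PyTorch implementation): log-softmax converts class scores into log-probabilities, and the negative log-likelihood picks out the target class. The scores and target below are hypothetical.

```python
import numpy as np

def log_softmax(scores):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    shifted = scores - np.max(scores)
    return shifted - np.log(np.sum(np.exp(shifted)))

def nll_loss(log_probs, target):
    # Negative log-likelihood of the target class.
    return -log_probs[target]

scores = np.array([2.0, 1.0, 0.1])  # hypothetical class scores y_c
target = 0
loss = nll_loss(log_softmax(scores), target)  # cross-entropy of this example
```

Composing the two pieces gives the same value as computing cross-entropy directly, which is why libraries expose both the fused loss and the two separate modules.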
-
Impurity Metric - Gini, Entropy | Mathematics | 2021. 1. 5. 15:40
The two most common metrics are 1) Gini impurity and 2) entropy. Gini Impurity Binary case $ I(D) = Gini(D) = pq = p(1-p) $ where $D$ denotes the dataset and $c$ denotes a class. General case $ I(D) = 1 - \sum_{i=1}^{c}{p_i}^2 $ Entropy Binary case $ I(D) = -p\log_2{p} -q\log_2{q} $ General case $ I(D) = -\sum_{i=1}^{c}{p_i \log_2{p_i}} $ Gini Impurity vs. Entropy The figure shows that Gini impurity (rescaled) a..
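The two binary-case formulas can be compared with a short NumPy sketch (a minimal illustration; the sample values are for demonstration only):

```python
import numpy as np

def gini(p):
    # Binary Gini impurity: I(D) = pq = p(1 - p).
    return p * (1 - p)

def entropy(p):
    # Binary entropy in bits: I(D) = -p log2(p) - q log2(q), with 0*log2(0) := 0.
    return sum(-x * np.log2(x) for x in (p, 1 - p) if x > 0)

# Both metrics peak at p = 0.5 (maximum impurity) and vanish at p = 0 or p = 1.
```

Evaluating both on a grid of `p` values reproduces the kind of comparison figure the post describes.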
-
Naive Bayes Explained | Mathematics | 2021. 1. 4. 19:39
Theory Before we get started, please note the notation used in this article: $X=(X_1, X_2, \ldots, X_k)$ represents $k$ features. $Y$ is the label with $K$ possible values (classes). From a probabilistic perspective, each $X_i \in X$ and $Y$ are random variables. The value of $X_i$ is $x$, and that of $Y$ is $y$. Basic Idea To make classifications, we need to use $X$ to predict $Y$. In other words, giv..
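The basic idea can be sketched with hypothetical binary features and plain counting (no smoothing): the posterior for each class is the prior times the product of per-feature likelihoods, under the naive conditional-independence assumption.

```python
import numpy as np

# Hypothetical training data: two binary features, binary label.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [1, 1], [0, 1]])
y = np.array([1, 1, 0, 0, 1, 0])

def predict(x):
    classes = np.unique(y)
    posteriors = []
    for c in classes:
        prior = np.mean(y == c)  # P(Y = c) estimated by counting
        # Naive assumption: features are conditionally independent given the class,
        # so the joint likelihood is the product of per-feature likelihoods.
        likelihood = np.prod([np.mean(X[y == c, i] == x[i]) for i in range(len(x))])
        posteriors.append(prior * likelihood)  # proportional to P(Y = c | X = x)
    return int(classes[np.argmax(posteriors)])
```

In practice a smoothing term (e.g. Laplace smoothing) is added so that an unseen feature value does not zero out the whole product.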
-
Prediction Interval and Confidence Interval | Mathematics | 2021. 1. 4. 14:17
Prediction Interval (PI) A prediction interval reflects the uncertainty around a single predicted value. Confidence Interval (CI) A confidence interval reflects the uncertainty around the mean of the predicted values. PI vs. CI How to obtain the Confidence Interval Get a bootstrap sample from the existing dataset. Fit a regression model to the bootstrap sample, and record the estimated coefficients. Repeat step..
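The bootstrap procedure sketched above can be written in a few lines of NumPy (hypothetical simulated data; `np.polyfit` stands in for the regression fit, and the percentile method gives the interval):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = 2x + noise, so the true slope is 2.
x = np.linspace(0, 10, 50)
y_obs = 2.0 * x + rng.normal(0, 1, size=x.size)

slopes = []
for _ in range(1000):
    # Step 1: draw a bootstrap sample (resample (x, y) pairs with replacement).
    idx = rng.integers(0, x.size, size=x.size)
    # Step 2: fit a regression to the bootstrap sample and record the slope.
    slope, intercept = np.polyfit(x[idx], y_obs[idx], 1)
    slopes.append(slope)

# Repeating the steps many times, the middle 95% of the recorded slopes
# forms a bootstrap confidence interval for the coefficient.
lo, hi = np.percentile(slopes, [2.5, 97.5])
```

The same resampling loop with predictions (plus residual noise) instead of coefficients yields a prediction interval rather than a confidence interval.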
-
Leverage, Influential values, Cook's distance in Regression | Mathematics | 2021. 1. 2. 21:31
Leverage and Influential Values The above image is from this YouTube video, which introduces the details, including the formula. Cook's Distance It measures the influence of a data point. Cook's distance $D_i$ of observation $i$ (for $i=1,\cdots,n$) is defined as the sum of all the changes in the regression model when observation $i$ is removed from it. $$D_i = \frac{\sum_..
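Leverage is the diagonal of the hat matrix $H = X(X^{\top}X)^{-1}X^{\top}$, and Cook's distance has a standard closed form that avoids refitting $n$ models. A NumPy sketch on hypothetical data (the closed form here is an assumption standing in for the truncated formula above):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 30)
y = 1.0 + 0.5 * x + rng.normal(0, 0.3, 30)

X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
leverage = np.diag(H)                       # h_ii: leverage of each observation

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta                        # residuals e_i
p = X.shape[1]                              # number of model parameters
mse = resid @ resid / (len(y) - p)          # s^2

# Cook's distance, closed form: D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2),
# equivalent to measuring how the fit changes when observation i is removed.
cooks = resid**2 / (p * mse) * leverage / (1 - leverage) ** 2
```

High-leverage points (large `h_ii`) only become influential when they also have large residuals, which is exactly what the product in `cooks` captures.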
-
Power and Sample size | Mathematics | 2021. 1. 1. 11:24
Effect Size The minimum size of the effect that you hope to be able to detect in a statistical test, such as "a 20% improvement in click rates". Power = the probability of correctly rejecting the null hypothesis $H_0$ = the probability of detecting a given effect size with a given sample size. Here, $H_1$ is the alternative hypothesis to $H_0$, where $H_0$ is that there is no improvement (= ..
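Under a normal approximation, the power of a two-sided two-sample test can be computed directly. A sketch using `scipy.stats.norm` (the effect size here is the standardized difference, Cohen's $d$, and the negligible opposite tail is ignored):

```python
import numpy as np
from scipy.stats import norm

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    # Power = P(reject H0 | true standardized effect = effect_size),
    # using the normal approximation for a two-sample z-test of means.
    z_crit = norm.ppf(1 - alpha / 2)                   # two-sided critical value
    z_effect = effect_size * np.sqrt(n_per_group / 2)  # shift of the test statistic under H1
    return norm.cdf(z_effect - z_crit)                 # ignores the tiny opposite tail

# Rule of thumb check: d = 0.5 with 64 per group gives roughly 80% power.
power = power_two_sample(0.5, 64)
```

Inverting this relationship (solving for `n_per_group` at a target power, typically 0.8) is how required sample sizes are derived from a chosen effect size.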
-
ANOVA (Analysis of Variance) | Mathematics | 2020. 12. 31. 10:04
One-way ANOVA Suppose that, instead of an A/B test, we had a comparison of multiple groups, say A-B-C-D, each with numeric data. The statistical procedure that tests for a statistically significant difference among the multiple groups is called analysis of variance, or ANOVA. The typical hypotheses for the ANOVA are: Null hypothesis $H_0$: all the distributions are from the same (common) populat..
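A quick one-way ANOVA on hypothetical groups, all drawn from the same population so that $H_0$ holds, using `scipy.stats.f_oneway`:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)
# Four groups A-D sampled from the same normal population (H0 is true here).
groups = [rng.normal(loc=0.0, scale=1.0, size=20) for _ in range(4)]

f_stat, p_value = f_oneway(*groups)
# Under H0 the p-value is uniform on [0, 1]; a large p-value is typical but not guaranteed.
```

A small p-value would indicate that at least one group mean differs, i.e. the samples are unlikely to come from one common population.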
-
F-ratio (F-statistic), F-distribution, and F-test | Mathematics | 2020. 12. 30. 15:26
We need the $F$-ratio ($F$-statistic) and the $F$-distribution to compare two random variables' variability. Thus, the question we're trying to answer using these is "Are the two variances $\sigma_x^2$ and $\sigma_y^2$ from the same population?". The $F$-ratio measures the relative variability: $$F=\frac{s_x^2}{s_y^2}$$ where $s_x^2$ and $s_y^2$ denote the larger sample variance and the smaller sample variance, respectively..
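The ratio and its tail probability can be computed with NumPy and `scipy.stats.f` (hypothetical samples; the larger sample variance goes in the numerator so that $F \ge 1$):

```python
import numpy as np
from scipy.stats import f

x = np.array([4.1, 5.2, 6.3, 5.8, 4.9, 5.5])  # hypothetical sample with larger spread
y = np.array([5.0, 5.1, 5.2, 4.9, 5.0, 5.1])  # hypothetical sample with smaller spread

s2_x, s2_y = np.var(x, ddof=1), np.var(y, ddof=1)  # sample variances
F = max(s2_x, s2_y) / min(s2_x, s2_y)              # larger variance over smaller
# Upper-tail probability of the F distribution with (n-1, n-1) degrees of freedom;
# both samples have n = 6 here, so the two df values coincide.
p_one_tail = f.sf(F, len(x) - 1, len(y) - 1)
```

A small tail probability suggests the two variances are unlikely to come from the same population.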