ANOVA (Analysis of Variance)
One-way ANOVA
Suppose that, instead of an A/B test, we had a comparison of multiple groups, say A-B-C-D, each with numeric data. The statistical procedure that tests for a statistically significant difference among the multiple groups is called analysis of variance, or ANOVA.
The typical hypotheses for the ANOVA are:
- Null hypothesis $H_0$: all the distributions are from the same (common) population.
- Alternative hypothesis $H_1$: at least one of the distributions is from a different population.
For example, suppose there are three groups whose values are distributed as follows:
Distribution #3 looks likely to come from a different population. How do we measure that? The basic idea is to compare the spread between the groups with the spread inside them:
$$\frac{\text{variability between the means}}{\text{variability within the distributions}}$$
$$\approx \frac{\text{distance from overall mean}}{\text{internal spread}}$$
$$\approx \frac{\text{variance between}}{\text{variance within}}$$
$$\approx F\text{-ratio}$$
With the $F$-ratio, we can conduct an $F$-test to test our hypothesis.
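As a rough illustration of the ratio above, the following sketch computes a one-way $F$-ratio by hand as (between-group mean square) / (within-group mean square). The function name `f_ratio` and the three small groups are made up for this example and do not come from the figure in the text.

```python
from statistics import mean

def f_ratio(groups):
    """Return the one-way ANOVA F-ratio for a list of numeric groups."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Variability between the group means (weighted by group size).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability within each group (spread around its own mean).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)        # "variance between"
    ms_within = ss_within / (n_total - k)    # "variance within"
    return ms_between / ms_within

# Hypothetical data: the third group sits far from the other two.
example_groups = [[1, 2, 3], [2, 3, 4], [7, 8, 9]]
f_value = f_ratio(example_groups)
print(f_value)
```

A large value means the group means are far apart relative to the spread inside each group. Turning the ratio into a $p$-value requires the $F$ distribution with $k-1$ and $N-k$ degrees of freedom, which this sketch leaves out.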
※ Detailed tutorials on ANOVA are presented in the following YouTube videos:
- ANOVA, A Visual Introduction
- One-way ANOVA, A Visual Tutorial
- One-way ANOVA, Understanding the Calculation
Resampling Approach for ANOVA
The approach introduced above is the traditional formula approach. Instead, we can use the resampling approach for ANOVA, which resembles the basic permutation test. The basis for it can be seen in the following resampling procedure:
Let's say we have the following dataset:
The procedure is:
1. Combine all the data together in a single box.
2. Shuffle and draw out four resamples (four groups) of five values (records) each. The values are drawn without replacement, unlike in bootstrapping.
3. Record the mean of each of the four groups.
4. Record the variance among the four group means.
5. Repeat steps 2-4 many times (say 10,000).
The proportion of resamples in which the resampled variance exceeds the observed variance is the $p$-value used for the hypothesis test. This type of permutation test is a bit more involved than the basic permutation test.
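The resampling procedure can be sketched in a few lines of standard-library Python. This is a minimal sketch, not a reference implementation: the function name `permutation_anova`, the fixed seed, and the four groups of five values are assumptions made up for illustration, since the dataset from the text is not reproduced here.

```python
import random
from statistics import variance

def permutation_anova(groups, n_resamples=10_000, seed=0):
    """Permutation test on the variance among group means.

    Returns (observed_variance, p_value).
    """
    rng = random.Random(seed)                # seeded for reproducibility
    sizes = [len(g) for g in groups]
    observed = variance([sum(g) / len(g) for g in groups])

    # Step 1: combine all the data in a single box.
    pooled = [x for g in groups for x in g]

    exceed = 0
    for _ in range(n_resamples):             # Step 5: repeat many times
        rng.shuffle(pooled)                  # Step 2: shuffle, then split
        means, start = [], 0
        for size in sizes:
            chunk = pooled[start:start + size]
            means.append(sum(chunk) / size)  # Step 3: mean of each group
            start += size
        if variance(means) > observed:       # Step 4: variance among means
            exceed += 1
    return observed, exceed / n_resamples

# Hypothetical data: four groups of five values; the last group is shifted.
data = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 3, 3, 4, 6],
    [20, 21, 22, 23, 24],
]
observed_var, p_value = permutation_anova(data)
print(observed_var, p_value)
```

Splitting a shuffled list into consecutive chunks is the same as drawing the groups without replacement, so every value is used exactly once per resample. A small `p_value` indicates that a variance among group means as large as the observed one rarely arises by chance.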
Two-way ANOVA
Two-way ANOVA is an extension of one-way ANOVA. Tutorial videos can be found at the following links: