ANOVA
Sittie Jalilah T. Abdul-Jalil
Slide 2: Analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance is partitioned into components due to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether the means of several groups are all equal, and therefore generalizes the two-sample t-test to more than two groups.
Slide 3: There are three conceptual classes of such models:
Fixed-effects models assume that the data came from normal populations which may differ only in their means. (Model 1)
Random-effects models assume that the data describe a hierarchy of different populations whose differences are constrained by the hierarchy. (Model 2)
Mixed-effects models describe situations where both fixed and random effects are present. (Model 3)
Slide 4: One-way ANOVA is used to test for differences among two or more independent groups. Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test (Gosset, 1908). When there are only two means to compare, the t-test and the F-test are equivalent; the relation between ANOVA and t is given by $F = t^2$.
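The two-group equivalence can be checked numerically. The following is a minimal sketch using only the Python standard library; the two samples and the helper names `pooled_t` and `one_way_f` are made up for illustration, and the t statistic assumes the pooled (equal-variance) form of the two-sample test.

```python
from math import sqrt
from statistics import mean


def pooled_t(x, y):
    """Two-sample t statistic with a pooled (equal-variance) estimate."""
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss_x = sum((v - mx) ** 2 for v in x)
    ss_y = sum((v - my) ** 2 for v in y)
    pooled_var = (ss_x + ss_y) / (nx + ny - 2)
    return (mx - my) / sqrt(pooled_var * (1 / nx + 1 / ny))


def one_way_f(groups):
    """One-way ANOVA F-ratio: between-group over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical samples, chosen only for illustration.
group_x = [6, 8, 4, 5, 3, 4]
group_y = [8, 12, 9, 11, 6, 8]

t_stat = pooled_t(group_x, group_y)
f_stat = one_way_f([group_x, group_y])
print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

The two printed values should agree up to floating-point rounding, whatever two samples are supplied.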
Slide 5: F-test or the ANOVA (Analysis of Variance)
(For more variables)
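Computation:
$$CF = \frac{(GT)^2}{N}$$
TSS = total sum of squares: the sum of all squared observations, minus CF
BSS = between-groups sum of squares: the sum over groups of (group total)^2 / (group size), minus CF
WSS = within-groups sum of squares: TSS minus BSS
CF = correction factor
GT = grand total (the sum of all observations)
N = total number of observations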
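Slide 7: Formula:
$$CF = \frac{(\sum A + \sum B + \sum C)^2}{N_A + N_B + N_C}$$
$$TSS = \sum A^2 + \sum B^2 + \sum C^2 - CF$$
$$BSS = \frac{(\sum A)^2}{N_A} + \frac{(\sum B)^2}{N_B} + \frac{(\sum C)^2}{N_C} - CF$$
$$WSS = TSS - BSS$$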
Slide 8: Example:
         A     B     C
         1     3     2
         2     5     7
         4     8     9
         3     6     3
Total   10    22    21
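$$CF = \frac{(10 + 22 + 21)^2}{4 + 4 + 4} = \frac{(53)^2}{12} = \frac{2809}{12} = 234.08$$
$$TSS = 30 + 134 + 143 - 234.08 = 307 - 234.08 = 72.92$$
$$BSS = \frac{(10)^2}{4} + \frac{(22)^2}{4} + \frac{(21)^2}{4} - 234.08$$
Slide 9:
$$BSS = \frac{100 + 484 + 441}{4} - 234.08 = \frac{1025}{4} - 234.08 = 256.25 - 234.08 = 22.17$$
$$WSS = 72.92 - 22.17 = 50.75$$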
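The hand computation can be reproduced in a few lines. This is a minimal sketch of the correction-factor method applied to the A, B, C data from the example; the function name `sums_of_squares` is a hypothetical helper, not something from the slides.

```python
def sums_of_squares(groups):
    """Return (CF, TSS, BSS, WSS) using the correction-factor method."""
    n_total = sum(len(g) for g in groups)          # N
    grand_total = sum(sum(g) for g in groups)      # GT
    cf = grand_total ** 2 / n_total
    # Sum of every squared observation, minus CF.
    tss = sum(v ** 2 for g in groups for v in g) - cf
    # Sum of (group total)^2 / (group size), minus CF.
    bss = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    wss = tss - bss
    return cf, tss, bss, wss


# Data from the worked example.
a = [1, 2, 4, 3]
b = [3, 5, 8, 6]
c = [2, 7, 9, 3]

cf, tss, bss, wss = sums_of_squares([a, b, c])
print(f"CF={cf:.2f} TSS={tss:.2f} BSS={bss:.2f} WSS={wss:.2f}")
```

Because the function divides each group total by that group's own size, it also works when the groups have unequal numbers of observations.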
Remember:
If the computed F value is greater than the tabular (critical) value, reject the null hypothesis H0.
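To apply this rule to the A, B, C example, the sums of squares first have to be turned into an F value by dividing each by its degrees of freedom. The sketch below illustrates that step; the function name `f_ratio` is hypothetical, and the tabular value 4.26 for (2, 9) degrees of freedom at the 5% level is an assumption taken from a standard F table, not a figure given in the slides.

```python
def f_ratio(bss, wss, n_groups, n_total):
    """F = (BSS / df_between) / (WSS / df_within)."""
    df_between = n_groups - 1
    df_within = n_total - n_groups
    return (bss / df_between) / (wss / df_within)


# Sums of squares from the worked example (3 groups, 12 observations).
bss, wss = 22.17, 50.75
f_computed = f_ratio(bss, wss, n_groups=3, n_total=12)

# Assumption: tabular F for (2, 9) degrees of freedom at alpha = 0.05,
# read from a standard F table.
f_tabular = 4.26

reject_h0 = f_computed > f_tabular
print(f"F = {f_computed:.2f}, tabular = {f_tabular}, reject H0: {reject_h0}")
```

Under these assumptions the computed F falls below the tabular value, so the null hypothesis would not be rejected for this small data set.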
Slide 11: One-way ANOVA example
Consider an experiment to study the effect of three different levels of some factor on a response (e.g. three types of fertilizer on plant growth). If we had 6 observations for each level, we could write the outcome of the experiment in a table like this, where a1, a2, and a3 are the three levels of the factor being studied.
a1 a2 a3
6 8 13
8 12 9
4 9 11
5 11 8
3 6 7
4 8 12
Slide 12: The null hypothesis, denoted H0, for the overall F-test for this experiment would be that all three levels of the factor produce the same response, on average. To calculate the F-ratio:
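Step 1: Calculate the mean within each group:
$$\bar{Y}_1 = \frac{6 + 8 + 4 + 5 + 3 + 4}{6} = 5$$
$$\bar{Y}_2 = \frac{8 + 12 + 9 + 11 + 6 + 8}{6} = 9$$
$$\bar{Y}_3 = \frac{13 + 9 + 11 + 8 + 7 + 12}{6} = 10$$
Slide 13: Step 2: Calculate the overall mean:
$$\bar{Y} = \frac{\sum_i \bar{Y}_i}{a} = \frac{5 + 9 + 10}{3} = 8$$
where a is the number of groups.
Step 3: Calculate the "between-group" sum of squares:
$$\mathrm{SSB} = n(\bar{Y}_1 - \bar{Y})^2 + n(\bar{Y}_2 - \bar{Y})^2 + n(\bar{Y}_3 - \bar{Y})^2 = 6(5 - 8)^2 + 6(9 - 8)^2 + 6(10 - 8)^2 = 84$$
where n is the number of data values per group.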
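Steps 1 to 3 can be sketched in Python as follows, using the a1, a2, a3 data from the table. The variable names are illustrative choices; taking the overall mean as the mean of the group means relies on every group having the same number of observations, as is the case here.

```python
from statistics import mean

# Data from the one-way ANOVA example (6 observations per level).
data = {
    "a1": [6, 8, 4, 5, 3, 4],
    "a2": [8, 12, 9, 11, 6, 8],
    "a3": [13, 9, 11, 8, 7, 12],
}

# Step 1: mean within each group.
group_means = {name: mean(values) for name, values in data.items()}

# Step 2: overall mean (mean of group means; valid for equal group sizes).
overall_mean = mean(list(group_means.values()))

# Step 3: between-group sum of squares, with n values per group.
n = 6
ss_between = sum(n * (m - overall_mean) ** 2 for m in group_means.values())

print(group_means, overall_mean, ss_between)
```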
Slide 14: The between-group degrees of freedom is one less than the number of groups:
dfB = 3 − 1 = 2
so the between-group mean square value is
MSB = 84 / 2 = 42
Step 4: Calculate the "within-group" sum of squares. Begin by centering the data in each group:
a1              a2               a3
6 − 5 = 1       8 − 9 = −1       13 − 10 = 3
8 − 5 = 3       12 − 9 = 3       9 − 10 = −1
4 − 5 = −1      9 − 9 = 0        11 − 10 = 1
5 − 5 = 0       11 − 9 = 2       8 − 10 = −2
3 − 5 = −2      6 − 9 = −3       7 − 10 = −3
4 − 5 = −1      8 − 9 = −1       12 − 10 = 2
Slide 15: The within-group sum of squares is the sum of squares of all 18 values in this table:
SSW = 1 + 9 + 1 + 0 + 4 + 1 + 1 + 9 + 0 + 4 + 9 + 1 + 9 + 1 + 1 + 4 + 9 + 4 = 68
The within-group degrees of freedom is
dfW = a(n − 1) = 3(6 − 1) = 15
Thus the within-group mean square value is
MSW = SSW / dfW = 68 / 15 ≈ 4.5
Step 5: The F-ratio is
F = MSB / MSW ≈ 42 / 4.5 ≈ 9.3
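Steps 4 and 5 can be sketched the same way. This is a minimal illustration with the example data and variable names of my own choosing. It keeps MSW unrounded, so the F-ratio comes out slightly below the value obtained by hand with MSW rounded to 4.5.

```python
from statistics import mean

# Data from the one-way ANOVA example, one list per factor level.
groups = [
    [6, 8, 4, 5, 3, 4],      # a1
    [8, 12, 9, 11, 6, 8],    # a2
    [13, 9, 11, 8, 7, 12],   # a3
]
a = len(groups)       # number of groups
n = len(groups[0])    # observations per group (equal group sizes)

# Step 4: center each group on its own mean, then sum the squares.
centered = [[v - mean(g) for v in g] for g in groups]
ss_within = sum(d ** 2 for row in centered for d in row)
df_within = a * (n - 1)
ms_within = ss_within / df_within

# Between-group mean square, recomputed so the block is self-contained.
grand_mean = mean([v for g in groups for v in g])
ss_between = sum(n * (mean(g) - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (a - 1)

# Step 5: the F-ratio.
f_value = ms_between / ms_within
print(f"SSW={ss_within}, dfW={df_within}, MSW={ms_within:.2f}, F={f_value:.2f}")
```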
Slide 17: The critical value is the number that the test statistic must exceed to reject the null hypothesis. In this case, Fcrit(2,15) = 3.68 at α = 0.05. Since F = 9.3 > 3.68, the results are significant at the 5% significance level. One would reject the null hypothesis, concluding that there is strong evidence that the expected values in the three groups differ. The p-value for this test is 0.002.
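The critical value and p-value can be checked without statistical tables in this particular case. When the numerator has exactly 2 degrees of freedom, the upper-tail probability of the F-distribution has a simple closed form, P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator degrees of freedom. The sketch below relies on that special case only; the function names are hypothetical, and for other numerator degrees of freedom a table or a statistics library would be needed.

```python
def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for F(2, df2). Closed form valid only when df1 = 2."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha, df2):
    """Value exceeded with probability alpha under F(2, df2) (df1 = 2 only)."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


# F-ratio from the example, using the unrounded MSW = 68 / 15.
f_observed = 42 / (68 / 15)

f_crit = f_critical_df1_2(alpha=0.05, df2=15)
p_value = f_upper_tail_df1_2(f_observed, df2=15)

print(f"F = {f_observed:.2f}, critical value = {f_crit:.2f}, p = {p_value:.3f}")
```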
Slide 18: After performing the F-test, it is common to carry out some "post-hoc" analysis of the group means. In this case, the first two group means differ by 4 units, the first and third group means differ by 5 units, and the second and third group means differ by only 1 unit. The standard error of each of these differences is $\sqrt{4.5/6 + 4.5/6} \approx 1.2$.
Slide 19: Thus the first group is strongly different from the other groups, as the mean difference is more than three times the standard error, so we can be highly confident that the population mean of the first group differs from the population means of the other groups. However, there is no evidence that the second and third groups have different population means from each other, as their mean difference of one unit is comparable to the standard error.
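The pairwise comparison described here can be sketched as below. It is an informal check that mirrors the reasoning in the slides (each difference divided by its standard error), not a formal multiple-comparison procedure. The group means, MSW = 68/15 and n = 6 are taken from the example, and the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Group means, within-group mean square, and group size from the example.
group_means = {"a1": 5, "a2": 9, "a3": 10}
ms_within = 68 / 15
n = 6

# Standard error of a difference between two group means.
std_error = sqrt(ms_within / n + ms_within / n)

# Ratio of each pairwise mean difference to the standard error.
ratios = {}
for (g1, m1), (g2, m2) in combinations(group_means.items(), 2):
    diff = abs(m1 - m2)
    ratios[(g1, g2)] = diff / std_error
    print(f"{g1} vs {g2}: difference = {diff}, ratio to SE = {diff / std_error:.2f}")
```

Ratios well above 3 for the pairs involving a1, and a ratio below 1 for a2 versus a3, match the conclusion stated in the slides.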
Slide 20: Note: F(x, y) denotes an F-distribution with x degrees of freedom in the numerator and y degrees of freedom in the denominator.
Slide 21: Thank you.