PowerPoint Presentation: Analysis of Variance
Lecture Series on Statistics No. Bio-Stat_13, Date: 7.12.2008
By Dr. Bijaya Bhusan Nanda, M.Sc. (Gold Medalist), Ph.D. (Stat.), Topper, Orissa Statistics & Economics Services, 1988
bijayabnanda@yahoo.com
CONTENTS: Introduction; Completely Randomized Design; Randomized Complete Block Design
Introduction
ANOVA (analysis of variance) is the technique in which the total variation present in a data set is split into non-negative components, each attributable to one factor or cause of variation. The factors of variation are of two kinds:
Assignable factors: there can be many of these.
Non-assignable factors: error, or random variation.
Utility
ANOVA is used to test hypotheses about differences between two or more means. The t test can only be used to test the difference between two means. When there are more than two means, it is possible to compare each mean with every other mean using t tests. However, conducting multiple t tests can severely inflate the Type I error rate. ANOVA tests the differences among several means for significance in a single test, so the Type I error rate stays at the chosen significance level.
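To see the inflation concretely: if m independent tests are each run at level α, the probability of at least one false rejection is 1 - (1 - α)^m. Pairwise t tests among k means are not independent, so this formula is only a rough guide, but it shows the trend. A minimal Python sketch (the function name is illustrative):

```python
from math import comb

def familywise_error(alpha: float, k: int) -> float:
    """Approximate probability of at least one Type I error when all
    k*(k-1)/2 pairwise t tests among k means are run, each at level alpha.

    Assumes the tests are independent, which pairwise comparisons are not,
    so treat the result as a rough illustration only.
    """
    m = comb(k, 2)  # number of pairwise comparisons among k means
    return 1 - (1 - alpha) ** m

for k in (2, 3, 4, 6):
    print(k, comb(k, 2), round(familywise_error(0.05, k), 3))
```

With k = 4 means there are six comparisons, and the approximate chance of at least one false rejection is already about 0.26 rather than 0.05.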
The ANOVA Procedure
The analysis of variance follows a ten-step procedure:
1. Description of data
2. Assumptions: along with the assumptions, we present the model for each design discussed.
3. Hypotheses
4. Test statistic
5. Distribution of the test statistic
6. Decision rule
7. Calculation of the test statistic: the results of the arithmetic calculations are summarized in a table called the analysis of variance (ANOVA) table. The entries in the table make it easy to evaluate the results of the analysis.
8. Statistical decision
9. Conclusion
10. Determination of the p value
One-Way ANOVA: Completely Randomized Design (CRD)
One-way ANOVA is the simplest type of ANOVA, in which only one source of variation, or factor, is investigated. It extends the t test procedure for two independent samples to three or more samples. Put another way, the t test for two independent samples is a special case of one-way analysis of variance.
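The special-case relationship can be checked numerically: with two groups, the one-way ANOVA variance ratio equals the square of the pooled-variance t statistic. The sketch below uses the DOC and WKY rat weights from Example 1 of this lecture; the helper names are illustrative, not from the source.

```python
from math import sqrt
from statistics import mean, variance

# DOC and WKY rat weights, taken from the data table of Example 1
doc = [336, 346, 269, 346, 323, 309, 322, 316, 300, 309, 276, 306, 310, 302, 269, 311]
wky = [328, 315, 343, 368, 353, 374, 356, 339, 343, 343, 334, 333, 313, 333, 372]

def pooled_t(a: list[float], b: list[float]) -> float:
    """Two-sample t statistic using the pooled (equal-variance) estimate of sigma^2."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

def one_way_f(groups: list[list[float]]) -> float:
    """One-way ANOVA variance ratio MSA / MSW."""
    all_x = [x for g in groups for x in g]
    grand = mean(all_x)
    means = [mean(g) for g in groups]
    k, n_total = len(groups), len(all_x)
    ssa = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssa / (k - 1)) / (ssw / (n_total - k))

t = pooled_t(doc, wky)
f = one_way_f([doc, wky])
print(round(t ** 2, 4), round(f, 4))  # the two values agree
```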
The experimental design used for one-way ANOVA is called the completely randomized design. It tests the equality of the effects of several treatments of one assignable cause of variation. It is based on two principles: replication and randomization.
Advantages: (i) very simple; (ii) reduces the experimental error to a great extent; (iii) treatments can easily be added or dropped; (iv) suitable for laboratory experiments.
Disadvantages: (i) not suitable if the experimental units are not homogeneous; (ii) less efficient and sensitive than other designs; (iii) local control is completely neglected; (iv) not suitable for field experiments.
Hypothesis Testing Steps
1. Description of data: the measurements (or observations) resulting from a completely randomized experimental design are displayed, along with the treatment totals and means, as in the following table of sample values. The treatments are allocated to the available subjects at random.
[Figure: allocation of the 16 available subjects, numbered 01 to 16, according to a sequence of random numbers.]
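A minimal Python sketch of such a random allocation, assuming (for illustration only) that 16 subjects are split equally among four treatments. A seeded pseudo-random shuffle stands in for the table of random numbers; the function name and seed are hypothetical.

```python
import random

def allocate_crd(n_subjects: int, n_treatments: int, seed: int = 2008) -> dict[int, list[int]]:
    """Randomly assign subjects 1..n_subjects to treatments 1..n_treatments
    in equal-sized groups (assumes n_subjects is divisible by n_treatments)."""
    if n_subjects % n_treatments:
        raise ValueError("equal group sizes require divisibility")
    rng = random.Random(seed)
    order = list(range(1, n_subjects + 1))
    rng.shuffle(order)  # plays the role of the random-number sequence
    size = n_subjects // n_treatments
    return {t + 1: sorted(order[t * size:(t + 1) * size]) for t in range(n_treatments)}

for treatment, subjects in allocate_crd(16, 4).items():
    print(f"Treatment {treatment}: subjects {subjects}")
```

Because every subject has the same chance of receiving any treatment, no systematic difference between subjects can line up with the treatments except by chance.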
Table of Sample Values for the CRD
                          Treatment
              1           2           3           ...    k
              x_11        x_12        x_13        ...    x_1k
              x_21        x_22        x_23        ...    x_2k
              .           .           .                  .
              x_{n1,1}    x_{n2,2}    x_{n3,3}    ...    x_{nk,k}
Total         T_.1        T_.2        T_.3        ...    T_.k      T_..
Mean          x̄_.1        x̄_.2        x̄_.3        ...    x̄_.k      x̄_..
where
$T_{.j} = \sum_{i=1}^{n_j} x_{ij}$ = total of the jth treatment
$\bar{x}_{.j} = T_{.j}/n_j$ = mean of the jth treatment
$T_{..} = \sum_{j=1}^{k} T_{.j} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} x_{ij}$ = total of all observations
$\bar{x}_{..} = T_{..}/N$, with $N = \sum_{j=1}^{k} n_j$ = mean of all observations
$x_{ij}$ = the ith observation resulting from the jth treatment (there are k treatments in total)
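These quantities are simple sums and ratios. The Python sketch below computes them for the rat-weight data of Example 1, transcribed from that example's data table; the function name is illustrative.

```python
def crd_summary(groups: dict[str, list[int]]):
    """Treatment totals T_.j and means, N, grand total T_.. and grand mean for a CRD."""
    totals = {name: sum(xs) for name, xs in groups.items()}                # T_.j
    means = {name: totals[name] / len(xs) for name, xs in groups.items()}  # xbar_.j
    n_total = sum(len(xs) for xs in groups.values())                       # N = sum of n_j
    grand_total = sum(totals.values())                                     # T_..
    return totals, means, n_total, grand_total, grand_total / n_total      # last: xbar_..

# Rat weights from the data table of Example 1, one list per condition
weights = {
    "DOC": [336, 346, 269, 346, 323, 309, 322, 316, 300, 309, 276, 306, 310, 302, 269, 311],
    "WKY": [328, 315, 343, 368, 353, 374, 356, 339, 343, 343, 334, 333, 313, 333, 372],
    "DOC-Ca": [304, 292, 299, 293, 277, 303, 303, 320, 324, 340, 299, 279, 305, 290, 300, 312],
    "WKY-Ca": [342, 284, 334, 348, 315, 313, 301, 354, 346, 319, 289, 322, 308, 325],
}
totals, means, N, T, grand_mean = crd_summary(weights)
print(totals)
print({name: round(m, 2) for name, m in means.items()})
print(N, T, round(grand_mean, 2))
```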
2. Assumptions: The Model
The one-way analysis of variance model may be written as
$x_{ij} = \mu + \tau_j + e_{ij}, \quad i = 1, 2, \ldots, n_j; \; j = 1, 2, \ldots, k$
The terms in this model are defined as follows:
1. $\mu$ represents the mean of all the k population means and is called the grand mean.
2. $\tau_j$ represents the difference between the mean of the jth population and the grand mean ($\tau_j = \mu_j - \mu$) and is called the treatment effect.
3. $e_{ij}$ represents the amount by which an individual measurement differs from the mean of the population to which it belongs and is called the error term.
Assumptions of the Model
(a) The k sets of observed data constitute k independent random samples from the respective populations.
(b) Each of the populations from which the samples come is normally distributed with mean $\mu_j$ and variance $\sigma_j^2$.
(c) Each of the populations has the same variance; that is, $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$, the common variance.
(d) The $\tau_j$ are unknown constants and $\sum_{j=1}^{k} \tau_j = 0$, since the sum of all deviations of the $\mu_j$ from their mean, $\mu$, is zero.
(e) The $e_{ij}$ have a mean of 0, since the mean of $x_{ij}$ is $\mu_j$.
(f) The $e_{ij}$ have a variance equal to the variance of the $x_{ij}$, since the $e_{ij}$ and $x_{ij}$ differ only by a constant.
(g) The $e_{ij}$ are normally (and independently) distributed.
3. Hypotheses: We test the null hypothesis that all population (treatment) means are equal against the alternative that the members of at least one pair are not equal:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_A$: not all $\mu_j$ are equal
If the population means are equal, each treatment effect is equal to zero, so the hypotheses may alternatively be stated as
$H_0: \tau_j = 0, \quad j = 1, 2, \ldots, k$
$H_A$: not all $\tau_j = 0$
4. Test statistic
The total sum of squares (SST) is the sum of the squared deviations of all the individual observations, taken together, from the grand mean:
$SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{..})^2$
Table: Analysis of Variance Table for the Completely Randomized Design
Source of variation   Sum of squares   d.f.    Mean square                                Variance ratio
Among samples         SSA              k - 1   MSA = SSA/(k - 1)  (MS due to treatment)   V.R. = MSA/MSW
Within samples        SSW              N - k   MSW = SSW/(N - k)  (MS due to error)
Total                 SST              N - 1
The Within-Groups Sum of Squares: the first step in the computation calls for some calculations within each group. Within each group we compute the sum of the squared deviations of the individual observations from the group mean; we then add these group results over all groups:
$SSW = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2$
The Among-Groups Sum of Squares: to obtain the second component of the total sum of squares, we compute for each group the squared deviation of the group mean from the grand mean, multiply the result by the size of the group, and add these results over all groups:
$SSA = \sum_{j=1}^{k} n_j (\bar{x}_{.j} - \bar{x}_{..})^2$
The total sum of squares is equal to the sum of the among-groups and within-groups sums of squares: SST = SSA + SSW.
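A short Python sketch of the decomposition, using small hypothetical data (three groups of three observations, not from the lecture) so the arithmetic is easy to follow by hand. The function name is illustrative.

```python
def sums_of_squares(groups: list[list[float]]) -> tuple[float, float, float]:
    """Return (SST, SSA, SSW) for a one-way layout, from the definitional formulas."""
    all_x = [x for g in groups for x in g]
    grand_mean = sum(all_x) / len(all_x)
    group_means = [sum(g) / len(g) for g in groups]
    sst = sum((x - grand_mean) ** 2 for x in all_x)
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return sst, ssa, ssw

# Hypothetical data: three groups of three observations each
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
sst, ssa, ssw = sums_of_squares(groups)
print(round(sst, 6), round(ssa, 6), round(ssw, 6))  # 44.0 38.0 6.0
```

Here the group means are 5, 7 and 10 and the grand mean is 66/9; the within-groups part (6) and the among-groups part (38) add up to the total (44).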
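The First Estimate of σ²
Within any sample, the sample variance
$s_j^2 = \frac{\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2}{n_j - 1}$
provides an unbiased estimate of the true variance of the population from which the sample came. Under the assumption that the population variances are all equal, we may pool the k estimates to obtain
$MSW = \frac{\sum_{j=1}^{k} (n_j - 1)s_j^2}{\sum_{j=1}^{k} (n_j - 1)} = \frac{SSW}{N - k}$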
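The Second Estimate of σ²
The second estimate of σ² may be obtained from the familiar formula for the variance of sample means, $\sigma_{\bar{x}}^2 = \sigma^2/n$, where n is the common sample size. Solving this equation for σ², the variance of the population from which the samples were drawn, gives
$\sigma^2 = n\sigma_{\bar{x}}^2$
An unbiased estimate of $\sigma_{\bar{x}}^2$, computed from the sample data, is provided by
$s_{\bar{x}}^2 = \frac{\sum_{j=1}^{k} (\bar{x}_{.j} - \bar{x}_{..})^2}{k - 1}$
Substituting this quantity into the equation for σ², we obtain the desired estimate
$MSA = \frac{n\sum_{j=1}^{k} (\bar{x}_{.j} - \bar{x}_{..})^2}{k - 1}$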
When the sample sizes are not all equal, an estimate of σ² based on the variability among sample means is provided by
$MSA = \frac{\sum_{j=1}^{k} n_j (\bar{x}_{.j} - \bar{x}_{..})^2}{k - 1} = \frac{SSA}{k - 1}$
The Variance Ratio: we now compare these two estimates of σ² by computing the following variance ratio, which is the desired test statistic:
V.R. = among-groups mean square / within-groups mean square = MSA/MSW
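The two estimates of σ² can be compared directly in code. This sketch reuses small hypothetical data (three groups of three, not from the lecture): the pooled within-sample variance gives MSW = 1, while n times the variance of the sample means and the general SSA/(k - 1) form both give 19, so V.R. = 19.

```python
from statistics import mean, variance

# Hypothetical data: k = 3 groups, each of common size n = 3
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
k = len(groups)
n = len(groups[0])
N = k * n

# First estimate: pool the k within-sample variances s_j^2 (equals SSW / (N - k))
msw = sum((len(g) - 1) * variance(g) for g in groups) / (N - k)

# Second estimate (equal n): n times the variance of the sample means
group_means = [mean(g) for g in groups]
msa_equal_n = n * variance(group_means)

# General form, also valid for unequal n_j: SSA / (k - 1)
grand = mean([x for g in groups for x in g])
msa_general = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (k - 1)

print(msw, msa_equal_n, msa_general, msa_general / msw)
```

A variance ratio far above 1 suggests that the among-groups mean square is estimating something larger than σ², that is, that the treatment means differ.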
5. Distribution of the test statistic: if H0 is true and the assumptions are met, V.R. follows an F distribution. The F distribution we use in a given situation depends on the number of degrees of freedom associated with the sample variance in the numerator and the number associated with the sample variance in the denominator. We compute V.R. by placing the among-groups mean square in the numerator and the within-groups mean square in the denominator, so the numerator degrees of freedom equal the number of groups minus 1, (k - 1), and the denominator degrees of freedom equal the total number of observations minus the number of groups, (N - k).
6. Decision rule (significance level): once the appropriate F distribution has been determined, the size of the observed V.R. that will cause rejection of the hypothesis of equal population means depends on the significance level chosen. The significance level determines the critical value of F, the value that separates the nonrejection region from the rejection region.
8. Statistical decision: to reach a decision we compare the computed V.R. with the critical value of F, which we obtain by entering Table G with k - 1 numerator degrees of freedom and N - k denominator degrees of freedom. If the computed V.R. is equal to or greater than the critical value of F, we reject the null hypothesis. If the computed V.R. is smaller than the critical value of F, we do not reject the null hypothesis.
9. Conclusion: when we reject H0, we conclude that not all population means are equal. When we fail to reject H0, we conclude that the data do not provide sufficient evidence that the population means differ; the population means may be equal.
10. Determination of the p value: the p value is the probability, computed under H0, of obtaining a V.R. as large as or larger than the one actually observed.
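Table G gives critical values only for selected degrees of freedom and levels. In practice a statistics library (for instance SciPy's `scipy.stats.f`) would supply them; as a standard-library-only sketch, the upper-tail F probability can be written as a tail integral of the Beta(d1/2, d2/2) density and evaluated with Simpson's rule, and the critical value found by bisection. This is a numerical approximation, not the method used to build Table G; the function names are illustrative, and the sketch assumes d2 ≥ 2 (true for both examples in this lecture).

```python
import math

def f_upper_tail(x: float, d1: int, d2: int, steps: int = 4000) -> float:
    """P(F > x) for F with (d1, d2) degrees of freedom.

    Uses P(F > x) = integral over t in [z, 1] of the Beta(d1/2, d2/2) density,
    where z = d1*x / (d1*x + d2), evaluated by Simpson's rule (steps even).
    Sketch assumption: d2 >= 2, so the integrand stays finite at t = 1.
    """
    if d2 < 2:
        raise ValueError("this sketch assumes d2 >= 2")
    if x <= 0:
        return 1.0
    a, b = d1 / 2, d2 / 2
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    z = d1 * x / (d1 * x + d2)

    def beta_pdf(t: float) -> float:
        if t >= 1.0:
            # (1 - t)**(b - 1) at t = 1 is 1 when b == 1 and 0 when b > 1
            return math.exp(-log_beta) if b == 1 else 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_beta)

    h = (1.0 - z) / steps
    total = beta_pdf(z) + beta_pdf(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(z + i * h)
    return total * h / 3


def f_critical(alpha: float, d1: int, d2: int) -> float:
    """Value c with P(F > c) = alpha, found by bisection on f_upper_tail."""
    lo, hi = 0.0, 1000.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if f_upper_tail(mid, d1, d2) > alpha:
            lo = mid  # tail still too heavy: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


print(round(f_critical(0.05, 2, 8), 2))   # 4.46 (2 and 8 d.f., as in Example 3)
print(round(f_critical(0.05, 3, 57), 2))  # 3 and 57 d.f., as in Example 1
print(f_upper_tail(11.99, 3, 57))         # p value for a V.R. of 11.99 on 3 and 57 d.f.
```

The p value in step 10 is simply `f_upper_tail` evaluated at the computed V.R.; the decision rule in step 8 compares V.R. with `f_critical(alpha, k - 1, N - k)`.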
Example 1: The aim of a study by Makynen et al. (A-1) was to investigate whether increased dietary calcium, as a nonpharmacological treatment of elevated blood pressure, could beneficially influence endothelial function in experimental mineralocorticoid-NaCl hypertension. The researchers divided seven-week-old male Wistar-Kyoto (WKY) rats into four groups with equal mean systolic blood pressure: untreated rats on normal (WKY) and high-calcium (WKY-Ca) diets, and deoxycorticosterone-NaCl-treated rats on normal (DOC) and high-calcium (DOC-Ca) diets. We wish to know whether the four conditions have different effects on the mean weights of male rats.
                    Condition
Obs.    DOC      WKY      DOC-Ca   WKY-Ca
1       336      328      304      342
2       346      315      292      284
3       269      343      299      334
4       346      368      293      348
5       323      353      277      315
6       309      374      303      313
7       322      356      303      301
8       316      339      320      354
9       300      343      324      346
10      309      343      340      319
11      276      334      299      289
12      306      333      279      322
13      310      313      305      308
14      302      333      290      325
15      269      372      300
16      311               312
n       16       15       16       14        61
Total   4950     5147     4840     4500      19437
Mean    309.38   343.13   302.50   321.43    318.64
Assumptions: we assume that the four sets of data constitute independent simple random samples from four populations that are similar except for the condition studied. We assume that the four populations of measurements are normally distributed with equal variances.
Hypotheses: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ (on average, the four conditions elicit the same response); $H_A$: not all $\mu_j$ are equal.
Test statistic: the test statistic is V.R. = MSA/MSW.
Source           SS           d.f.   MS          V.R.
Among samples    14649.1514   3      4883.0503   11.99
Within samples   23210.9023   57     407.2088
Total            37860.0547   60
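The entries of this ANOVA table can be reproduced from the raw data of Example 1. The following standard-library Python sketch (the function name is illustrative) applies the definitional formulas; its sums of squares agree with the printed table to within about 0.01, the printed figures appearing to carry some rounding.

```python
def one_way_anova(groups: dict[str, list[float]]) -> dict[str, float]:
    """One-way (CRD) ANOVA computed from raw data using the definitional formulas."""
    all_x = [x for xs in groups.values() for x in xs]
    n_total, k = len(all_x), len(groups)
    grand_mean = sum(all_x) / n_total
    sst = sum((x - grand_mean) ** 2 for x in all_x)
    ssa = ssw = 0.0
    for xs in groups.values():
        m = sum(xs) / len(xs)
        ssa += len(xs) * (m - grand_mean) ** 2  # among-groups contribution
        ssw += sum((x - m) ** 2 for x in xs)    # within-groups contribution
    df_a, df_w = k - 1, n_total - k
    msa, msw = ssa / df_a, ssw / df_w
    return {"SST": sst, "SSA": ssa, "SSW": ssw, "df_among": df_a, "df_within": df_w,
            "MSA": msa, "MSW": msw, "VR": msa / msw}

# Rat weights from the data table of Example 1
weights = {
    "DOC": [336, 346, 269, 346, 323, 309, 322, 316, 300, 309, 276, 306, 310, 302, 269, 311],
    "WKY": [328, 315, 343, 368, 353, 374, 356, 339, 343, 343, 334, 333, 313, 333, 372],
    "DOC-Ca": [304, 292, 299, 293, 277, 303, 303, 320, 324, 340, 299, 279, 305, 290, 300, 312],
    "WKY-Ca": [342, 284, 334, 348, 315, 313, 301, 354, 346, 319, 289, 322, 308, 325],
}
res = one_way_anova(weights)
for name, value in res.items():
    print(f"{name:>10}: {value:.4f}")
```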
Distribution of test statistic: if H0 is true and the assumptions are met, V.R. follows the F distribution with 4 - 1 = 3 numerator degrees of freedom and 61 - 4 = 57 denominator degrees of freedom.
Decision rule: let α = 0.05. The critical value of F with 3 and 57 degrees of freedom is approximately 2.77 (the tabulated value for 3 and 60 degrees of freedom is 2.76). The decision rule, then, is to reject H0 if the computed V.R. is equal to or greater than 2.77.
Calculation of test statistic: SST = 37860.0547, SSA = 14649.1514, SSW = 37860.0547 - 14649.1514 = 23210.9023.
Statistical decision: since the computed V.R. of 11.99 is greater than the critical F of 2.77, we reject H0.
Conclusion: since we reject H0, we conclude that the four treatments do not all have the same average effect.
p value: since 11.99 > 4.77, the tabulated F value for α = 0.005, p < 0.005 for this test.
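Testing for Significant Differences Between Individual Pairs of Means: Tukey's HSD Test
When all sample sizes are equal to n, Tukey's honestly significant difference is
$HSD = q_{\alpha, k, N-k}\sqrt{\frac{MSE}{n}}$
where q is obtained from the table of the studentized range distribution (Table H) and MSE is the error (within-groups) mean square.
Tukey's test for unequal sample sizes (Spjøtvoll and Stoline):
$HSD^{*}_{ij} = q_{\alpha, k, N-k}\sqrt{\frac{MSE}{n^{*}_{ij}}}$
where $n^{*}_{ij}$ is the smaller of the two sample sizes being compared. The difference between the two corresponding sample means is declared significant if its absolute value exceeds HSD*.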
Example 2: Let us illustrate the use of the HSD test with the data from Example 1.
Solution: the first step is to prepare a table of all possible (ordered) differences between means, shown in the first table below. Suppose we let α = 0.05. Entering Table H with α = 0.05, k = 4, and N - k = 57, we find that q = 3.75; MSE = 407.2088. The hypotheses that can be tested, the value of HSD*, and the statistical decision for each test are shown in the second table.

Table: Differences between sample means (data of Example 1)
              DOC-Ca   DOC     WKY-Ca   WKY
DOC-Ca (DC)   -        6.87    18.93    40.63
DOC (D)                -       12.05    33.76
WKY-Ca (WC)                    -        21.70
WKY (W)                                 -

Table: Multiple comparison tests using the data of Example 1 and HSD*
Hypothesis        HSD*    Statistical decision
H0: µDC = µD      18.92   Do not reject H0, since 6.87 < 18.92
H0: µDC = µWC     20.22   Do not reject H0, since 18.93 < 20.22
H0: µDC = µW      19.54   Reject H0, since 40.63 > 19.54
H0: µD = µWC      20.22   Do not reject H0, since 12.05 < 20.22
H0: µD = µW       19.54   Reject H0, since 33.76 > 19.54
H0: µWC = µW      20.22   Reject H0, since 21.70 > 20.22
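The pairwise comparisons can be sketched in Python from the Example 1 summaries alone (treatment totals and sample sizes), with q = 3.75 and MSE = 407.2088 taken from the text. The helper name is illustrative; q still has to come from Table H, since the standard library has no studentized range distribution.

```python
from itertools import combinations
from math import sqrt

# Example 1 summaries: treatment totals and sample sizes, listed in order of increasing mean
totals = {"DOC-Ca": 4840, "DOC": 4950, "WKY-Ca": 4500, "WKY": 5147}
sizes = {"DOC-Ca": 16, "DOC": 16, "WKY-Ca": 14, "WKY": 15}
q = 3.75        # studentized range value from Table H (alpha = 0.05, k = 4, N - k = 57)
mse = 407.2088  # within-groups (error) mean square from the ANOVA table

means = {g: totals[g] / sizes[g] for g in totals}

def hsd_star(n_a: int, n_b: int) -> float:
    """Spjotvoll-Stoline HSD*: uses the smaller of the two sample sizes."""
    return q * sqrt(mse / min(n_a, n_b))

results = {}
for a, b in combinations(totals, 2):
    diff = abs(means[a] - means[b])
    crit = hsd_star(sizes[a], sizes[b])
    results[(a, b)] = diff > crit  # True means "reject H0: the two means are equal"
    print(f"{a} vs {b}: |diff| = {diff:.2f}, HSD* = {crit:.2f}, "
          f"{'reject' if diff > crit else 'do not reject'} H0")
```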
Randomized Block Design (RBD)
The RBD uses all three principles of experimental design: replication, randomization, and local control. Randomization is restricted to one direction only: treatments are randomized within each block.
Advantages: (i) it is the simplest method for testing treatment effects as well as block effects; (ii) the statistical analysis is also simple, because it is based on a two-way classification; (iii) it is more efficient than the CRD; (iv) trend effects are reduced; (v) it is suitable for field as well as laboratory experiments.
Disadvantages: if the number of treatments is large, the design is not suitable, because blocks large enough to hold every treatment become hard to keep homogeneous.
Table of Sample Values for the Randomized Complete Block Design
                      Treatments
Blocks    1        2        3        ...   k        Total    Mean
1         x_11     x_12     x_13     ...   x_1k     T_1.     x̄_1.
2         x_21     x_22     x_23     ...   x_2k     T_2.     x̄_2.
.         .        .        .              .        .        .
n         x_n1     x_n2     x_n3     ...   x_nk     T_n.     x̄_n.
Total     T_.1     T_.2     T_.3     ...   T_.k     T_..
Mean      x̄_.1     x̄_.2     x̄_.3     ...   x̄_.k              x̄_..
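Randomization in the RBD is carried out separately within each block, with every treatment appearing once per block. A minimal Python sketch, assuming (for illustration) three treatments and five blocks as in Example 3; the function name and seed are hypothetical.

```python
import random

def randomize_rcbd(treatments: list[str], n_blocks: int, seed: int = 7) -> list[list[str]]:
    """Return, for each block, a random order in which its units receive the treatments.

    Every treatment appears exactly once per block (a complete block), and
    randomization is carried out independently within each block.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = treatments[:]  # copy so the caller's list is not shuffled
        rng.shuffle(order)
        layout.append(order)
    return layout

# Hypothetical use: three teaching methods, five age-group blocks
for block, order in enumerate(randomize_rcbd(["A", "B", "C"], 5), start=1):
    print(f"Block {block}: units 1-3 receive {order}")
```

Restricting the shuffle to within blocks is what separates this from the CRD allocation: block-to-block differences can no longer be confounded with treatments.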
The model for the two-way ANOVA is
$x_{ij} = \mu + \beta_i + \tau_j + e_{ij}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, k$
In this model, $x_{ij}$ is a typical value from the overall population; $\mu$ is an unknown constant; $\beta_i$ represents a block effect, reflecting the fact that the experimental unit fell in the ith block; $\tau_j$ represents a treatment effect, reflecting the fact that the experimental unit received the jth treatment; and $e_{ij}$ is a residual component representing all sources of variation other than treatments and blocks.
Assumptions of the Model:
(a) Each $x_{ij}$ that is observed constitutes a random independent sample of size 1 from one of the kn populations represented.
(b) Each of these kn populations is normally distributed with mean $\mu_{ij}$ and the same variance $\sigma^2$; equivalently, the $e_{ij}$ are independently and normally distributed with mean 0 and variance $\sigma^2$.
(c) The block and treatment effects are additive.
Example 3: A physical therapist wished to compare three methods for teaching patients to use a certain prosthetic device. He felt that the rate of learning would be different for patients of different ages and wished to design an experiment in which the influence of age could be taken into account.
Solution:
Assumptions: we assume that each of the 15 observations constitutes a simple random sample of size 1 from one of the 15 populations defined by a block-treatment combination.
Hypotheses: $H_0: \tau_j = 0, \; j = 1, 2, 3$; $H_A$: not all $\tau_j = 0$. Let α = 0.05.
Test statistic: the test statistic is V.R. = MSTr/MSE.
Distribution of test statistic: when H0 is true and the assumptions are met, V.R. follows an F distribution with 2 and 8 degrees of freedom.
Decision rule: reject the null hypothesis if the computed V.R. is equal to or greater than the critical F, which we find in Table G to be 4.46.
Table: Time (in days) required to learn the use of a certain prosthetic device
                 Teaching method
Age group        A      B      C      Total   Mean
Under 20         7      9      10     26      8.67
20 to 29         8      9      10     27      9.00
30 to 39         9      9      12     30      10.00
40 to 49         10     9      12     31      10.33
50 and above     11     12     14     37      12.33
Total            45     48     58     151
Mean             9.0    9.6    11.6           10.07
Calculation of test statistic (computed with unrounded means; the grand mean is 151/15 = 10.0667):
SST = (7 - 10.07)² + (8 - 10.07)² + ... + (14 - 10.07)² = 46.9333
SSBl = 3[(8.67 - 10.07)² + (9.00 - 10.07)² + ... + (12.33 - 10.07)²] = 24.9333
SSTr = 5[(9.0 - 10.07)² + (9.6 - 10.07)² + (11.6 - 10.07)²] = 18.5333
SSE = 46.9333 - 24.9333 - 18.5333 = 3.4667
The degrees of freedom are: total = (3)(5) - 1 = 14; blocks = 5 - 1 = 4; treatments = 3 - 1 = 2; residual = (5 - 1)(3 - 1) = 8. The results of the calculations may be displayed in an ANOVA table:
Source       SS        d.f.   MS       V.R.
Treatments   18.5333   2      9.2667   21.38
Blocks       24.9333   4      6.2333
Residual     3.4667    8      0.4333
Total        46.9333   14
Statistical decision: since the computed variance ratio, 21.38, is greater than 4.46, we reject the null hypothesis of no treatment effects, on the assumption that such a large V.R. reflects the fact that the two sample mean squares are not estimating the same quantity. The only other explanation for so large a V.R. would be that the null hypothesis is really true and we have just observed an unusual set of results.
Conclusion: we conclude that not all treatment effects are equal to zero or, equivalently, that not all treatment means are equal.
p value: for this test, p < 0.005.
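The randomized complete block calculation can be checked with a short Python sketch working from unrounded means. It follows the definitions above (blocks as rows, treatments as columns, one observation per cell) and obtains the residual sum of squares by subtraction, as in the text; the function name is illustrative.

```python
# Example 3 data: rows are age-group blocks, columns are teaching methods A, B, C
times = [
    [7, 9, 10],    # under 20
    [8, 9, 10],    # 20 to 29
    [9, 9, 12],    # 30 to 39
    [10, 9, 12],   # 40 to 49
    [11, 12, 14],  # 50 and above
]

def rcbd_anova(x: list[list[float]]) -> dict[str, float]:
    """Randomized complete block ANOVA with one observation per block-treatment cell."""
    n, k = len(x), len(x[0])  # n blocks, k treatments
    grand = sum(map(sum, x)) / (n * k)
    block_means = [sum(row) / k for row in x]
    trt_means = [sum(row[j] for row in x) / n for j in range(k)]
    sst = sum((v - grand) ** 2 for row in x for v in row)
    ssbl = k * sum((m - grand) ** 2 for m in block_means)
    sstr = n * sum((m - grand) ** 2 for m in trt_means)
    sse = sst - ssbl - sstr  # residual by subtraction
    mstr, mse = sstr / (k - 1), sse / ((n - 1) * (k - 1))
    return {"SST": sst, "SSBl": ssbl, "SSTr": sstr, "SSE": sse,
            "MSTr": mstr, "MSE": mse, "VR": mstr / mse}

result = rcbd_anova(times)
print({name: round(value, 4) for name, value in result.items()})
```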
THANK YOU