Power analysis 06


Power analysis and hypothesis testing with multiple samples: 

ESM 206
6 April 2006

Types of error: 

Type I: reject the null hypothesis when it is really true
  Desired level: α
Type II: fail to reject the null hypothesis when it is really false
  Desired level: β
  Is associated with a given effect size
  E.g., want a probability of 0.1 of failing to reject when the true difference between means is 0.35.

Setting error levels: 

α is controlled by setting the critical P-value for rejecting the null hypothesis
β is decreased by:
  Increasing α
  Increasing sample size (n)
  Decreasing sample variance, var(x)
  Increasing effect size, Δ
Tradeoff between α and β
  Need to balance costs associated with type I and type II errors
Power is 1 - β

POWER ANALYSIS
  Take a few samples to get an estimate of var(x)
  Assume that the population has mean μ0 + Δ and variance var(x)
  If I take n samples, what is the probability of failing to reject the null hypothesis (getting P > α)?
    Either through simulation or theory
  Adjust n to get the desired error level
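The power-analysis recipe above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the course: it assumes a one-sided test of H0: μ ≤ μ0 with the standard deviation treated as known (a normal approximation to the t-test), takes μ0 as 0, and the function names `power_theory`, `power_simulated`, and `smallest_n` are made up for this example. The standard deviation of 1.0 in the last call is likewise a hypothetical value.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_theory(n, delta, sd, alpha=0.05):
    """Approximate power (1 - beta) of a one-sided test when the true mean is mu0 + delta."""
    z_crit = Z.inv_cdf(1 - alpha)
    return Z.cdf(delta * math.sqrt(n) / sd - z_crit)


def power_simulated(n, delta, sd, alpha=0.05, reps=5000, seed=1):
    """Estimate power by drawing many samples of size n and counting rejections."""
    rng = random.Random(seed)
    z_crit = Z.inv_cdf(1 - alpha)
    se = sd / math.sqrt(n)  # sd is treated as known (simplifying assumption)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(delta, sd) for _ in range(n)]  # mu0 taken as 0
        if (sum(sample) / n) / se > z_crit:
            rejections += 1
    return rejections / reps


def smallest_n(delta, sd, alpha=0.05, beta=0.2):
    """Adjust n upward until the type II error rate drops to beta or below."""
    n = 2
    while 1 - power_theory(n, delta, sd, alpha) > beta:
        n += 1
    return n


print(power_theory(20, 0.5, 1.0), power_simulated(20, 0.5, 1.0))
print(smallest_n(delta=0.35, sd=1.0, alpha=0.05, beta=0.1))
```

The simulated and theoretical values should agree to within simulation noise. A t-based calculation, which estimates the variance from the data, would call for a somewhat larger n.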

Power and the water temperature test: 

= 3
Testing H0: μ ≤ 56

Effect of sample size on power: 


Statistics & Decision Making: EPA’s Data Quality Objectives (DQOs): 

What are DQOs?

DQOs are qualitative and quantitative statements, developed using the DQO Process, that clarify study objectives, define the appropriate type of data, and specify tolerable levels of potential decision errors that will be used as the basis for establishing the quality and quantity of data needed to support decisions.

DQOs define the performance criteria that limit the probabilities of making decision errors by:
  considering the purpose of collecting the data;
  defining the appropriate type of data needed; and
  specifying tolerable probabilities of making decision errors.

See link on class website

The DQO process: 

1. State the Problem
   Define the problem; identify the planning team; examine budget, schedule.
2. Identify the Decision
   State decision; identify study question; define alternative actions.
3. Identify the Inputs to the Decision
   Identify information needed for the decision (information sources, basis for Action Level, sampling/analysis method).
4. Define the Boundaries of the Study
   Specify sample characteristics; define spatial/temporal limits, units of decision making.
5. Develop a Decision Rule
   Define statistical parameter (mean, median); specify Action Level; develop logic for action.
6. Specify Tolerable Limits on Decision Errors
   Set acceptable limits for decision errors relative to consequences (health effects, costs).
7. Optimize the Design for Obtaining Data
   Select resource-effective sampling and analysis plan that meets the performance criteria.

Preliminary assessment of household lead dust: 

State the Problem

  Describing the problem. The owners wish to evaluate the potential hazards associated with lead in dust in a single-family residence because other residences in the Athington Park House neighborhood had shown levels of lead in dust that might pose potential hazards.

  Establishing the planning team. The planning team included the property owners, a certified risk assessor (to collect and handle dust samples and serve as a liaison with the laboratory), and a quality assurance specialist. The decision makers were the property owners.

  Describing the conceptual model of the potential hazard. The conceptual model described a single-family residence in a neighborhood where hazardous levels of lead had been detected in other residences. Interior sources of lead in dust were identified as lead-based paint on doors, walls, and trim, which deteriorated to form, or attach to, dust particles. Exterior sources included lead in exterior painted surfaces that had deteriorated and leached into the dripline soil, or lead deposited from gasoline combustion fumes that accumulated in soil. In these cases, soil could be tracked into the house and collected as dust on floors, window sills, toys, etc. As this dust could be easily ingested through hand-to-mouth activities, dust was considered to be a significant exposure route. Levels of lead in floor dust were to be used as an indicator of the potential hazard.

  Identifying the general intended use of collected data. The data collected in this study will be used to determine if a health hazard is present at Athington Park House using the criteria established under 40 CFR 745. This is a decision making (test of hypothesis) DQO Process.

  Identifying available resources, constraints, and deadlines. The property owners were willing to commit up to $1,000 for the study. To minimize inconvenience to the family, all sampling would be conducted during one calendar day.

Identify the Decision

  Specifying the primary study question. The primary question to be addressed was whether there were significant levels of lead in floor dust at the House.

  Determining the range of possible outcomes from this study. If there were significant levels of lead in floor dust at the residence, the team planned follow-up testing to determine whether immediately dangerous contamination existed and where in the property the contamination was located. If not, then there was no potential lead hazard, and testing would be discontinued.

Preliminary assessment of household lead dust: 

Identify the Inputs to the Decision

  Identifying the types of information that are needed to resolve the decision statement. The assessment of a dust lead hazard would be evaluated by measuring dust lead loadings by individual dust wipe sampling according to established protocol.

  Identifying the source of information. The EPA proposed standard stated that if dust lead levels were above 50 μg/ft2 on bare floors, a lead health hazard was possible and follow-up testing and/or intervention should be undertaken (40 CFR 745).

  Identifying how the Action Level will be determined. The Action Level is the EPA standard specified in 40 CFR 745.

  Identifying appropriate sampling and analysis methods. Wipe samples were collected according to ASTM standard practice E1728. These samples were digested in accordance with ASTM standard practice E1644 and the sample extracts were chemically analyzed by ASTM standard test method E1613. The results of these analyses provided information on lead loading (i.e., μg of lead per square foot of wipe area) for each dust sample. The detection limit was well below the Action Level.

Define the Boundaries of the Study

  Specifying the spatial and temporal boundaries for collecting data. The spatial boundaries of the study area were defined as all floor areas within the dwelling that were reasonably accessible to young children who lived at, or visited, the property. Dust contained in each 1 ft2 area of each floor of the residence was sampled and sent to a laboratory for analysis.

  Specifying other practical constraints for collecting data. Permission from the residents of Athington Park House was required before risk assessors could enter the residence to collect dust wipe samples. Sampling was completed within 1 calendar day to minimize the inconvenience to the residents.

  Specifying the scale of estimates to be made. The test results were considered to appropriately characterize the current and future hazards. It was possible that lead contained in soil could be tracked into the residence and collect on surfaces, but no significant airborne sources of lead deposition were known in the region. The dust was not expected to be transported away from the property; therefore, provided the exterior paint was maintained in intact condition, lead concentrations measured in the dust were not expected to change significantly over time.

  Specifying the scale of inference for decision making. The decision unit was the interior floor surface (approximately 1,700 ft2) of the residence at the time of sampling and in the near future.

Preliminary assessment of household lead dust: 

Develop a Decision Rule

  Specifying the Action Level. This was given in 40 CFR 745, which specified 50 μg/ft2.

  Developing the population of interest and the theoretical decision rule. From 40 CFR 745, the median was selected as the appropriate parameter to characterize the population under study. The median dust lead loading was defined to be that level, measured in μg/ft2, above and below which 50% of all possible dust lead loadings at the property were expected to fall. If the true median dust loading in the residence was greater than 50 μg/ft2, then the planning team required follow-up testing. Otherwise, they decided that a dust lead hazard was not present and discontinued testing.

Preliminary assessment of household lead dust: 

Specify Tolerable Limits on Decision Errors

  Setting the baseline condition. The baseline condition adopted by the property owners was that the true median dust lead loading was above the EPA hazard level of 50 μg/ft2, due to the seriousness of the potential hazard. The planning team decided that the most serious decision error would be to decide that the true median dust lead loading was below the EPA hazard level of 50 μg/ft2, when in truth the median dust lead loading was above the hazard level. This incorrect decision would result in significant exposure to dust lead and adverse health effects.

  Determining the impact of decision errors and setting tolerable decision error limits. The edge of the gray region was designated by considering that a false acceptance decision error would result in the unnecessary expenditure of scarce resources for follow-up testing and/or intervention associated with a presumed hazard that did not exist. The planning team decided that this decision error should be adequately controlled for true dust lead loadings of 40 μg/ft2 and below. Since human exposure to lead dust hazards causes serious health effects, the planning team decided to limit the false rejection error rate to 5%. This meant that if this dwelling's true median dust lead loading was greater than 50 μg/ft2, the baseline condition would be correctly accepted (not rejected) 19 out of 20 times. The false acceptance decision, which would result in unnecessary use of testing and intervention resources, was allowed to occur more frequently (i.e., 20% of the time when the true dust lead loading is 40 μg/ft2 or less).

EXAMPLE: LEAD DUST: 

Preliminary sampling suggests that the standard deviation of lead dust observations is 30 μg/ft2
We want to know how many observations we need to take so that β is 0.2 if the true mean dust concentration is 10 μg/ft2 below the contamination threshold and we are using an α of 0.05
For a t-test, α for a 1-sided test is equivalent to 2α for a 2-sided test
Power is 1 - β
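As a rough check on this example, the required sample size can be approximated with the standard normal-theory formula n = ((z_(1-α) + z_(1-β)) · sd / Δ)^2 for a one-sided test. This is a sketch under that approximation only; the helper name `sample_size_one_sided` is invented here, and an exact t-based calculation (as statistical software would report) gives a slightly larger n.

```python
import math
from statistics import NormalDist


def sample_size_one_sided(delta, sd, alpha, beta):
    """Normal-approximation sample size for a one-sided, one-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # critical value for the type I error rate
    z_beta = z.inv_cdf(1 - beta)    # quantile for the desired power, 1 - beta
    return math.ceil(((z_alpha + z_beta) * sd / delta) ** 2)


# Values from the slide: sd = 30, effect size = 10 (both in ug/ft2), alpha = 0.05, beta = 0.2
n = sample_size_one_sided(delta=10, sd=30, alpha=0.05, beta=0.2)
print(n)  # 56 under the normal approximation
```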

Interlude: the lognormal distribution: 


LEAD DUST DONE RIGHT: 

Effect size is log(50) - log(40) = 0.22
Std. dev. of log(lead) est. as 1.5
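To see what these log-scale numbers imply, the sketch below tabulates approximate power against sample size and searches for the smallest n with power of at least 0.8 at α = 0.05. It assumes natural logarithms and a one-sided normal approximation to the t-test; the function name `power_log_scale` and the trial values of n are illustrative choices, not part of the original slides.

```python
import math
from statistics import NormalDist

Z = NormalDist()
EFFECT = math.log(50) - math.log(40)  # effect size on the natural-log scale, about 0.22
SD_LOG = 1.5                          # estimated std. dev. of log(lead), from the slide


def power_log_scale(n, alpha=0.05):
    """Approximate one-sided power for n observations analysed on the log scale."""
    return Z.cdf(EFFECT * math.sqrt(n) / SD_LOG - Z.inv_cdf(1 - alpha))


# Power for a few hypothetical sample sizes
for n in (50, 150, 300):
    print(n, round(power_log_scale(n), 3))

# Smallest n giving power >= 0.8 (that is, beta <= 0.2)
n_needed = 2
while power_log_scale(n_needed) < 0.8:
    n_needed += 1
print(n_needed)
```

With a small effect size relative to the log-scale spread, the required n runs into the hundreds, which is the practical point of doing the calculation on the transformed scale.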

Cleanup of a contaminated site: 

THE PROBLEM
A site has suffered the release of a toxic chemical (TcCB) into the soil, and the company responsible has undertaken cleanup activities.
How should we decide whether the cleanup has been adequate?

THE DATA
We have samples of TcCB concentration (measured in ppb) in the soils at the cleanup site, as well as samples of concentrations at an uncontaminated "reference" site with similar soil characteristics.
The concentrations of TcCB at the reference site are not zero, and we need to determine what the normal levels of this chemical are.

EPA standards for assessing site contamination*: 

If a site has not been declared to be contaminated, then the null hypothesis should be that it is clean, i.e., there is no difference from the control site. The alternative hypothesis is that the site is contaminated. A non-significant test result leads to the conclusion that there is no real evidence that the site is contaminated.

If a site has been declared to be contaminated, then the null hypothesis should be that this is true, i.e., there is a difference (in an unacceptable direction) from the control site. The alternative hypothesis is that the site is clean. A non-significant test result leads to the conclusion that there is no real evidence that the site has been cleaned up.

* USEPA (1989) Methods for Evaluating the Attainment of Cleanup Standards. Vol. 1: Soils and Solid Media. EPA Report 230/02-89-042, Office of Policy, Planning and Evaluation, Washington, DC.

COMPARING TWO GROUPS: 

Two-sample t-test
  Tests for differences between means of two groups
  Null hypothesis: no difference between the group means (or, for a one-sided test, a difference in a specified direction)
  Under the null hypothesis, the difference in means, standardized by the standard deviations of both groups, should follow a t distribution
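The standardized difference in means can be computed directly. The sketch below uses the unequal-variance (Welch) form of the two-sample t statistic with the Welch-Satterthwaite degrees of freedom; the function name `welch_t` and the two small data sets are hypothetical and are not the TcCB measurements. Converting t and df to a P-value needs a t-distribution routine, which is omitted here.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Two-sample t statistic and degrees of freedom, allowing unequal variances."""
    nx, ny = len(x), len(y)
    vx = variance(x) / nx  # squared standard error of mean(x)
    vy = variance(y) / ny  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical concentrations (ppb) at a cleanup site and a reference site
cleanup = [0.9, 1.4, 2.8, 0.6, 5.1, 1.1, 0.8, 3.3]
reference = [0.5, 0.7, 0.4, 0.9, 0.6, 1.0, 0.5]
t_stat, df = welch_t(cleanup, reference)
print(round(t_stat, 2), round(df, 2))
```

The fractional degrees of freedom are a signature of the unequal-variance form of the test.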

TcCB cleanup: conclusion: 

Using the null hypothesis that the cleanup site is contaminated with respect to the control site, we fail to reject the hypothesis that the cleanup site is still contaminated (one-sided two-sample t-test with unequal variances, t = 1.45, df = 76.05, P = 0.925).

Comparing fuel efficiency of two gasoline blends: 

THE PROBLEM
The owner of a taxi company is evaluating two gasoline blends, and wants to use the one that produces greater fuel efficiency.
How should she decide which (if either) produces greater efficiency?

THE DATA
On one day, all the taxis in the fleet were fueled with gas A, and at the end of the day the efficiency of each car (in mpg) was calculated.
On the next day, all the taxis in the fleet were fueled with gas B, and at the end of the day the efficiency of each car was calculated.

Two-sample t-test of gas data: 

Want to control for variability among drivers

COMPARING TWO GROUPS: 

Paired t-test
  Each observation is a pair of measurements
    Water quality upstream and downstream of a road crossing
    Fuel mileage by a taxi driver using two brands of gasoline
  Natural variability between sampling units might swamp differences between the means
    Streams have different background water quality
    Drivers have different driving styles
  Instead, test for mean of differences
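Testing the mean of the differences amounts to a one-sample t-test on the per-unit differences. A minimal sketch follows, with a made-up helper `paired_t` and hypothetical mileage figures for ten taxis (not the data behind the slides); the P-value lookup is left out because it needs a t-distribution routine.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """t statistic and degrees of freedom for the mean of the paired differences b - a."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [bi - ai for ai, bi in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical mpg for the same ten taxis on gas A and on gas B
gas_a = [21.0, 25.2, 19.8, 23.5, 27.1, 22.4, 20.3, 24.8, 26.0, 18.9]
gas_b = [21.9, 25.5, 20.9, 23.4, 28.0, 23.1, 20.6, 25.9, 26.3, 19.8]
t_stat, df = paired_t(gas_a, gas_b)
print(round(t_stat, 2), df)
```

Because each taxi serves as its own control, the large car-to-car spread in mpg drops out of the differences, which is why the paired test can detect a small average gain.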

Slide23: 

CONCLUSION We find strong evidence that mileage differs between gas A and gas B (paired t-test, t = 3.12, df = 9, P = 0.012). On average, the fuel efficiency with gas B is 0.6 mpg greater than with gas A.

COMPARING MEANS OF 3 OR MORE GROUPS: 

ANOVA (ANalysis Of VAriance)
  Like 2-sample t-test, but with multiple groups
  H0: All groups have the same mean
  HA: Not all groups have the same mean
  Rejecting H0 doesn't tell you which groups differ
    Can do a bunch of t-tests for this
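The F statistic behind a one-way ANOVA compares the variation between group means with the variation within groups. The following sketch computes it from scratch; the function name `one_way_anova` and the three small groups are hypothetical, and it assumes equal variances across groups (the classical form, not the Welch variant).

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a classical one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical highway mileage (mpg) for three car types
groups = [[28, 31, 30, 27], [24, 26, 23, 25], [33, 35, 32, 36]]
print(one_way_anova(groups))
```

A large F means the group means are spread out more than the within-group scatter would explain, which is the evidence against H0.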

Slide25: 

CONCLUSION: Very strong evidence that highway mileage differs among car types (one-way ANOVA, F = 23.67, df = 5,86, P < 0.0001)

HYPOTHESIS TESTING: OVERVIEW: 


ASSUMPTIONS OF T-TEST AND ANOVA: 

T-test:
  Distribution within each group is normal
ANOVA:
  Distribution within each group is normal
  Variances of all groups are the same
Both tests are robust to moderate violations of these assumptions
  Regard P value as an approximate value
TcCB data: Assumption of normality is badly violated
  Solution: do tests on transformed data
Car mileage data: Assumption of equal variances is badly violated
  Solution: perform Welch ANOVA