Ch 4 - Variance

Presentation Transcript

Chapter 4 : 

Chapter 4 Variability

Variability : 

Variability Provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together. In statistics, our goal is to measure the amount of variability for a particular set of scores or distribution. Look at 4.1, p. 106.

In general, a good measure of variability serves two purposes: : 

In general, a good measure of variability serves two purposes: 1) It describes the distribution, specifically telling whether the scores are clustered together or spread out over a large distance. Variability is usually defined in terms of distance, telling how much distance to expect between one score and another or how much difference between an individual score and the mean.

In general, a good measure of variability serves two purposes: : 

In general, a good measure of variability serves two purposes: 2) It measures how well an individual score or group of scores represents the entire distribution. This is very important for inferential statistics where relatively small samples are meant to stand in for much larger populations.

Tonight: : 

Tonight: In this chapter we will consider three different measures of variability: the range, the interquartile range and the standard deviation. The standard deviation is by far the most important

The Range : 

The Range The range is the distance from the largest score to the smallest score in a distribution. Typically, the range is defined as the difference between the upper real limit of the largest X value and the lower real limit of the smallest X value.
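A minimal sketch of the range computed with real limits, assuming whole-number scores so that each real limit lies half a unit beyond the score. The function name `range_with_real_limits`, the `unit` parameter, and the scores are illustrative assumptions, not taken from the lecture.

```python
def range_with_real_limits(scores, unit=1.0):
    """Upper real limit of the largest score minus lower real limit of the smallest."""
    upper_real_limit = max(scores) + unit / 2
    lower_real_limit = min(scores) - unit / 2
    return upper_real_limit - lower_real_limit

scores = [3, 7, 12, 8, 5, 10]                       # hypothetical whole-number scores
simple_range = max(scores) - min(scores)            # 12 - 3 = 9
real_limit_range = range_with_real_limits(scores)   # 12.5 - 2.5 = 10.0
```

With a measurement unit of 1, the real-limit version is always one unit larger than the simple difference between the largest and smallest score.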

The Range : 

The Range Because it is defined in terms of distance, it is typically used with interval or ratio scale measurements of a continuous variable. You can, however, generalize the range to other measures provided they are also measured on interval or ratio scales.

The Range : 

The Range The problem with the range is that it is completely determined by the two extreme scores. Because it does not consider all the scores, it often does not give an accurate description of variability for the entire distribution. For this reason it is considered a crude and unreliable measure of variability. (Think of our income example from last week.)

The Interquartile Range : 

The Interquartile Range Measures the range covered by the middle 50% of the distribution, which limits the influence of outliers. This gave me fits, so we will have time to work on this after class if anyone needs it.

How to Find the Interquartile Range : 

How to Find the Interquartile Range First, locate the boundary that separates the lowest 25% of the distribution from the rest. This is called the first quartile and is identified as Q1.

How to Find the Interquartile Range cont. : 

How to Find the Interquartile Range cont. Next, locate the boundary that separates the top 25% from the rest. This is called the third quartile and is labeled Q3 (because it separates the bottom three quarters from the top quarter). So the interquartile range is the range covered by the middle 50% of the distribution: interquartile range = Q3 - Q1.

How to Find the Interquartile Range cont. : 

How to Find the Interquartile Range cont. This gave me a headache, so let’s look at an example…

How to Find the Interquartile Range : 

How to Find the Interquartile Range
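The slide example itself is not preserved in this transcript, so here is a small stand-in sketch of the Q1, Q3, and Q3 - Q1 steps. The scores are hypothetical, and the quartile convention (Python's `statistics.quantiles` with `method="inclusive"`) is an assumption. Other conventions, possibly including the textbook's, can place the quartile boundaries slightly differently.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical scores, not the lecture's example

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1        # range covered by the middle 50% of the scores
semi_iqr = iqr / 2   # semi-interquartile range: half of the interquartile range
```

For these scores Q1 is 3.0 and Q3 is 7.0, so the interquartile range is 4.0 and the semi-interquartile range is 2.0.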

The Standard Deviation : 

The Standard Deviation The standard deviation is the most important and most commonly used measure of variability. It uses the mean of the distribution as a reference point and measures variability by considering the distance between each score and the mean, which shows whether the scores are generally near or far from the mean.

The Standard Deviation cont. : 

The Standard Deviation cont. Remember that our goal is to measure the standard, or typical, distance from the mean – keeping this in mind should make the following work easier

Finding the Standard Deviation : 

Finding the Standard Deviation 1 – Determine the deviation, or distance from the mean, for each individual score. By definition, this is the difference between the score and the mean: deviation score = X - μ. Each deviation needs a + or - sign to tell which direction from the mean the score is located, while the number gives the actual distance from the mean.

Finding the Standard Deviation : 

Finding the Standard Deviation 2 – Now we need to compute the mean of the deviation scores. Add up the deviation scores and divide by N. The positive and negative values will always cancel each other out and give you an answer of zero (the book says always), because the mean is the balance point of the distribution. That makes the mean deviation no help in measuring variability. Always perform this step anyway as a way to check your numbers: if the sum is not zero, you have made an error.

Finding the Standard Deviation : 

Finding the Standard Deviation 3 – Therefore, you need to get rid of the signs. Square each deviation score. Using the squared values, you can compute the mean squared deviation, which is called the variance. Population variance equals the mean squared deviation, so variance is the average squared distance from the mean. This process does not just get rid of the signs; it results in a measure of variability based on squared distances. That is not very helpful as a descriptive statistic, because a squared distance is hard to interpret intuitively.

Finding the Standard Deviation : 

Finding the Standard Deviation 4 – This is where we correct for the squaring we used to get rid of the signs. The standard deviation is the square root of the variance. There is a good summary right below the equation for standard deviation on page 111.
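The four steps can be sketched in a few lines of Python. The scores are a hypothetical population chosen so the arithmetic stays simple. The variable names are illustrative, and `statistics.pstdev` is used only as a cross-check from the standard library.

```python
import math
import statistics

scores = [1, 9, 5, 8, 7]                  # hypothetical population of N = 5 scores
N = len(scores)
mu = sum(scores) / N                      # population mean = 6.0

# Step 1: deviation score for each X (sign gives direction, size gives distance)
deviations = [x - mu for x in scores]     # [-5.0, 3.0, -1.0, 2.0, 1.0]

# Step 2: the deviations sum to zero, so this is only a check on the arithmetic
deviation_sum = sum(deviations)

# Step 3: square the deviations, add them (SS), then average them (variance)
SS = sum(d ** 2 for d in deviations)      # 25 + 9 + 1 + 4 + 1 = 40.0
variance = SS / N                         # 8.0

# Step 4: take the square root to return to the original units
sigma = math.sqrt(variance)               # about 2.83

library_sigma = statistics.pstdev(scores)  # standard-library cross-check
```

With decimal data the step 2 sum may differ from zero by a tiny rounding error, so compare it to zero with a tolerance rather than exactly.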

Slide 21: 

Because the standard deviation and variance are defined in terms of distance from the mean, these measures of variability are used only with numerical scores that are obtained from measurements on an interval or ratio scale

Slide 22: 

Remember that these two scales are the only ones that provide information about distance; nominal and ordinal scales do not. Also remember that it is inappropriate to compute a mean for ordinal data and impossible to compute a mean for nominal data. Specifically, the mean, the standard deviation, and the variance should be used only with numerical scores from interval or ratio scales of measurement.

Slide 23: 

Figure 4.3 is good, use it

#7, p. 134 : 

#7, p. 134

#7, p. 134 cont. : 

#7, p. 134 cont.

Formulas : 

Formulas

Graphic Representation : 

Graphic Representation See figure 4.5

Standard deviation and variance for samples : 

Standard deviation and variance for samples The goal of inferential statistics is to use the limited information from samples to draw general conclusions about populations, with the basic assumption being that samples should be representative of the populations from which they are drawn. This is problematic because samples tend to be less variable than their populations.

Standard deviation and variance for samples : 

Standard deviation and variance for samples Extreme values are unlikely to be obtained when collecting a sample, so sample variability gives a biased estimate of population variability: it tends to underestimate the population value rather than being right on the mark. However, this bias is at least consistent and predictable, which means it can be corrected. (Think of the speedometer in your car.)

Standard Deviation for a sample : 

Standard Deviation for a sample The first three steps are the same, except for the notation: μ becomes M and N becomes n.

Standard Deviation for a sample : 

Standard Deviation for a sample 1) Find the deviation for each score: X - M. 2) Square each deviation: (X - M)². 3) Add the squared deviations: SS = Σ(X - M)².

Standard Deviation for a sample cont. : 

Standard Deviation for a sample cont. This is where it is very important to differentiate between populations and samples

Standard Deviation for a sample cont. : 

Standard Deviation for a sample cont.

Standard Deviation for a sample cont. : 

Standard Deviation for a sample cont. Basically, all that was to tell you that dividing by n-1 instead of n corrects the bias in sample variability. It is important though, so keep it in mind!
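A short sketch of the sample version, assuming the standard formulas: sample variance = SS / (n-1), and sample standard deviation = the square root of that. The scores are hypothetical, and `statistics.variance` (which also divides by n-1) is included only as a cross-check.

```python
import math
import statistics

sample = [1, 6, 4, 3, 8, 7, 6]             # hypothetical sample of n = 7 scores
n = len(sample)
M = sum(sample) / n                        # sample mean = 5.0

SS = sum((x - M) ** 2 for x in sample)     # 16 + 1 + 1 + 4 + 9 + 4 + 1 = 36.0

sample_variance = SS / (n - 1)             # 36 / 6 = 6.0
sample_sd = math.sqrt(sample_variance)     # about 2.45

uncorrected_variance = SS / n              # dividing by n gives a smaller value
library_variance = statistics.variance(sample)   # standard-library cross-check
```

The only change from the population calculation is the divisor. Because n-1 is smaller than n, the corrected variance is slightly larger, which offsets the tendency of samples to be less variable than their populations.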

Degrees of Freedom : 

Degrees of Freedom Although the concept of a deviation score and the calculation of SS are almost exactly the same for samples and populations, the minor differences in notation are key. With a population, you find the deviation for each score by measuring its distance from the population mean. With a sample, the value of the population mean is unknown, so you must measure distances from the sample mean. This requires that you know M before you can begin to compute deviations, and knowing M places a restriction on the variability of the scores.

Degrees of Freedom : 

Degrees of Freedom Degrees of freedom: For a sample of n scores, the degrees of freedom or df for the sample variance are defined as df=n-1. The degrees of freedom determine the number of scores in the sample that are independent and free to vary.

Degrees of Freedom cont. : 

Degrees of Freedom cont. The n-1 df for a sample is the same n-1 that is used in the formulas for sample variance and standard deviation. Remember that variance is defined as the mean squared deviation. For now, remember that knowing the sample mean places a restriction on sample variability: only n-1 of the scores are free to vary, so df=n-1.
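One way to see the restriction: once the sample mean is fixed, the first n-1 scores can be anything, but the last score is forced. The helper name `last_score` and the numbers below are hypothetical.

```python
def last_score(free_scores, M):
    """Given n-1 freely chosen scores and the sample mean M, return the
    one remaining score that the mean forces."""
    n = len(free_scores) + 1
    return n * M - sum(free_scores)

free_scores = [2, 9]      # hypothetical: the first n-1 = 2 scores are free to vary
M = 5                     # the sample mean is already known
forced = last_score(free_scores, M)   # 3 * 5 - 11 = 4
```

Here n = 3, so the three scores must sum to 15. With 2 and 9 already chosen, the third score has to be 4, which leaves only n-1 = 2 scores free to vary.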

More about variance and standard deviation : 

More about variance and standard deviation A sample statistic is biased if the average value of the statistic, for any sample size, either underestimates or overestimates the corresponding population parameter. If the average value of the statistic is equal to the population parameter, the statistic is said to be unbiased. This does not mean every individual sample will come out perfectly; an unbiased statistic is right on average. 4.7 is helpful for explaining the importance of n-1.
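The idea of bias can be checked by brute force on a tiny population. The sketch below uses a hypothetical three-score population (not the textbook's example) and lists every possible sample of n = 2 drawn with replacement. It then averages the sample variances computed with each divisor. The helper name `variance_with_divisor` is an assumption for illustration.

```python
from itertools import product

population = [0, 3, 9]                     # hypothetical population
N = len(population)
mu = sum(population) / N                   # 4.0
pop_variance = sum((x - mu) ** 2 for x in population) / N   # 42 / 3 = 14.0

def variance_with_divisor(sample, divisor):
    """SS around the sample mean, divided by the chosen divisor."""
    M = sum(sample) / len(sample)
    SS = sum((x - M) ** 2 for x in sample)
    return SS / divisor

n = 2
samples = list(product(population, repeat=n))   # all 9 samples, with replacement

# Average of SS / n across all samples: underestimates the population variance
avg_biased = sum(variance_with_divisor(s, n) for s in samples) / len(samples)

# Average of SS / (n - 1) across all samples: equals the population variance
avg_unbiased = sum(variance_with_divisor(s, n - 1) for s in samples) / len(samples)
```

Dividing by n averages 7.0, half the population variance of 14.0, while dividing by n-1 averages exactly 14.0. Individual samples still miss (their n-1 variances run from 0 to 40.5), which is why unbiased does not mean every sample comes out perfectly.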

Standard deviation and descriptive statistics : 

Standard deviation and descriptive statistics The standard deviation is primarily a descriptive measure: it describes how variable, or spread out, the scores in a distribution are. It does this by measuring distance from the mean, which is very useful for seeing how much a distribution varies.

Standard deviation and descriptive statistics cont. : 

Standard deviation and descriptive statistics cont. Consider something like attitudes toward an issue or beliefs about something. The standard deviation shows how far the typical or average person is from the mean, from being perfectly balanced. It also gives us a way to tell how extreme a person's score is: how far from the average they are in comparison to everyone else.

Standard deviation and descriptive statistics cont. : 

Standard deviation and descriptive statistics cont. The mean and the standard deviation are the most common values used to describe a set of data. Just remember that these are not simply abstract concepts or mathematical equations but hold concrete and meaningful significance.

Transformations of scale : 

Transformations of scale Sometimes a set of scores is changed by adding a constant to each score or by multiplying each score by a constant value. This would happen, for instance, when you want to change the unit of measurement.

Transformations of Scale cont. : 

Transformations of Scale cont. What does this do to the standard deviation? Adding a constant to each score does not change the standard deviation, because every score moves by the same amount and the distances between scores stay the same. Multiplying each score by a constant causes the standard deviation to be multiplied by the same constant.
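Both rules are easy to confirm numerically. This sketch uses hypothetical scores and the standard library's `statistics.pstdev`. The constants 10 and 3 are arbitrary choices for illustration.

```python
import math
import statistics

scores = [1, 9, 5, 8, 7]                    # hypothetical scores
sd = statistics.pstdev(scores)

shifted = [x + 10 for x in scores]          # add a constant to every score
scaled = [x * 3 for x in scores]            # multiply every score by a constant

sd_shifted = statistics.pstdev(shifted)     # unchanged: same as sd
sd_scaled = statistics.pstdev(scaled)       # three times sd
```

For a negative constant, the standard deviation is multiplied by the constant's absolute value, since a standard deviation cannot be negative.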

Variance and inferential statistics : 

Variance and inferential statistics In general terms, the goal of inferential statistics is to detect meaningful and significant patterns in research results. The basic question is whether the sample data reflect patterns that exist in the population or simply show random fluctuations that occur by chance.

Variance and inferential statistics : 

Variance and inferential statistics Variability plays an important role here because the variability influences how easy it is to see patterns – in general, low variability means that existing patterns can be seen easily, whereas high variability makes it harder to catch them.

Variance and inferential statistics : 

Variance and inferential statistics

Variance and inferential statistics : 

Variance and inferential statistics In the context of inferential statistics, the variance that exists in a set of sample data is often classified as error variance, a term used to indicate that the sample variance represents unexplained and uncontrolled differences between scores. As this error variance increases, it becomes more and more difficult to see patterns that might exist.

In short….. : 

In short….. Variance makes it difficult to get a clear signal from the data.

Comparing measures of variability : 

Comparing measures of variability Standard deviation is the most used by far, though there are situations in which the range or interquartile range might be preferable.

Comparing measures of variability : 

Comparing measures of variability In simple terms, two considerations determine the value of any statistical measurement: 1) The measure should provide a stable and reliable description of the scores. Specifically, it should not be greatly affected by minor details in the set of data. 2) The measure should have a consistent and predictable relationship with other statistical measurements.

Factors that Affect Variability : 

Factors that Affect Variability Extreme scores: The range is most affected by extreme scores; a single extreme value will have a large influence on the range. Standard deviation and variance are also affected, and a single extreme score can have a disproportionate effect, so they must be interpreted carefully in a set with extreme scores. The semi-interquartile range is the least affected and therefore often provides the best measure of variability for distributions that are skewed or have a few extreme scores.
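A small comparison makes the effect of one extreme score concrete. The data are hypothetical, the helper name `spread` is an assumption, and the quartiles again use the `inclusive` convention of `statistics.quantiles`, which may differ from the textbook's method. The interquartile range is reported; the semi-interquartile range is simply half of it.

```python
import statistics

def spread(scores):
    """Return three measures of variability for a list of scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "range": max(scores) - min(scores),
        "sd": statistics.pstdev(scores),
        "iqr": q3 - q1,
    }

typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]          # hypothetical scores
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]    # same set with one extreme score

before = spread(typical)
after = spread(with_outlier)
```

Replacing the top score with 90 moves the range from 8 to 89 and inflates the standard deviation, while the interquartile range stays at 4.0 because the middle 50% of the scores did not change.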

Factors that Affect Variability : 

Factors that Affect Variability Sample size: As you increase the number of scores in a sample, you also tend to increase the range, because each new score has the potential to replace the current highest or lowest value in the set. The range is therefore directly related to sample size. This relationship is unacceptable, because a researcher should not be able to influence variability by manipulating sample size. Standard deviation, variance, and the interquartile range are relatively unaffected by sample size and therefore provide better measures.

Factors that Affect Variability : 

Factors that Affect Variability Stability under sampling: If you take several different samples from the same population, you should expect the samples to be similar. Specifically, if you compute variability for each of the separate samples, you should expect to obtain similar values. When standard deviation and variance are used to measure variability, the samples tend to have similar variability, so standard deviation and variance are said to be stable under sampling. The semi-interquartile range also provides a reasonably stable measure of variability. The range, however, changes unpredictably from sample to sample and is said to be unstable under sampling.

Factors that Affect Variability : 

Factors that Affect Variability Open-ended distributions: When a distribution does not have any specific boundary for the highest or lowest score, it is open-ended. This can occur when you have infinite or undetermined scores. In this situation, you cannot compute the range, the standard deviation, or the variance, so the only available measure of variability is the semi-interquartile range.

Slide 55: 

And you made it through 55 slides about variability….don’t hate me