SLIDES PREPARED BY
Lloyd R. Jaisingh, Ph.D.
Morehead State University
Morehead, KY
Chapter 4: Data Description – Numerical Measures of Position for Ungrouped Univariate Data
Outline
Do I Need to Read This Chapter?
4-1 The z-Score or Standard Score
4-2 Percentiles
It’s a Wrap
Objectives
Introduce some basic statistical measures of position.
Introduce some graphical displays that explain these measures of position.
Introduction
A measure of location (or position) for a collection of data values is a number that conveys the relative position of a data value in the data set.
The most commonly used measures of location for sample data are the z-score and percentiles.
4-1 The z-Score
Explanation of the term – z-score: The z-score for a value in a data set is obtained by subtracting the mean of the data set from the value and dividing the result by the standard deviation of the data set.
NOTE: The data values used to compute a z-score can be population values or sample values.
Hence we can compute either a population z-score or a sample z-score.
The sample z-score for a value x is given by the following formula:
$$z = \frac{x - \bar{x}}{s}$$
where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
The population z-score for a value x is given by the following formula:
$$z = \frac{x - \mu}{\sigma}$$
where $\mu$ is the population mean and $\sigma$ is the population standard deviation.
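As an illustrative sketch (not part of the original slides), both formulas can be computed with Python's standard `statistics` module. The helper name `z_score` and its `population` flag are hypothetical; `stdev` uses the sample divisor n - 1 and `pstdev` uses the population divisor n.

```python
from statistics import mean, pstdev, stdev

def z_score(x, data, population=False):
    """Return the z-score of x relative to data (hypothetical helper).

    population=False: sample z-score, (x - x_bar) / s, with s using divisor n - 1.
    population=True:  population z-score, (x - mu) / sigma, with sigma using divisor n.
    """
    center = mean(data)
    spread = pstdev(data) if population else stdev(data)
    return (x - center) / spread

sample = [3, 8, 6, 14, 4, 12, 7, 10]
print(round(z_score(14, sample), 2))                   # 1.57
print(round(z_score(14, sample, population=True), 2))  # 1.68
```

Treating the same eight values as a population gives a slightly larger z-score, because dividing by n instead of n - 1 makes the standard deviation smaller.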
Quick Tip: The z-score is the number of standard deviations the data value falls above (positive z-score) or below (negative z-score) the mean of the data set.
Quick Tip: The z-score is affected by an outlier in the data set, since an outlier (a very small or very large value relative to the other values in the data set) directly affects the values of the mean and the standard deviation.
The z-Score -- Example
Example: What is the z-score for the value of 14 in the following sample values?
3 8 6 14 4 12 7 10
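The z-Score -- Example (Continued)
Solution: The sample mean is $\bar{x} = 64/8 = 8$, and the sample standard deviation is $s = \sqrt{102/7} \approx 3.8173$. Hence
$$z = \frac{14 - 8}{3.8173} \approx 1.57.$$
Thus, since the z-score is positive, the data value of 14 is 1.57 standard deviations above the mean of 8.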
The z-Score – Why do we use the z-score as a measure of relative position?
[Figure: Dot plot of the data values, showing the location of the mean (8) and the data value of 14.]
The z-score
Observe that the distance between the mean of 8 and the value of 14 is 1.57s = 1.57 × 3.8173 ≈ 5.99 ≈ 6.
If we add this distance of 6 to the mean of 8, we get 8 + 6 = 14, the data value.
This shows that the value of 14 is 1.57 standard deviations above the mean value of 8.
That is, the z-score gives us an idea of how far the data value is from the mean, and so it gives us an idea of the position of the data value relative to the mean.
The z-Score -- Example
Example: What is the z-score for the value of 95 in the following sample values?
96 114 100 97 101 102 99 95 90
The z-Score -- Example (Continued)
Solution: First compute the sample mean and the sample standard deviation. These values are $\bar{x} \approx 99.3333$ and $s \approx 6.5955$, respectively. Verify.
Thus, z-score = (95 – 99.3333)/6.5955 ≈ -0.6570 ≈ -0.66.
Thus, since the z-score is negative, the data value of 95 is located 0.66 standard deviations below the mean value of 99.3333.
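The slide asks the reader to verify the mean and standard deviation. A short check with Python's standard `statistics` module (an illustrative sketch; the variable names are arbitrary) reproduces all three numbers:

```python
from statistics import mean, stdev

data = [96, 114, 100, 97, 101, 102, 99, 95, 90]
x_bar = mean(data)    # sample mean, 894 / 9
s = stdev(data)       # sample standard deviation, divisor n - 1
z = (95 - x_bar) / s  # sample z-score for the value 95
print(f"mean = {x_bar:.4f}, s = {s:.4f}, z = {z:.4f}")
# mean = 99.3333, s = 6.5955, z = -0.6570
```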
4-2 Percentiles
Explanation of the term – percentiles: Percentiles are numerical values that divide an ordered data set into 100 groups of values, with at most 1% of the data values in each group.
We generally discuss percentiles in terms of the kth percentile.
Let the kth percentile be denoted by Pk.
Explanation of the term – kth percentile: The kth percentile for an ordered array of numerical data is a numerical value Pk such that at most k% of the data values are smaller than Pk and at most (100 – k)% of the data values are larger than Pk.
The idea of the kth percentile is illustrated on the next slide.
The kth Percentile
[Figure: Illustration of the kth percentile.]
Quick Tip: To determine a percentile, the data set must first be ordered from the smallest to the largest value.
There are 99 percentiles in a data set.
Display of the 99th Percentile
[Figure: Illustration of the 99th percentile.]
Percentile Corresponding to a Given Data Value
The percentile corresponding to a given data value x in a data set is obtained by using the following formula:
$$\text{Percentile} = \frac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100$$
Example: The shoe sizes, in whole numbers, for a sample of 12 male students in a statistics class were as follows: 13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, and 9.
What is the percentile rank for a shoe size of 12?
Solution: First, arrange the values from smallest to largest.
The ordered array is: 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 13, 13.
Observe that the number of values below 12 is 9.
Solution (continued): The total number of values in the data set is 12.
Thus, using the formula, the corresponding percentile is
$$\frac{9 + 0.5}{12} \times 100 \approx 79.17.$$
The value of 12 corresponds to approximately the 79th percentile.
Example: In the previous example, what is the percentile rank for a shoe size of 10?
Recall that the ordered array was: 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 13, 13.
Observe that the number of values below 10 is 5.
Solution (continued): Recall that the total number of values in the data set was 12.
Thus, using the formula, the corresponding percentile is
$$\frac{5 + 0.5}{12} \times 100 \approx 45.83.$$
The value of 10 corresponds to approximately the 46th percentile.
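A minimal Python sketch of this percentile-rank calculation, assuming the formula percentile = (number of values below x + 0.5) / (total number of values) × 100, which reproduces both shoe-size answers (79th and 46th). The function name `percentile_rank` is hypothetical. Counting the values below x does not actually require sorting; ordering the data by hand simply makes that count easy to read off.

```python
def percentile_rank(x, data):
    """Percentile corresponding to data value x (hypothetical helper).

    Assumes: (number of values below x + 0.5) / n * 100.
    """
    below = sum(1 for v in data if v < x)  # number of values strictly below x
    return (below + 0.5) / len(data) * 100

shoe_sizes = [13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, 9]
print(round(percentile_rank(12, shoe_sizes)))  # 79
print(round(percentile_rank(10, shoe_sizes)))  # 46
```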
Procedure for Finding a Data Value for a Given Percentile
Assume that we want to determine which data value falls at some general percentile Pk.
The following steps will enable you to find Pk for a data set.
Step 1: Order the data set from smallest to largest.
Step 2: Compute the position c of the percentile using the formula
$$c = \frac{n \times k}{100},$$
where n is the number of data values and k is the percentile.
Step 3: If c is not a whole number, round it up to the next whole number; Pk is the data value in that position. If c is a whole number, Pk is the average of the data values in positions c and c + 1.
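The procedure can be sketched in Python as follows. The function name `percentile_value` is hypothetical, and the rounding rule is the one applied in the worked examples that follow: round c up when it is not a whole number, otherwise average the values in positions c and c + 1. The test for a whole number is done in integer arithmetic to avoid floating-point surprises.

```python
def percentile_value(data, k):
    """Data value at the kth percentile (k an integer from 1 to 99)."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(data)          # Step 1: smallest to largest
    n = len(ordered)
    # Step 2: c = n*k/100; c is a whole number exactly when n*k is divisible by 100
    if (n * k) % 100 == 0:
        c = n * k // 100
        # c whole: average the values in positions c and c + 1 (1-based)
        return (ordered[c - 1] + ordered[c]) / 2
    c = n * k // 100 + 1            # c not whole: round up to the next position
    return ordered[c - 1]           # convert the 1-based position to a 0-based index

medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15]
print(percentile_value(medals, 65))  # 27

values = [6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10, 10, 2, 14]
print(percentile_value(values, 25))  # 10.0
```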
Finding a Data Value for a Given Percentile
Example: The data given below represent the total numbers of Olympic medals won by the 19 countries with the largest medal counts at the 1996 Atlanta games, excluding the United States (which had 101 medals). Find the 65th percentile for the data set.
63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15.
Solution: First, arrange the data set in order. The ordered set is:
15, 15, 15, 15, 17, 17, 19, 20, 21, 22, 23, 25, 27, 35, 37, 41, 50, 63, 65.
Next, compute the position of the percentile.
Here n = 19 and k = 65.
Thus, c = (19 × 65)/100 = 12.35.
Since c is not a whole number, round it up to 13.
Solution (continued): Thus, the 13th value in the ordered data set corresponds to the 65th percentile.
That is, P65 = 27.
Question: Why does a percentile measure relative position?
[Figure: Display of the 65th percentile along with the data values.]
Referring to the display of the 65th percentile, observe that 12 of the 19 data values (about 63%) are smaller than 27 and 6 of the 19 (about 32%) are larger than 27. So at most 65% of the data values are smaller than 27, and at most 35% of the values are larger than 27.
This shows that the percentile value of 27 is a measure of location.
Thus, the percentile gives us an idea of the relative position of a value in an ordered data set.
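To make the "at most k% smaller, at most (100 - k)% larger" check concrete, here is a small Python sketch (the function name `satisfies_percentile_definition` is hypothetical). It confirms that 27 meets the definition of P65 for the medal data, while a neighboring value such as 25 does not.

```python
def satisfies_percentile_definition(value, data, k):
    """True if at most k% of the data are smaller than value
    and at most (100 - k)% are larger (the definition of Pk)."""
    n = len(data)
    smaller = sum(1 for v in data if v < value)
    larger = sum(1 for v in data if v > value)
    # Integer comparison: smaller/n <= k/100  <=>  100*smaller <= k*n
    return 100 * smaller <= k * n and 100 * larger <= (100 - k) * n

medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15]
print(satisfies_percentile_definition(27, medals, 65))  # True: 12/19 smaller, 6/19 larger
print(satisfies_percentile_definition(25, medals, 65))  # False: 7/19 (about 37%) are larger
```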
Finding a Data Value for a Given Percentile
Example: Find the 25th percentile for the following data set:
6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10, 10, 2, 14.
Solution: First, arrange the data set in order. The ordered set is:
2, 6, 8, 10, 10, 10, 11, 11, 12, 12, 13, 13, 13, 14, 16, 18.
Solution (continued):
Next, compute the position of the percentile.
Here n = 16 and k = 25.
Thus, c = (16 × 25)/100 = 4.
Since c is a whole number, the 25th percentile is the average of the values in the 4th and 5th positions of the ordered set.
Thus, P25 = (10 + 10)/2 = 10.
Special Percentiles – Deciles and Quartiles
Deciles and quartiles are special percentiles.
Deciles divide an ordered data set into 10 equal parts.
Quartiles divide the ordered data set into 4 equal parts.
We usually denote the deciles by D1, D2, D3, … , D9.
We usually denote the quartiles by Q1, Q2, and Q3.
Deciles
There are nine deciles.
At most 10% of the values are in each group.
Quartiles
There are three quartiles.
At most 25% of the values are in each group.
Quick Tip: There are 9 deciles and 3 quartiles.
Q1 = first quartile = P25
Q2 = second quartile = P50
Q3 = third quartile = P75
D1 = first decile = P10
D2 = second decile = P20
. . .
D9 = ninth decile = P90
Quick Tip: P50 = D5 = Q2 = median
That is, the 50th percentile, the 5th decile, the 2nd quartile, and the median are all equal to one another.
Finding deciles and quartiles is equivalent to finding the corresponding percentiles.
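Because quartiles and deciles are just percentiles, a sketch only needs the percentile procedure plus two thin wrappers. All function names below are hypothetical, and `percentile_value` follows the same rule as the worked examples (round c = nk/100 up when it is not whole, otherwise average positions c and c + 1). The last line checks that D5, Q2, and the median from Python's `statistics.median` agree for the 20-country medal data used later in this chapter.

```python
from statistics import median

def percentile_value(data, k):
    # Hypothetical helper: c = n*k/100; round up if not whole,
    # otherwise average the values in positions c and c + 1.
    ordered = sorted(data)
    n = len(ordered)
    if (n * k) % 100 == 0:
        c = n * k // 100
        return (ordered[c - 1] + ordered[c]) / 2
    return ordered[n * k // 100]  # 0-based index of position ceil(c)

def quartile(data, i):  # i = 1, 2, 3  ->  P25, P50, P75
    return percentile_value(data, 25 * i)

def decile(data, j):    # j = 1, ..., 9  ->  P10, ..., P90
    return percentile_value(data, 10 * j)

medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15, 101]
print([quartile(medals, i) for i in (1, 2, 3)])              # [17.0, 22.5, 39.0]
print(decile(medals, 5) == quartile(medals, 2) == median(medals))  # True
```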
OUTLIERS
Recall that an outlier is an extremely small or extremely large data value when compared with the rest of the data values.
The following procedure allows us to check whether a data value can be considered an outlier.
Procedure to Check for OUTLIERS
The following steps allow us to check whether a given value in a data set can be classified as an outlier.
Step 1: Arrange the data in order from smallest to largest.
Step 2: Determine the first quartile Q1 and the third quartile Q3. (Recall that Q1 = P25 and Q3 = P75.)
Step 3: Find the interquartile range (IQR): IQR = Q3 – Q1.
Step 4: Compute (Q1 – 1.5 × IQR) and (Q3 + 1.5 × IQR).
Step 5: Let x be the data value being checked.
(a) If x is smaller than (Q1 – 1.5 × IQR), then x is classified as an outlier.
(b) If x is larger than (Q3 + 1.5 × IQR), then x is classified as an outlier.
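The five steps translate directly into a short Python sketch. The names `outlier_fences` and `outliers` are hypothetical, and the quartiles come from the same percentile rule used in this chapter's worked examples; statistical software may use a different quartile convention and therefore produce slightly different fences. The check below uses the 20-country medal data from the example that follows.

```python
def percentile_value(data, k):
    # Hypothetical helper: c = n*k/100; round up if not whole,
    # otherwise average the values in positions c and c + 1.
    ordered = sorted(data)
    n = len(ordered)
    if (n * k) % 100 == 0:
        c = n * k // 100
        return (ordered[c - 1] + ordered[c]) / 2
    return ordered[n * k // 100]

def outlier_fences(data):
    q1 = percentile_value(data, 25)           # Step 2: Q1 = P25
    q3 = percentile_value(data, 75)           #         Q3 = P75
    iqr = q3 - q1                             # Step 3
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr     # Step 4

def outliers(data):
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]  # Step 5 (a) and (b)

medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15, 101]
print(outlier_fences(medals))  # (-16.0, 72.0)
print(outliers(medals))        # [101]
```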
Example: The data below represent the total numbers of Olympic medals won by the 20 countries with the largest medal counts at the 1996 Atlanta games, including the United States, which had 101 medals. Determine whether the number of medals won by the United States is an outlier relative to the numbers for the other countries.
The data are given on the next slide.
Example (continued): Data values: 63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15, 101.
Solution: First, arrange the data set in order. The ordered set is: 15, 15, 15, 15, 17, 17, 19, 20, 21, 22, 23, 25, 27, 35, 37, 41, 50, 63, 65, 101.
Next, determine the first and third quartiles.
Verify that Q1 = P25 = 17 and Q3 = P75 = 39.
Example (continued): Thus, IQR = 39 – 17 = 22.
Now, Q1 – 1.5 × IQR = 17 – (1.5 × 22) = -16,
and Q3 + 1.5 × IQR = 39 + (1.5 × 22) = 72.
Since 101 > 72, the value of 101 is an outlier relative to the rest of the values in the data set (based on the procedure presented here).
That is, the number of medals won by the United States is an outlier relative to the numbers won by the other 19 countries for the 1996 Atlanta Olympic Games.
Pictorial Representation for the OUTLIER of the Number of Olympic Medals Won by the United States in the 1996 Atlanta Games
[Figure: Number line marking the lower fence at -16 and the upper fence at 72, with the U.S. value of 101 labeled as an OUTLIER.]
BOX PLOTS
Explanation of the term – box plot: A box plot is a graphical display based on the five-number summary of a distribution of values: the minimum value, the first quartile, the median, the third quartile, and the maximum value.
A horizontal box plot is constructed by drawing a box between the quartiles Q1 and Q3.
Horizontal lines are then drawn from the middle of the sides of the box to the minimum and maximum values.
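These horizontal lines are called whiskers.
A vertical line inside the box marks the median.
Outliers are usually indicated by a dot or an asterisk; when outliers are plotted separately, the whiskers usually extend only to the most extreme values that are not outliers.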
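The five numbers a box plot is built from can be computed with a few lines of Python. This is a sketch under stated assumptions: `five_number_summary` is a hypothetical name, and the quartiles and median come from this chapter's percentile rule (Q1 = P25, median = P50, Q3 = P75). Software packages use several different quartile conventions, so their box plots can differ slightly.

```python
def percentile_value(data, k):
    # Hypothetical helper: c = n*k/100; round up if not whole,
    # otherwise average the values in positions c and c + 1.
    ordered = sorted(data)
    n = len(ordered)
    if (n * k) % 100 == 0:
        c = n * k // 100
        return (ordered[c - 1] + ordered[c]) / 2
    return ordered[n * k // 100]

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum)"""
    ordered = sorted(data)
    return (ordered[0],
            percentile_value(data, 25),
            percentile_value(data, 50),
            percentile_value(data, 75),
            ordered[-1])

medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15, 101]
print(five_number_summary(medals))  # (15, 17.0, 22.5, 39.0, 101)
```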
Example of a Box Plot for the Olympic (1996) Medal Count Data
[Figure: Box plot of the 1996 Olympic medal count data.]
Information That Can Be Obtained From a Box Plot
Information That Can Be Obtained From a Box Plot – Looking at the Median
If the median is close to the center of the box, the distribution of the data values is likely to be approximately symmetric.
If the median is to the left of the center of the box, the distribution of the data values is likely to be positively skewed.
If the median is to the right of the center of the box, the distribution of the data values is likely to be negatively skewed.
Information That Can Be Obtained From a Box Plot – Looking at the Length of the Whiskers
If the whiskers are approximately the same length, the distribution of the data values is likely to be approximately symmetric.
If the right whisker is longer than the left whisker, the distribution of the data values is likely to be positively skewed.
If the left whisker is longer than the right whisker, the distribution of the data values is likely to be negatively skewed.
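These two rules of thumb can be applied mechanically to a five-number summary. In the sketch below, `describe_shape` is a hypothetical name, and the tolerance `tol` (how close counts as "close to the center" or "about the same length") is an arbitrary assumption, not something the rules specify. Whiskers are measured to the minimum and maximum, as in the box plot construction described earlier. For the medal data, with summary (15, 17, 22.5, 39, 101), both readings suggest positive skewness.

```python
def describe_shape(summary, tol=0.1):
    """Rough reading of a box plot from (min, Q1, median, Q3, max).

    tol is a hypothetical tolerance, as a fraction of the box width
    (median rule) or of the longer whisker (whisker rule).
    """
    lo, q1, med, q3, hi = summary
    center, box = (q1 + q3) / 2, q3 - q1
    if abs(med - center) <= tol * box:
        by_median = "approximately symmetric"
    elif med < center:                   # median left of center
        by_median = "positively skewed"
    else:                                # median right of center
        by_median = "negatively skewed"
    left, right = q1 - lo, hi - q3       # whisker lengths
    if abs(right - left) <= tol * max(left, right):
        by_whiskers = "approximately symmetric"
    elif right > left:
        by_whiskers = "positively skewed"
    else:
        by_whiskers = "negatively skewed"
    return by_median, by_whiskers

print(describe_shape((15, 17, 22.5, 39, 101)))
# ('positively skewed', 'positively skewed')
```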
Box Plot Displaying Positive Skewness
[Figure: Box plot displaying positive skewness.]
Box Plot Displaying a Symmetrical Distribution
[Figure: Box plot displaying a symmetrical distribution.]
Box Plot Displaying Negative Skewness
[Figure: Box plot displaying negative skewness.]