GEM 604 chapter 1

GEM 604 ADVANCED ENGINEERING STATISTICS : 

Chapter I: The Nature of Probability and Statistics
Prof. Melinda M. Lupague

Statistics : 

Statistics is a science that deals with the collection, presentation, analysis, and interpretation of data.
Reasons to study statistics:
- GSE students must be able to read and understand the various statistical studies performed in their field of specialization.
- GSE students are called on to conduct research in their field of specialization, and statistical procedures are basic to research.
- GSE students can also use the knowledge gained from studying statistics to become better consumers and citizens.

Areas of Statistics : 

Descriptive statistics comprises those methods concerned with collecting and describing a set of data so as to yield meaningful information.
Inferential statistics comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data.
A population consists of all subjects that are being studied.
A sample is a group of subjects selected from a population.

Slide 4: 

A parameter is a summary measure that is computed to describe a characteristic of an entire population. A statistic is a summary measure that is computed to describe a characteristic of only a sample of a population. A variable is a characteristic or attribute that can assume different values. Data are the values (measurements or observations) that the variables can assume.

Statistics : 

Statistics Sources of Data Data collectors are labeled primary sources. Data compilers are called secondary sources. Qualitative variables are variables that can be placed into distinct categories, according to some characteristic or attribute.

Statistics : 

Statistics Quantitative variables are numerical and can be ordered or ranked. Types: 1. Discrete variables assume values that can be counted. 2. Continuous variables can assume all values between any two specific values.

Statistics : 

Classification of data: data are either qualitative or quantitative, and quantitative data are either discrete or continuous.

Slide 8: 

Scales of measurement 1. Nominal scale A scale of measurement for a variable that uses a label or name to identify an attribute of an element. Nominal data may be numeric or nonnumeric. 2. Ordinal scale A scale of measurement for a variable that has the properties of nominal data and classifies data that can be ranked; however, precise differences between the ranks do not exist.

Slide 9: 

3. Interval scale A scale of measurement that has the properties of ordinal data and the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric. 4. Ratio scale A scale of measurement for a variable that has the properties of interval data and the ratio of two values is meaningful.

Some Aspects of Descriptive Statistics : 

Some Aspects of Descriptive Statistics A. Data Collection: Sampling Techniques Sampling technique is a procedure used to determine the individuals or members of the sample. Types of Sampling Methods: 1. Probability Sample 2. Non-probability Sample

Slide 11: 

Probability Sampling Procedures
Probability sampling is a sampling technique wherein each member or element of the population has a known, nonzero chance of being selected as a member of the sample; in simple random sampling, every member has an equal chance.
Types:
1. Simple Random Sampling: Random samples are selected by using chance methods or random numbers. One such method is to number each subject in the population, place the numbered cards in a bowl, mix them thoroughly, and select as many cards as needed. The subjects whose numbers are selected constitute the sample.

Slide 12: 

Sampling with replacement means that once a subject is selected, it is returned to the frame, where it has the same probability of being selected again. Sampling without replacement means that a subject, once selected, is not returned to the frame and therefore cannot be selected again.
2. Systematic Sampling: Systematic samples are obtained by numbering each subject of the population and then selecting every kth subject.

Slide 13: 

3. Stratified Sampling: Stratified samples are obtained by dividing the population into groups (called strata) according to some characteristic that is important to the study, then sampling from each group.
4. Cluster Sampling: Cluster samples are selected by using intact groups (called clusters) that are representative of the population.
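The probability sampling procedures described on these slides can be sketched in Python. This is only an illustration and not part of the original slides: the population of 20 numbered subjects, the strata, the clusters, and all function names are hypothetical, and the fixed seed is there only to make the run repeatable.

```python
import random

rng = random.Random(604)          # arbitrary fixed seed, for repeatability only
population = list(range(1, 21))   # hypothetical population: subjects numbered 1..20

def simple_random_sample(pop, n):
    """Draw n subjects without replacement, so no subject can repeat."""
    return rng.sample(pop, n)

def sample_with_replacement(pop, n):
    """Draw n subjects with replacement, so a subject may be drawn again."""
    return rng.choices(pop, k=n)

def systematic_sample(pop, k):
    """Pick a random start among the first k subjects, then take every kth subject."""
    start = rng.randrange(k)
    return pop[start::k]

def stratified_sample(strata, n_per_stratum):
    """Draw a simple random sample from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters):
    """Select whole clusters at random and keep every subject in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [subject for name in chosen for subject in clusters[name]]

# Hypothetical groupings of the same population.
strata = {"low": population[:10], "high": population[10:]}
clusters = {f"c{i}": population[i * 5:(i + 1) * 5] for i in range(4)}

srs = simple_random_sample(population, 5)
swr = sample_with_replacement(population, 5)
sys_sample = systematic_sample(population, 5)
strat = stratified_sample(strata, 2)
clus = cluster_sample(clusters, 2)
```

Drawing within every stratum guarantees each group is represented, whereas cluster sampling keeps a few groups whole, which is usually cheaper but relies on the clusters resembling the population.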

Slide 14: 

Non-probability Sampling Procedures
Non-probability sampling is a sampling technique wherein members of the sample are drawn from the population based on the judgment of the researchers.
Types:
1. Convenience Sampling: It is used because of the convenience it offers to the researcher.
2. Quota Sampling: The proportions of the various subgroups in the population are determined, and the sample is drawn to have the same percentages.

Methods of Collecting Data : 

Methods of Collecting Data Direct or Interview Method Interviewing is a technique that is primarily used to gain an understanding of the underlying reasons and motivations for people’s attitudes, preferences or behavior. Interviews can be undertaken on a personal one-to-one basis or in a group.

Slide 16: 

Types of interview Structured: - Based on a carefully worded interview schedule. Frequently require short answers with the answers being ticked off. Useful when there are a lot of questions which are not particularly contentious or thought provoking. Respondents may be irritated by having to give over-simplified answers

Slide 17: 

Semi-structured: The interview is focused by asking certain questions but with scope for the respondent to express him or herself at length.
Unstructured: This is also called an in-depth interview. The interviewer begins by asking a general question and then encourages the respondent to talk freely. The interviewer uses an unstructured format, the subsequent direction of the interview being determined by the respondent’s initial reply.

Questionnaires : 

Questionnaires Questionnaires are a popular means of collecting data, but are difficult to design and often require many rewrites before an acceptable questionnaire is produced.

Slide 19: 

Types of questions Closed questions A question is asked and then a number of possible answers are provided for the respondent. The respondent selects the answer which is appropriate. Closed questions are particularly useful in obtaining factual information:

Slide 20: 

Attitude questions - Frequently questions are asked to find out the respondents’ opinions or attitudes to a given situation. A Likert scale provides a battery of attitude statements. The respondent then says how much they agree or disagree with each one:

Slide 21: 

Open questions An open question such as ‘What are the essential skills a manager should possess?’ should be used as an adjunct to the main theme of the questionnaire and could allow the respondent to elaborate upon an earlier more specific question.

Observation Method : 

Observation involves recording the behavioral patterns of people, objects and events in a systematic manner. Observational methods may be:
- structured or unstructured
- disguised or undisguised
- natural or contrived
- personal
- mechanical
- non-participant
- participant

Structured or unstructured : 

Structured or unstructured In structured observation, the researcher specifies in detail what is to be observed and how the measurements are to be recorded. In unstructured observation, the researcher monitors all aspects of the phenomenon that seem relevant.

Disguised or undisguised : 

Disguised or undisguised In disguised observation, respondents are unaware they are being observed and thus behave naturally. In undisguised observation, respondents are aware they are being observed.

Natural or contrived : 

Natural or contrived Natural observation involves observing behavior as it takes place in the environment. In contrived observation, the respondents’ behavior is observed in an artificial environment.

Personal : 

In personal observation, a researcher observes actual behavior as it occurs. The observer does not normally attempt to control or manipulate the phenomenon being observed. The observer merely records what takes place.

Mechanical : 

Mechanical Mechanical devices (video, closed circuit television) record what is being observed. These devices may or may not require the respondent’s direct participation. They are used for continuously recording on-going behavior.

Non-participant : 

Non-participant The observer does not normally question or communicate with the people being observed. He or she does not participate.

Participant : 

Participant In participant observation, the researcher becomes, or is, part of the group that is being investigated.

Experimental Method : 

Experimental Method Experimental methods are used to demonstrate causal relationships between two variables. In an experiment, the researcher systematically manipulates the variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).

Slide 31: 

B. Data Representation Ungrouped data are data that are not organized, or if arranged, could only be from highest to lowest or lowest to highest. Grouped data are data that are organized and arranged into different classes or categories.

Slide 32: 

Stem-and-Leaf Plot Stem and leaf plot is a table which sorts data according to a certain pattern. It involves separating a number into two parts: the stem and the leaves. Example: Use the stem and leaf plot for the test scores of 50 students in statistics. 60 72 87 80 94 78 65 70 69 72 78 70 77 76 78 82 74 74 65 69 72 67 74 75 78 77 78 82 66 70 68 76 70 81 73 80 79 71 64 71 73 80 80 77 65 70 78 72 74 74
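A minimal Python sketch of the stem-and-leaf plot for these 50 scores is given here. It assumes the tens digit is the stem and the units digit is the leaf, which suits two-digit scores; the function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict

scores = [
    60, 72, 87, 80, 94, 78, 65, 70, 69, 72,
    78, 70, 77, 76, 78, 82, 74, 74, 65, 69,
    72, 67, 74, 75, 78, 77, 78, 82, 66, 70,
    68, 76, 70, 81, 73, 80, 79, 71, 64, 71,
    73, 80, 80, 77, 65, 70, 78, 72, 74, 74,
]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Each printed row keeps the individual scores visible while still showing how they cluster, which a frequency table alone does not.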

Slide 33: 

Tabular Method
A frequency distribution table shows the data arranged into different classes and the number of cases that fall into each class.
Steps in constructing a frequency distribution table:
1. Decide on the number of classes.
2. Determine the class width i.
(i) When the data are whole numbers, i should be a whole number.
(ii) When the data are in one decimal place, i should be in one decimal place.

Slide 34: 

3. Unless otherwise specified, always start the lowest class with the lowest value of the raw data in order to minimize the error. 4. Tally the frequencies for each class, until the highest value is reached. 5. The class interval can go beyond the highest value in the observation as long as the obtained i is followed.

Slide 35: 

Example: Construct a frequency distribution table of 8 classes for the test scores of 50 students in Statistics. 60 72 87 80 94 78 65 70 69 72 78 70 77 76 78 82 74 74 65 69 72 67 74 75 78 77 78 82 66 70 68 76 70 81 73 80 79 71 64 71 73 80 80 77 65 70 78 72 74 74
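The construction steps can be sketched in Python for this example. The slides do not state how to compute the class width, so the sketch assumes the common rule of dividing the range by the number of classes and rounding up to a whole number; the variable names are illustrative.

```python
import math

scores = [
    60, 72, 87, 80, 94, 78, 65, 70, 69, 72,
    78, 70, 77, 76, 78, 82, 74, 74, 65, 69,
    72, 67, 74, 75, 78, 77, 78, 82, 66, 70,
    68, 76, 70, 81, 73, 80, 79, 71, 64, 71,
    73, 80, 80, 77, 65, 70, 78, 72, 74, 74,
]
num_classes = 8

# Assumption: width = range / number of classes, rounded up to a whole number.
width = math.ceil((max(scores) - min(scores)) / num_classes)
lowest = min(scores)  # start the lowest class at the lowest raw value

table = []
for k in range(num_classes):
    lower = lowest + k * width
    upper = lower + width - 1          # whole-number class limits
    freq = sum(lower <= s <= upper for s in scores)
    table.append((lower, upper, freq))

for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

With these choices the eighth class extends past the highest score and stays empty, which the construction steps allow as long as the class width is kept.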

Slide 36: 

A relative frequency distribution is a table which lists the relative frequencies of the classes. A cumulative frequency distribution is a table which shows the number of cases falling below a particular value.

Slide 37: 

A bar chart is a graph represented by vertical rectangles whose bases represent the class intervals and whose heights represent the frequencies.

Slide 38: 

A histogram is a graph represented by vertical rectangles whose bases are the class boundaries and whose heights are the frequencies.

Slide 39: 

A frequency polygon is a graph that displays the data using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points.

Slide 40: 

The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.

Slide 41: 

Other types of Graphs Used in Statistics Pareto chart A Pareto chart is used to represent a frequency distribution for a categorical variable, and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest. Example: The top 10 airlines with the most aircraft are listed below. Construct a Pareto chart for the data. American 714 United 603 Delta 600 Delta 424 US Airways 384

Slide 43: 

Time Series Graph
A time series graph represents data that occur over a specific period of time.
Example: Draw a time series graph to represent the data for the number of airline departures (in millions) for the given years.
Year                    1994  1995  1996  1997  1998  1999  2000
Departures (millions)   7.5   8.1   8.2   8.2   8.3   8.6   9.0

Slide 45: 

Pie Graph A pie graph is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution. Example: The following elements comprise the earth’s crust, the outermost solid layer. Illustrate the composition of the earth’s crust with a pie graph.

Slide 46: 

Element    Percentage
Oxygen     45.6%
Silicon    27.3%
Aluminum   8.4%
Iron       6.2%
Calcium    4.7%
Other      7.8%

Statistical Description of Data : 

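Measures of Central Tendency
1. Mean
The mean is the sum of the values divided by the total number of values.
Mean: Ungrouped Data
Sample mean: $\bar{x} = \dfrac{\sum x}{n}$, where n is the number of values in the sample.
Population mean: $\mu = \dfrac{\sum x}{N}$, where N is the number of values in the population.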

Slide 48: 

Example 1. A manufacturer of electronic components is interested in determining the lifetime of a certain type of battery. A sample, in hours of life, is as follows: 123, 116, 122, 110, 175, 126, 125, 111, 118, 117. Find the mean. Example 2. A tire manufacturer wants to determine the inner diameter of a certain grade of tire. The data are as follows: 572, 572, 573, 568, 569, 575, 565, 570. Treating the data as population, find the mean.
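Examples 1 and 2 can be checked with a short Python sketch using the standard `statistics` module. The arithmetic is the same for a sample and a population (sum of the values divided by their count); only the symbol differs. The variable names are illustrative.

```python
from statistics import mean

battery_hours = [123, 116, 122, 110, 175, 126, 125, 111, 118, 117]   # Example 1 (sample)
tire_diameters = [572, 572, 573, 568, 569, 575, 565, 570]            # Example 2 (population)

sample_mean = mean(battery_hours)        # sum is 1243, n is 10
population_mean = mean(tire_diameters)   # sum is 4564, N is 8

print(sample_mean, population_mean)
```

The single long-lived battery (175 hours) pulls the sample mean above most of the other values, which illustrates how sensitive the mean is to extreme values.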

Mean: Grouped Data : 

$\bar{x} = \dfrac{\sum f x}{n}$
where f = frequency of each class, x = midpoint of each class, n = total number of frequencies

Slide 50: 

Example 3. This frequency distribution represents the commission earned (in dollars) by 100 salespeople employed at several branches of a large chain store. Find the mean.
Class limits   Frequency
150 – 158      5
159 – 167      16
168 – 176      20
177 – 185      21
186 – 194      20
195 – 203      15
204 – 212      3
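A minimal Python sketch of the grouped mean for Example 3 is given here, assuming the usual grouped-data formula (sum of frequency times class midpoint, divided by the total frequency). The tuple layout and names are illustrative.

```python
# (lower limit, upper limit, frequency) for each commission class
classes = [
    (150, 158, 5), (159, 167, 16), (168, 176, 20), (177, 185, 21),
    (186, 194, 20), (195, 203, 15), (204, 212, 3),
]

n = sum(f for _, _, f in classes)                            # total number of frequencies
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)     # sum of f times midpoint
grouped_mean = sum_fx / n

print(n, sum_fx, grouped_mean)
```

Because every value in a class is treated as if it sat at the class midpoint, the result is an approximation of the mean of the raw data.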

Slide 51: 

Example 4. This frequency distribution represents the data obtained from a sample of 75 copying machine service technicians. The values represent the days between service calls for various copying machines. Find the mean.
Class boundaries   Frequency
15.5 – 18.5        14
18.5 – 21.5        12
21.5 – 24.5        18
24.5 – 27.5        10
27.5 – 30.5        15
30.5 – 33.5        3

2. Median : 

2. Median The median is the midpoint of the data array. Steps in computing the median of a data array. a. Arrange the data in order. b. Select the middle point. Example 5. Find the median for the ages of seven preschool children. The ages are 2, 3, 4, 2, 4, 5, and 2. Example 6. With reference to example 1, find the median lifetime of the battery.
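Examples 5 and 6 can be worked with the standard `statistics.median` function, which sorts the data and, for an even number of values, averages the two middle ones. This is a sketch; the variable names are illustrative.

```python
from statistics import median

ages = [2, 3, 4, 2, 4, 5, 2]                                        # Example 5
battery_hours = [123, 116, 122, 110, 175, 126, 125, 111, 118, 117]  # Example 1 data, for Example 6

median_age = median(ages)             # odd count: the single middle value
median_life = median(battery_hours)   # even count: mean of the two middle values

print(sorted(ages), median_age)
print(sorted(battery_hours), median_life)
```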

Median: Grouped Data : 

Median $= L_x + \left(\dfrac{\frac{n}{2} - {<}cf}{f}\right) i$
where L_x = lower class boundary of the median class, n = total number of frequencies, <cf = cumulative frequency before the median class, f = frequency of the median class, i = class width

Slide 54: 

Example 7. This frequency distribution represents the commission earned (in dollars) by 100 salespeople employed at several branches of a large chain store. Find the median.
Class limits   Frequency
150 – 158      5
159 – 167      16
168 – 176      20
177 – 185      21
186 – 194      20
195 – 203      15
204 – 212      3
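Example 7 can be sketched in Python, assuming the usual interpolation formula: lower boundary of the median class plus (n/2 minus the cumulative frequency before that class) divided by the class frequency, times the class width. Because the table gives whole-number class limits, the sketch also assumes the lower boundary is the lower limit minus 0.5. The function name is illustrative.

```python
# (lower limit, upper limit, frequency) for each commission class
classes = [
    (150, 158, 5), (159, 167, 16), (168, 176, 20), (177, 185, 21),
    (186, 194, 20), (195, 203, 15), (204, 212, 3),
]

def grouped_median(classes):
    """Interpolate the median inside the class that contains the n/2-th case."""
    n = sum(f for _, _, f in classes)
    width = classes[1][0] - classes[0][0]     # distance between successive lower limits
    cumulative = 0                            # cumulative frequency before the current class
    for lower, _, f in classes:
        if cumulative + f >= n / 2:
            boundary = lower - 0.5            # assumed lower class boundary
            return boundary + (n / 2 - cumulative) / f * width
        cumulative += f

median_commission = grouped_median(classes)
print(round(median_commission, 2))
```

Here the median class is 177 – 185, since 41 cases lie below it and the 50th case falls inside it.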

3. Mode : 

3. Mode The value that occurs most often in a data set is called the mode. Example 8. The lengths of service (in years) of the Chief Justices of the Supreme Court are 7, 1, 5, 35, 28, 10, 15, 22, 11, 10, 12, 6, 8, 14, 18, 16. Find the mode. Example 9. Eleven different automobiles were tested at a speed of 75 kph for stopping distances. The data, in km, are shown below. Find the mode. 10, 13, 13, 13, 15, 17, 19, 19, 19, 21, 21
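Examples 8 and 9 can be checked with `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest count, so it also handles data sets with more than one mode. This is a sketch with illustrative names.

```python
from statistics import multimode

service_years = [7, 1, 5, 35, 28, 10, 15, 22, 11, 10, 12, 6, 8, 14, 18, 16]   # Example 8
stopping_distances = [10, 13, 13, 13, 15, 17, 19, 19, 19, 21, 21]             # Example 9

modes_service = multimode(service_years)         # one value occurs twice, the rest once
modes_stopping = multimode(stopping_distances)   # two values occur three times each

print(modes_service, modes_stopping)
```

The second data set is bimodal, so reporting a single mode would hide half of the answer.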

Mode : Grouped Data : 

Mode $= L_x + \left(\dfrac{d_1}{d_1 + d_2}\right) i$
where L_x = lower class boundary of the modal class (class with the highest frequency), d_1 = numerical difference of the frequencies between the modal class and the class preceding it, d_2 = numerical difference of the frequencies between the modal class and the class following it, i = class width

Slide 57: 

Example 10. This frequency distribution represents the data obtained from a sample of 75 copying machine service technicians. The values represent the days between service calls for various copying machines. Find the mode.
Class boundaries   Frequency
15.5 – 18.5        14
18.5 – 21.5        12
21.5 – 24.5        18
24.5 – 27.5        10
27.5 – 30.5        15
30.5 – 33.5        3
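A Python sketch of the grouped mode for Example 10 is given here, assuming the usual formula: lower boundary of the modal class plus d1 / (d1 + d2) times the class width, with d1 and d2 as defined on the slide. The function name, and the choice to treat a missing neighboring class as frequency 0, are assumptions of this sketch.

```python
# (lower boundary, upper boundary, frequency) for each class
classes = [
    (15.5, 18.5, 14), (18.5, 21.5, 12), (21.5, 24.5, 18),
    (24.5, 27.5, 10), (27.5, 30.5, 15), (30.5, 33.5, 3),
]

def grouped_mode(classes):
    """Interpolate the mode inside the class with the highest frequency."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                          # index of the modal class
    lower, upper, f = classes[m]
    before = freqs[m - 1] if m > 0 else 0                # assumed 0 when no class precedes
    after = freqs[m + 1] if m < len(freqs) - 1 else 0    # assumed 0 when no class follows
    d1 = f - before
    d2 = f - after
    return lower + d1 / (d1 + d2) * (upper - lower)

mode_days = grouped_mode(classes)
print(round(mode_days, 2))
```

The modal class is 21.5 – 24.5, with d1 = 18 − 12 = 6 and d2 = 18 − 10 = 8, so the estimate sits a little below the middle of that class.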

Properties and Uses of Central Tendency : 

Properties and Uses of Central Tendency The Mean One computes the mean by using all the values of the data. The mean varies less than the median or mode when samples are taken from the same population and all three measures are computed for these samples. The mean is used in computing other statistics, such as the variance.

Slide 59: 

The mean for the data set is unique and not necessarily one of the data values. The mean cannot be computed for an open-ended frequency distribution. The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations.

Slide 60: 

The Median
The median is used when one must find the center or middle value of a data set. The median is used when one must determine whether the data values fall into the upper half or lower half of the distribution. The median is used for an open-ended distribution. The median is affected less than the mean by extremely high or extremely low values.

Slide 61: 

The Mode
The mode is used when the most typical case is desired. The mode is the easiest average to compute. The mode can be used when the data are nominal. The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.

Distribution Shapes : 

Frequency distributions can assume many shapes. The three most important shapes are positively skewed, symmetric, and negatively skewed.

B. Measures of Variation : 

B. Measures of Variation 1. Range The range is the highest value minus the lowest value. The symbol R is used for the range. Example 11. The lengths of service (in years) of the Chief Justices of the Supreme Court are 7, 1, 5, 35, 28, 10, 15, 22, 11, 10, 12, 6, 8, 14, 18, 16. Find the range.

2. Population Variance and Standard Deviation : 

The variance is the average of the squares of the distance each value is from the mean. The symbol for the population variance is $\sigma^2$, and it is given by
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
where x = individual value, $\mu$ = population mean, N = population size

Standard Deviation : 

The standard deviation is the square root of the variance. The symbol for the population standard deviation is $\sigma$, and it is given by
$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Example 12. Twelve students were given an arithmetic test, and the times (in minutes) to complete it were 10, 9, 12, 11, 8, 15, 9, 7, 8, 6, 12, 10. Treating the data as a population, find the variance and standard deviation.
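Example 12 can be sketched in Python. The population variance is computed by hand (mean of the squared distances from the mean, dividing by N) and then compared with `statistics.pvariance` and `statistics.pstdev` from the standard library. Variable names are illustrative.

```python
from statistics import pstdev, pvariance

times = [10, 9, 12, 11, 8, 15, 9, 7, 8, 6, 12, 10]   # minutes, treated as a population

N = len(times)
mu = sum(times) / N                                   # population mean
manual_var = sum((x - mu) ** 2 for x in times) / N    # average squared distance from the mean

pop_var = pvariance(times)   # library equivalent, divides by N
pop_sd = pstdev(times)       # square root of the population variance

print(mu, manual_var, pop_var, round(pop_sd, 2))
```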

Sample Variance and Sample Standard Deviation : 

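The formula for the sample variance, denoted by $s^2$, is
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
The standard deviation for a sample (denoted by s) is
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
where x = individual value, $\bar{x}$ = sample mean, n = sample size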

Slide 69: 

Example 13. The normal daily high temperatures (in degrees Fahrenheit) in January for 10 selected cities are as follows: 50, 37, 29, 54, 30, 61, 47, 38, 34, 61. Assuming the data represent a sample, find the variance and standard deviation.
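Example 13 can be sketched in Python with `statistics.variance` and `statistics.stdev`, which divide by n − 1 as a sample calculation requires. A manual computation is included for comparison; the names are illustrative.

```python
from statistics import stdev, variance

temps = [50, 37, 29, 54, 30, 61, 47, 38, 34, 61]   # degrees Fahrenheit, treated as a sample

n = len(temps)
x_bar = sum(temps) / n                                        # sample mean
manual_var = sum((x - x_bar) ** 2 for x in temps) / (n - 1)   # divide by n - 1, not n

sample_var = variance(temps)   # library equivalent, divides by n - 1
sample_sd = stdev(temps)

print(x_bar, round(sample_var, 2), round(sample_sd, 2))
```

Dividing by n − 1 rather than n makes the sample variance slightly larger, which compensates for measuring spread around the sample mean instead of the unknown population mean.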

Standard Deviation for Grouped Data : 

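Method 1. $s = \sqrt{\dfrac{\sum f (x - \bar{x})^2}{n - 1}}$
where f = frequency of each class, x = class mark of each class, $\bar{x}$ = mean of the grouped data, n = total number of frequencies
Method 2. $s = i \sqrt{\dfrac{\sum f d^2 - \frac{(\sum f d)^2}{n}}{n - 1}}$
where d = unit coded deviation from the assumed mean (class mark of any class) and i = class width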

Standard Deviation for Grouped Data : 

Example 14. These data represent the net worth (in millions of dollars) of 50 businesses in a large city. Find the variance and standard deviation.
Class limits   Frequency
10 – 20        5
21 – 31        10
32 – 42        3
43 – 53        7
54 – 64        18
65 – 75        7
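A Python sketch of Example 14 is given here. The example does not say whether the 50 businesses are a sample or a population; the sketch assumes a sample, so it divides by n − 1, and it uses class marks (midpoints) in place of the raw values. Names are illustrative.

```python
import math

# (lower limit, upper limit, frequency) for each net-worth class
classes = [
    (10, 20, 5), (21, 31, 10), (32, 42, 3),
    (43, 53, 7), (54, 64, 18), (65, 75, 7),
]

n = sum(f for _, _, f in classes)
marks = [(lo + hi) / 2 for lo, hi, _ in classes]     # class marks
freqs = [f for _, _, f in classes]

grouped_mean = sum(f * x for f, x in zip(freqs, marks)) / n
# Assumption: sample formula, so the sum of squares is divided by n - 1.
grouped_var = sum(f * (x - grouped_mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
grouped_sd = math.sqrt(grouped_var)

print(grouped_mean, round(grouped_var, 2), round(grouped_sd, 2))
```

If the data were instead treated as a population, the only change would be dividing by n instead of n − 1.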

Uses of the Variance and Standard Deviation : 

Uses of the Variance and Standard Deviation Variance and standard deviations can be used to determine the spread of data. If the variance or standard deviation is large, the data are more dispersed. This information is useful in comparing two (or more) data sets to determine which is more (most) variable. The measures of variance and standard deviation are used to determine the consistency of a variable.

Slide 73: 

The variance and standard deviation are used to determine the number of data values that fall within a specified interval in distribution. Finally, the variance and standard deviation are used quite often in inferential statistics.

Coefficient of Variation : 

The coefficient of variation is the standard deviation divided by the mean. The result is expressed as a percentage.
For samples, $CVar = \dfrac{s}{\bar{x}} \cdot 100\%$
For populations, $CVar = \dfrac{\sigma}{\mu} \cdot 100\%$

Coefficient of Variation : 

Coefficient of Variation Example 15. The average score on an English final examination was 85, with a standard deviation of 5; the average score on a history exam was 110, with a standard deviation of 8. Which class was more variable?
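Example 15 can be worked in a few lines of Python. The helper name `coefficient_of_variation` is illustrative; it divides the standard deviation by the mean and expresses the result as a percentage.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_english = coefficient_of_variation(5, 85)    # English final examination
cv_history = coefficient_of_variation(8, 110)   # history exam

more_variable = "history" if cv_history > cv_english else "English"
print(round(cv_english, 2), round(cv_history, 2), more_variable)
```

Comparing the raw standard deviations (5 and 8) would be misleading because the two exams have different means; the coefficient of variation puts both on a common relative scale.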

CHEBYSHEV’S THEOREM : 

The proportion of values from a data set that will fall within k standard deviations of the mean will be at least $1 - \dfrac{1}{k^2}$, where k is a number greater than 1 (k is not necessarily an integer).
Example 16. Using Chebyshev’s theorem, solve these problems for a distribution with a mean of 80 and a standard deviation of 10.
a. At least what percentage of values will fall between 60 and 100?
b. At least what percentage of values will fall between 65 and 95?

CHEBYSHEV’S THEOREM : 

CHEBYSHEV’S THEOREM Example 17. A sample of the hourly wages of employees who work in restaurants in a large city has a mean of Php40 and a standard deviation of Php8. Using Chebyshev’s theorem, find the range in which at least 75% of the data values will fall. Example 18. The average of the number of trials it took a sample of mice to learn to traverse a maze was 12. The standard deviation was 3. Using Chebyshev’s theorem, find the minimum percentage of data values that will fall in the range of 4 to 20 trials.
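Examples 16 to 18 can be sketched in Python using the bound 1 − 1/k² for k greater than 1. The helper names are illustrative, and `k_for_interval` assumes the interval is symmetric about the mean, as it is in all three examples.

```python
import math

def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def k_for_interval(mean, sd, high):
    """Number of standard deviations from the mean to the upper end of a symmetric interval."""
    return (high - mean) / sd

# Example 16: mean 80, standard deviation 10
ex16a = chebyshev_min_proportion(k_for_interval(80, 10, 100))   # interval 60 to 100, k = 2
ex16b = chebyshev_min_proportion(k_for_interval(80, 10, 95))    # interval 65 to 95, k = 1.5

# Example 17: find the interval holding at least 75% of wages (mean 40, sd 8)
k17 = math.sqrt(1 / (1 - 0.75))           # solve 1 - 1/k^2 = 0.75 for k
interval17 = (40 - k17 * 8, 40 + k17 * 8)

# Example 18: mean 12, standard deviation 3, interval 4 to 20
ex18 = chebyshev_min_proportion(k_for_interval(12, 3, 20))      # k = 8/3

print(ex16a, round(ex16b, 4), interval17, round(ex18, 4))
```

These are lower bounds that hold for any distribution shape, so the actual percentages in a particular data set can be considerably higher.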

EMPIRICAL RULE : 

EMPIRICAL RULE When a distribution is bell-shaped (or what is called normal), the following statements are true. Approximately 68% of the data values will fall within 1 standard deviation of the mean. Approximately 95% of the data values will fall within 2 standard deviations of the mean. Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.

Slide 79: 

Example 19. The average US yearly per capita consumption of citrus fruit is 26.8 pounds. Suppose that the distribution of fruit amounts consumed is bell-shaped with a standard deviation equal to 4.2 pounds. What percentage of Americans would you expect to consume more than 31 pounds of citrus fruit per year? Example 20. For this data set, find the mean and standard deviation of the variable. The data represent the ages of 30 customers who ordered a product advertised on television. Count the number of values that fall within 2 standard deviations of the mean. Compare this with the number obtained from Chebyshev’s theorem. Comment on the answer.

MEASURES OF POSITION : 

Standard Scores
A standard score or z-score is the number of standard deviations that a data value is above or below the mean and is given by
For samples, $z = \dfrac{x - \bar{x}}{s}$
For populations, $z = \dfrac{x - \mu}{\sigma}$

Z-Scores : 

Example 21. A student score on a mathematics test has a mean of 54 and a standard deviation of 3, and she scores 80 on a history test with a mean of 75 and a standard deviation of 2. On which test did she perform better?

Percentiles : 

Percentiles are positions in hundredths that a data value holds in the distribution. The percentile corresponding to a given value X is given by
Percentile $= \dfrac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$
Finding a data value corresponding to a given percentile
Step 1. Arrange the data in order from lowest to highest.

Percentiles : 

Step 2. Substitute into the formula $c = \dfrac{n \cdot p}{100}$, where n = total number of values and p = percentile.
Step 3. (a) If c is not a whole number, round up to the next whole number. Starting at the lowest value, count over to the number that corresponds to the rounded-up value.
(b) If c is a whole number, use the value halfway between the cth and (c+1)th values when counting up from the lowest value.

Percentiles : 

Percentiles Example 22. Find the percentile ranks of the scores 35 and 49 in the data set. 12, 28, 35, 42, 47, 49, 50
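Example 22 and the three-step procedure can be sketched in Python. The sketch assumes the common textbook formulas: percentile rank = (number of values below X + 0.5) / (total number of values) × 100, and position c = n·p / 100. Both function names are illustrative.

```python
import math

data = [12, 28, 35, 42, 47, 49, 50]

def percentile_rank(values, x):
    """Percentile rank of x: values strictly below x, plus 0.5, over the total count."""
    below = sum(v < x for v in values)
    return (below + 0.5) / len(values) * 100

def value_at_percentile(values, p):
    """Data value at the pth percentile, following Steps 1 to 3 (for 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100, exclusive")
    ordered = sorted(values)               # Step 1
    c = len(ordered) * p / 100             # Step 2
    if c != int(c):                        # Step 3(a): round up and count over
        return ordered[math.ceil(c) - 1]
    c = int(c)                             # Step 3(b): halfway between cth and (c+1)th
    return (ordered[c - 1] + ordered[c]) / 2

rank_35 = percentile_rank(data, 35)   # 2 values lie below 35
rank_49 = percentile_rank(data, 49)   # 5 values lie below 49
p50_value = value_at_percentile(data, 50)

print(round(rank_35), round(rank_49), p50_value)
```

Rounded to whole numbers, the two scores sit at about the 36th and 79th percentiles under this formula.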

Quartiles : 

Quartiles Positions in fourths that a data value holds in the distribution. Finding data values corresponding to Q1, Q2, and Q3 Step 1. Arrange the data in order from lowest to highest. Step 2. Find the median of the data values. This is the value for Q2. Step 3. Find the median of the data values that fall below Q2. This is the value for Q1.

Quartiles : 

Quartiles Step 4. Find the median of the data values that fall above Q2. This is the value for Q3. Example 23. Find Q1, Q2, and Q3 for the data set 20, 18, 11, 10, 17, 55, 27, 23.
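Example 23 can be sketched in Python by following the four steps literally: the median gives Q2, and the medians of the values below and above Q2 give Q1 and Q3. The function name is illustrative. Library routines such as `statistics.quantiles` use interpolation rules that can give slightly different answers, so the steps are coded directly here.

```python
from statistics import median

data = [20, 18, 11, 10, 17, 55, 27, 23]

def quartiles(values):
    """Q1, Q2, Q3 using the median-of-halves procedure."""
    ordered = sorted(values)                    # Step 1
    q2 = median(ordered)                        # Step 2
    lower = [v for v in ordered if v < q2]      # values that fall below Q2
    upper = [v for v in ordered if v > q2]      # values that fall above Q2
    return median(lower), q2, median(upper)     # Steps 3 and 4

q1, q2, q3 = quartiles(data)
print(sorted(data), q1, q2, q3)
```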

Outliers : 

Outliers An outlier is an extremely high or extremely low data value when compared with the rest of the data values. Procedure for identifying outliers Step 1. Arrange the data in order and find Q1 and Q3. Step 2. Find the inter-quartile range: IQR = Q3 – Q1 Step 3. Multiply the IQR by 1.5. Step 4. Subtract the value obtained in step 3 from Q1 and add the value to Q3.

Outliers : 

Outliers Step 5. Check the data set for any data value that is smaller than Q1 – 1.5(IQR) or larger than Q3 + 1.5(IQR) Example 24. Check for outliers in the data 20, 18, 11, 10, 17, 55, 27, 23.
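Example 24 can be sketched in Python by applying the five steps to the same data set. Q1 and Q3 are found with the median-of-halves procedure, and the variable names (`lower_fence`, `upper_fence`) are illustrative.

```python
from statistics import median

data = [20, 18, 11, 10, 17, 55, 27, 23]

ordered = sorted(data)                               # Step 1: order the data
q2 = median(ordered)
q1 = median([v for v in ordered if v < q2])          # median of the lower half
q3 = median([v for v in ordered if v > q2])          # median of the upper half

iqr = q3 - q1                                        # Step 2: inter-quartile range
step = 1.5 * iqr                                     # Step 3
lower_fence = q1 - step                              # Step 4
upper_fence = q3 + step

# Step 5: any value outside the fences is flagged as an outlier
outliers = [v for v in data if v < lower_fence or v > upper_fence]
print(iqr, lower_fence, upper_fence, outliers)
```

Only the value 55 lies outside the interval from −2.5 to 41.5, so it is the single outlier in this data set.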