Six Sigma 360 : Introduction to Minitab
Minitab : An application software for studying statistical tools and applying them to business needs. It complements Six Sigma through its features and ease of implementation.
Objectives : At the end of this topic, Introduction to Minitab, you will be able to:
Describe the basics of Minitab.
Derive statistical parameters for a given set of data, with and without Minitab.
Analyze the given data graphically and statistically using Minitab.
Basics : In Minitab, work is organized into projects (Minitab Project, *.mpj), in which:
Data is stored in Minitab Worksheets, *.mtw
Graphs are stored in Minitab Graphs, *.mgf
Results are stored in the Minitab Session, *.txt
Reports are stored in the Minitab Project Report, *.rtf
To Start a New Project
Choose File > New.
Choose Project, then click OK.
When you save a project, you save all your work at once: all the data, all the output in the Session window, and all the open Graph windows. When you reopen the project, all that information is restored.
Basics : To Save a Project:
Choose File > Save Project As.
In Save in, navigate to the location where you'd like to keep your project.
In File name, enter a name for your project, and click Save.
After this, simply choose File > Save Project to save all of your work.
Output or data can be used in another application or project by saving Session window output, data, and graphs as separate files.
To Open a Worksheet
In an empty project, data is entered in the available worksheet.
Data can also be copied from a different worksheet by opening that worksheet in the project.
Basics : To use the file Pressure.mtw, available in the Data subdirectory (folder):
Choose File > Open Worksheet
Move to the Data subdirectory and select the worksheet Pressure.mtw.
Click Open.
Graphs allow you to display patterns, relationships, and distributions in your data that are difficult to evaluate simply by looking at a worksheet. We will study a few graphs useful for data analysis.
Selecting Graph Items for Editing
Cosmetic changes, such as changing a font or color, and structural changes, such as increasing the range of a scale, can both be made.
There are three methods for selecting an item:
Click the item.
Choose Editor > Select Item, then choose the item from the list.
Select the item from the list in the Graph Editing toolbar.
Basics : After selecting the intended item, edit it in any of the following ways:
Double-click the item.
Choose Editor > Edit.
Right-click and choose Edit.
Click the Edit button on the Graph Editing toolbar.
Graphs in Other Applications
A graph may be added to the Report Pad in Minitab, or to Word or PowerPoint.
With the layout active, choose Edit > Copy Graph.
In the Word document, place the cursor where you want to insert the graph.
Choose Edit > Paste Special.
Basics : In As, choose Mtb Graph Object. This ensures that the graph can be edited in the Word document. With other options (picture or bitmap, for example), the resulting image cannot be edited with Minitab graph editing tools.
Click OK.
Double-click the graph in the Word document. Minitab's graph annotation toolbar appears, and you can double-click any graph item to edit it as you would in Minitab.
Save and Exit
Choose File > Save Project.
Exit Minitab by choosing File > Exit.
Saving the project also saves the Report Pad, which can be referred to for presentations.
Basics: Window : The Minitab window has the following parts:
Menu Bar
Tool Bar
Session Window: analytical output
Data Window: a Worksheet, which is different from a Spreadsheet
Column names are above the first row
Three data types of columns: Text / Numeric / Date
Everything in a column is considered to be the same variable
Info Window: synopsis of the worksheet
History Window: stores commands
Multiple (maximum four) interactive windows. Only one can be active at a time. Windows can be saved separately.
Basics: Toolbar : Session Window Toolbar
Basics: Toolbar : Data Window Toolbar
Commands can also be accessed from drop-down menus or hot keys.
Basics : To Transfer Data from the Worksheet to the Session Window:
Go to Data > Display Data
Basics : To Change Data Type:
Manip > Change Data Type > Numeric to Text
Manip > Change Data Type > Text to Numeric
Manip > Change Data Type > Date/Time to Text
Manip > Change Data Type > Date/Time to Numeric
Manip > Change Data Type > Numeric to Date/Time
Manip > Change Data Type > Text to Date/Time
Depending on the Minitab version, the same commands are found under the Data menu: go to Data > Change Data Type (for example, Numeric to Text).
Minitab Basics : To Code Data:
Go to Data > Code (for example, Text to Numeric)
Minitab Basics : To Extract Data:
Go to Data > Extract from Date/Time > Text
Data > Extract from Date/Time > Numeric (for example, the month extracted from a date)
Basic Statistics : Statistics is the collection, organization, analysis, interpretation and presentation of data. The most common statistics are:
Count
Average
Minimum
Maximum
Sum
Percent Defects
We will study a few of the important statistics.
Statistics that define location:
Mean
Median
Mode
Statistics that define dispersion:
Range
Variance
Standard Deviation
Quartile
Basic Statistics : Dispersion : The Turn-Around-Time (TAT) of service calls for two brands of TVs, 'A' and 'B', is shown. As a customer, which one would you prefer to buy?
[Figure: turn-around-time distributions of brands A and B, cases I and II]
Basic Statistics : Dispersion : The central tendency of data alone cannot be the deciding factor for judging performance. The variation (spread) of the data also needs to be known.
Range
The difference between the maximum and minimum values of a data set: Range = Maximum - Minimum
Variance and Standard Deviation
A deviation is the distance of a data point from the mean, and the deviations show how widely the data is spread. Summed as they are, the deviations always give zero, so each deviation is squared first. The average of the squared deviations is the variance (the sum is divided by N for a population, or by n - 1 for a sample), and its square root is the standard deviation.
Population and Sample Statistics : Population Statistics and Sample Statistics
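The formulas for this slide are not present in the text. As an assumption about what it showed, the standard textbook definitions of the mean, variance and standard deviation for a population of size N and a sample of size n are:

```latex
\[
\begin{aligned}
\text{Population:}\quad \mu &= \frac{1}{N}\sum_{i=1}^{N} x_i, &\qquad \sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, &\qquad \sigma &= \sqrt{\sigma^2} \\
\text{Sample:}\quad \bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, &\qquad s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, &\qquad s &= \sqrt{s^2}
\end{aligned}
\]
```

The only structural difference is the divisor: N for a population, n - 1 for a sample.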
Basic Statistics : Dispersion : Quartile
The data is put in order and divided so that each quarter holds 25% of the data points. Quartiles give some idea of the dispersion of data. There are three quartiles, designated Q1, Q2 and Q3:
First Quartile, Q1 = a value corresponding to 25%
Second Quartile (median), Q2 = a value corresponding to 50%
Third Quartile, Q3 = a value corresponding to 75%
Interquartile Range, IQR = Q3 - Q1
Example - Calculate the quartiles and IQR for the data: 1, 10, 20, 4, 9, 5, 4, 3
Sorted data: 1, 3, 4, 4, 5, 9, 10, 20
Using positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 in the sorted data, with interpolation between neighbours:
Q1 = 3.25, Q2 (Median) = 4.5, Q3 = 9.75, IQR = 6.5
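The quartile calculation for this example can be sketched in Python. This is a minimal sketch, assuming the (n + 1) position rule with linear interpolation between neighbouring values; the function name `quartile` is hypothetical and not part of Minitab.

```python
def quartile(sorted_values, p):
    """Value at fraction p, using the (n + 1) * p position rule (an assumed convention)."""
    n = len(sorted_values)
    position = (n + 1) * p           # 1-based position in the sorted data
    lower = int(position)            # whole part of the position
    fraction = position - lower      # how far to move toward the next value
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


# Data from the quartile example on the slide
data = sorted([1, 10, 20, 4, 9, 5, 4, 3])
q1 = quartile(data, 0.25)
q2 = quartile(data, 0.50)
q3 = quartile(data, 0.75)
iqr = q3 - q1
print("Q1:", q1, "Q2:", q2, "Q3:", q3, "IQR:", iqr)
```

Other quartile conventions exist and give slightly different values for small data sets, so results should be compared only against software that uses the same rule.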
Exercise : Find the mean, median, mode, range, variance, and standard deviation (sigma) for the sample data given below:
2, -2, 0, 1, 5, 4, 3, 1, 0, -2, -4, -3, -2, -2, 0
Solution : For the data set below, the values of the various parameters are:
2, -2, 0, 1, 5, 4, 3, 1, 0, -2, -4, -3, -2, -2, 0
The sorted data is -4, -3, -2, -2, -2, -2, 0, 0, 0, 1, 1, 2, 3, 4, 5.
Mean = 0.0667
Median = 0
Mode = -2
Range = 5 - (-4) = 9
Variance = 6.92
Standard Deviation (sigma) = 2.63
Q1 = -2, Q2 = 0, Q3 = 2
IQR = Q3 - Q1 = 4
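For the "otherwise" part of the objective (deriving the parameters without Minitab), the same exercise can be checked with Python's standard `statistics` module. This is a sketch for verification only; it treats the data as a sample, so the variance divides by n - 1.

```python
import statistics

# Sample data from the exercise
data = [2, -2, 0, 1, 5, 4, 3, 1, 0, -2, -4, -3, -2, -2, 0]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)              # most frequent value
data_range = max(data) - min(data)        # maximum minus minimum
variance = statistics.variance(data)      # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)          # square root of the sample variance

print("Mean:", round(mean, 4))
print("Median:", median)
print("Mode:", mode)
print("Range:", data_range)
print("Variance:", round(variance, 2))
print("Standard deviation:", round(std_dev, 2))
```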
Solving Using Minitab : Open a new Worksheet (in the existing or a new Project).
Type the numbers vertically in a column.
Name the column DATA.
Go to Stat > Basic Statistics > Display Descriptive Statistics.
Double-click C1 to select it under “Variables”.
Click Statistics to select the required values, then click OK.
Find the answers displayed in the Session window.
Graphical Analysis : We have seen that Minitab can be used for statistical analysis of data. Now let us study how we can analyze data graphically using the tools listed below:
Pareto Chart
Histogram
Dot Plot
Box Plot
Scatter Plot
Matrix Plot
Marginal Plot
Time-Series Plot
Run Chart
Normality Test
Pareto Analysis : The Pareto chart is named after the Italian economist Vilfredo Pareto (1848 - 1923), whose theory states that 80% of wealth is owned by 20% of the population. Based on this theory, the chart is used to visually depict the relative significance of the categories plotted.
The theory that 80% of the problems come from 20% of the causes is called Pareto's principle. While the percentages may not always be exactly 80/20, there usually are “the vital few and the trivial many.”
Generally, a Pareto chart is used to plot a measurement that reflects cost to the organization.
Example: Draw a Pareto chart for the customer complaints received from eight different zones of India: 54, 25, 75, 12, 65, 42, 12, 41
Pareto Analysis : To draw a Pareto chart, follow the path:
Minitab > Stat > Quality Tools > Pareto Chart
Pareto Analysis : Conclusion:
The chart shows zones One, Three and Five as the major sources of customer complaints. Initial improvement efforts should concentrate on these major contributors.
A Pareto chart is used when:
Trying to focus on the most significant problem or cause
Relating cause and effect, by comparing a Pareto chart classified by causes with one classified by effects
Analyzing data by groups, to reveal unnoticed patterns
Communicating with others about your data
Evaluating improvement, by comparing before and after data
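The ranking behind a Pareto chart can be sketched in a few lines of Python, using the eight-zone complaints example. The zone names One to Eight, and the assumption that the counts are listed in zone order, are illustrative; the function name `pareto_table` is hypothetical.

```python
def pareto_table(counts):
    """Return rows of (category, count, percent, cumulative percent), largest first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0
    for name, count in ordered:
        cumulative += count
        rows.append((name, count, 100 * count / total, 100 * cumulative / total))
    return rows


# Complaint counts from the example; zone labels are an assumption
complaints = {
    "One": 54, "Two": 25, "Three": 75, "Four": 12,
    "Five": 65, "Six": 42, "Seven": 12, "Eight": 41,
}

rows = pareto_table(complaints)
for name, count, percent, cumulative_percent in rows:
    print(f"{name:<6} {count:>4} {percent:6.1f}% {cumulative_percent:6.1f}%")
```

The bars of the chart correspond to the sorted counts and the line to the cumulative percentage column.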
Exercise : Draw a Pareto Chart for the following travel expenses for six months. As the approving authority for travel expenses, which departments should you monitor first to control the expenses?
Solution : The Pareto Chart visually depicts the total expenses for six months for each of the five departments.
[Figure: Pareto chart with the maximum contributors marked]
Exercise : Reena is a quality inspector whose job involves inspecting TVs for sound output on a scale of 1 to 5 (1 being the lowest wattage and 5 the highest). The data for 20 TVs is given. Calculate the percentage of TVs that Reena should reject. (The criterion for rejection is a sound output between 1 and 3.)
Solution : The percentage of rejected TVs = 25 + 15 + 10 = 50%
Histogram : A chart that displays the distribution, center location, and variation of data by grouping the data into categories.
Unlike a bar graph (commonly used in Excel), it can show the distribution of continuous data.
From the given data, draw a Histogram for the number of working days (production) in a month over a three-year period.
Histogram : Data labels may be defined.
Histogram : The histogram can show the distribution in comparison to a normal curve with the same mean and standard deviation. Other options may be tried.
Dot Plot : A chart that plots dots on a number line, depicting the frequency and spread of the data. If the data size is large, each dot on the chart represents more than one value.
Using the same data as for the Histogram, let us now draw a Dot Plot.
Go to Minitab > Graph > Dotplot
Choose the option “Simple”
Dot Plot : To find the values corresponding to the dots:
Right-click on the graph and choose Brush.
Right-click again and, in “Set ID Values”, select the columns to be displayed.
Select an area of the graph and read the values displayed.
Box Plot : A Boxplot is used to obtain information about the shape, dispersion, and mid-value of a given data set. It:
Spots outliers
Is used to assess the symmetry of the data
Draw a Boxplot for the same working days example. (Change the working days for Jun-2003 to 10 and for Dec-2003 to 31.)
Go to Minitab > Graph > Boxplot
Choose the option “Simple”
Select the data column
Box Plot : [Figure: box plot labelled with the Median, 1st Quartile (Q1), 3rd Quartile (Q3), the outliers, Lower Limit = Q1 - 1.5 (Q3 - Q1) and Upper Limit = Q3 + 1.5 (Q3 - Q1)]
Box Plot : Analysis : The line drawn through the box represents the median of the data.
The edge of the box below the median represents the first quartile (Q1), while the edge above represents the third quartile (Q3). Thus the box portion of the plot represents the interquartile range (IQR = Q3 - Q1), or the middle 50% of the observations.
The lines extending from the box are called whiskers. The whiskers extend outward to the lowest and highest values in the data set that are not outliers.
Extreme values, or outliers, are represented by dots. A value is considered an outlier if it lies outside the box (greater than Q3 or less than Q1) by more than 1.5 times the IQR.
Brush may be used on the graph to find the values of outliers.
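The outlier rule can be sketched in Python as follows. This is a minimal sketch, reusing the eight values from the quartile example earlier in the deck rather than the working days data, and assuming the (n + 1) quartile convention, which `statistics.quantiles` calls the "exclusive" method. The function name `outlier_limits` is hypothetical.

```python
import statistics


def outlier_limits(values):
    """Return (lower limit, upper limit) at 1.5 * IQR beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Values from the quartile example, used here for illustration
data = [1, 10, 20, 4, 9, 5, 4, 3]
lower, upper = outlier_limits(data)
outliers = [value for value in data if value < lower or value > upper]

print("Lower limit:", lower)
print("Upper limit:", upper)
print("Outliers:", outliers)
```

Values inside the limits set how far the whiskers reach; values outside are drawn as individual points.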
Scatter Plot : The Scatter Plot illustrates the relationship between two variables.
Although it shows graphically how one variable varies with the other, the graph alone cannot be relied on to judge whether the two are really related. Example: the population of India plotted against the growth rate of China.
Draw the Scatter Plot for the data.
Scatter Plot : For the given data, we may examine how Sales relate to the Advertisement expenses, and how they vary with month and Ad agency.
Go to Graph > Scatterplot > Simple
All three plots can be made together by choosing Sales Exp as y and Adv Exp, month, and Ad agency as the x variables.
Scatter Plot : [Figure: three scatter plots, annotated “No relation”, “Positive relation, but groups” and “High sales expenses in the year 2001”]
Scatter Plot (With Group) : The plot suggests that Sales are higher for Smart Agency.
Matrix Plot : A Matrix Plot is used to find relationships between multiple variables.
Let us create a Matrix Plot for the Sales data discussed under Scatter Plot.
Go to Graph > Matrix Plot > With Groups and enter the data.
Matrix Plot : [Figure: matrix plot; the panels on opposite sides of the diagonal are mirror images] The same conclusions can be drawn from this plot.
Marginal Plot : A marginal plot is a Scatter plot with graphs in the margins of the x- and/or y-axes, which depict the distribution of the points in each direction (the sample marginal distributions).
Let us create a Marginal Plot for the given data.
Marginal Plot : Go to Graph > Marginal Plot > With Histograms
Marginal Plot : The plot shows a positive relationship.
Expert Tip: The Marginal plot only indicates the relationship between entities. The method of finding the exact equation between entities will be covered later.
Time Series Plot : The Time Series Plot is used to:
Detect seasonality in data
Detect trends in data over time
Compare trends across groups
The time series data is plotted on the vertical y-axis against time on the horizontal x-axis.
You may want to create a Scatter plot instead if:
the data is not in chronological order, or
the data collection intervals are irregular.
Time Series Plot : Example: A toy manufacturer in America has four production lines working on all days. The data shows the number of defects in each line for the month of January, 2005. Assess the stability of the data.
Go to Stat > Time Series > Time Series Plot
[Screenshots: Time Series Plot dialogs for the Multiple and Simple options]
Time Series Plot : Result Analysis : The time series plot consists of:
Time scale (index, calendar, clock, or stamp column) on the x-axis
Data scale on the y-axis
Lines displaying each time series
For this data, four series, one for each production line, are plotted on the same time series plot.
The time series plot suggests no grouping or seasonal effect in the data. There is some indication of oscillation, but no clear trend.
The plot alone is not a reliable basis for judgment. To show statistical evidence of non-randomness, a Run chart is used.
Run Chart : Run charts are used to monitor changes in a process characteristic of interest over time.
A run chart is used to look for patterns, randomness, or stability in process data on the basis of:
Test for number of runs about the median
Clusters
Mixtures
Test for number of runs up and down
Trends
Oscillations
Consider the same example of toys.
Go to Stat > Quality Tools > Run Chart
Run Chart : [Screenshots: Run Chart dialog settings for data in a single column and for data in multiple columns]
Run Chart : Graph Analysis : Run about the median
A run about the median is one or more consecutive points on the same side of the median. When the points are connected by a line, a run ends when the line crosses the median. A new run begins with the next plotted point.
In the example, 17 runs about the median were observed. The blue circles on the chart mark the runs about the median.
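Counting runs about the median can be sketched in Python. This is a sketch under stated assumptions: points equal to the median are counted as not above it, the expected number of runs uses the usual textbook formula 1 + 2mn/N (m points above, n points at or below, N in total), the data is hypothetical, and the function name `runs_about_median` is illustrative. The toy manufacturer's data is not reproduced here.

```python
import statistics


def runs_about_median(values):
    """Return (observed runs, expected runs) about the median."""
    median = statistics.median(values)
    # Assumption: a point equal to the median is counted as "not above" it.
    above_flags = [value > median for value in values]
    # A new run starts whenever consecutive points fall on different sides.
    observed = 1 + sum(1 for a, b in zip(above_flags, above_flags[1:]) if a != b)
    above = sum(above_flags)
    below = len(values) - above
    expected = 1 + 2 * above * below / len(values)
    return observed, expected


# Hypothetical defect counts, for illustration only
defects = [5, 7, 3, 8, 2, 9, 1, 6]
observed, expected = runs_about_median(defects)
print("Observed runs:", observed, "Expected runs:", expected)
```

As a plausibility check on the formula, 31 points split 14 above and 17 at or below the median would give an expected value of about 16.35.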
Run Chart : Graph Analysis : Test for number of runs about the median
The test checks for two types of non-random behavior: mixtures and clusters.
Mixtures are characterized by an absence of points near the median.
Clusters are groups of points that have similar values.
When the observed number of runs is:
statistically greater than the expected number of runs, mixtures are suggested. (In the example, 17 > 16.35, which points toward mixtures.)
statistically less than the expected number of runs, clusters are suggested.
In this example, although the observed number of runs is slightly above the expected number, the p-values for clustering (0.594) and mixtures (0.405) are both greater than the alpha level of 0.05. Therefore, you can conclude that the data does not indicate mixtures or clusters.
Run Chart : Graph Analysis : Run up or down
[Figure: run chart output showing 17 observed runs about the median against 16.35 expected, and 19 observed runs up or down against 20.33 expected]
A run up or down is one or more consecutive points in the same direction. A new run begins each time there is a change in direction (either ascending or descending) in the sequence of data.
For example, when the preceding value is smaller, a run up begins and continues until a value is larger than the point that follows it; then a run down begins.
In the example, 19 runs up or down were observed. The blue lines on the chart mark the runs up or down.
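Counting runs up or down can be sketched in the same way. This is a sketch under stated assumptions: tied consecutive values are skipped, the expected number of runs uses the usual textbook formula (2N - 1)/3 for N points, the data is hypothetical, and the function name `runs_up_down` is illustrative.

```python
def runs_up_down(values):
    """Return (observed runs up or down, expected runs)."""
    # +1 for a step up, -1 for a step down; ties are dropped (an assumption).
    steps = [(b > a) - (b < a) for a, b in zip(values, values[1:])]
    steps = [step for step in steps if step != 0]
    observed = 0
    if steps:
        # A new run starts whenever the direction changes.
        observed = 1 + sum(1 for a, b in zip(steps, steps[1:]) if a != b)
    expected = (2 * len(values) - 1) / 3
    return observed, expected


# Hypothetical defect counts, for illustration only
defects = [1, 2, 3, 2, 1, 2]
observed, expected = runs_up_down(defects)
print("Observed runs:", observed, "Expected runs:", round(expected, 2))
```

For a month of 31 daily points, the formula gives an expected value of about 20.33.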
Run Chart : Graph Analysis : Test for number of runs up and down
This test is based on the number of runs up or down (increasing or decreasing) and is sensitive to two types of non-random behavior: oscillation and trends.
When the observed number of runs is:
statistically greater than the expected number of runs, oscillation is suggested.
statistically less than the expected number of runs, a trend is suggested. (In the example, 19 < 20.33, which points toward trends.)
In this example, the p-values for trends (0.279) and oscillation (0.720) are both greater than the alpha level of 0.05. Therefore, you can conclude that the data does not indicate a trend or oscillation.
Normality Test : A normality test is used to check whether the given data is normally distributed.
It is important to check for normality because most of the tools to be discussed (tests of means and variances) assume the data to be normal. The data should therefore be checked before it is taken through these tools.
Although much collected data is approximately normal, and the Central Limit Theorem makes sample averages tend toward normality, data should be checked for normality as soon as it is collected and before it is treated with a statistical tool. Normality may also be checked at any point of the project cycle for any data that is relevant to the project.
Normality Test : How to Check for Normality?
There are various methods to check for normality. We will discuss the three most commonly used methods: Probability Plot, Graphical Summary and Normality Test.
Probability Plot
Follow the path in Minitab:
Minitab > Graph > Probability Plot
Example: Given is the data of the number of accidents in a year on a highly accident-prone highway. Check if the data is normal.
Normality Test: Probability Plot : 0.05 is the alpha value (the alpha risk, that is, the probability of a Type I error, also called the significance level). Generally, the value assigned is 5% unless stated otherwise or demanded by the process.
Conclusion:
P-value = 0.436. Since the p-value is greater than 0.05, there is no evidence against normality and the given data can be treated as normal.
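By default, Minitab's probability plot reports an Anderson-Darling statistic alongside the p-value. As a rough sketch of what sits behind that number, the statistic can be computed with the Python standard library. The data below is hypothetical (the accident data is not reproduced here), the function name `anderson_darling_statistic` is illustrative, and the conversion from the statistic to a p-value is deliberately left out.

```python
import math
import statistics


def anderson_darling_statistic(values):
    """Anderson-Darling A-squared against a normal fitted with the sample mean and sd."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    standard_normal = statistics.NormalDist()
    # Standardize and sort, then evaluate the normal CDF at each point.
    z = sorted((value - mean) / sd for value in values)
    cdf = [standard_normal.cdf(x) for x in z]
    total = 0.0
    for i in range(n):
        weight = 2 * (i + 1) - 1
        total += weight * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
    return -n - total / n


# Hypothetical yearly accident counts, for illustration only
accidents = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14, 13, 11]
print("A-squared:", round(anderson_darling_statistic(accidents), 3))
```

A smaller statistic means the data sits closer to the fitted normal line; the decision itself is still made by comparing the reported p-value with alpha.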
Normality Test: Graphical Summary : Minitab > Stat > Basic Statistics > Graphical Summary
The default confidence level is 95% unless stated otherwise.
Confidence Level : A Confidence Interval gives you a range of likely values based on the sample. The Confidence Level is how sure you want to be that the population mean or standard deviation falls within the confidence interval calculated from the sample.
Six Sigma and industry typically use a 95% Confidence Level, which means:
A 95% chance that the population mean or standard deviation lies within the confidence interval.
A 5% chance (alpha risk) that the population mean or standard deviation lies outside the confidence interval.
For highly sensitive processes working at a high sigma level, the alpha risk may be reduced below 5% to lower the chance of alpha error. Conversely, it may be raised for less sensitive processes.
Normality Test: Graphical Summary : Conclusion:
Since the p-value is greater than 0.05, the given data is normal. The summary also shows the confidence interval for the mean.
Normality Test : Minitab > Stat > Basic Statistics > Normality Test
Normality Test : Conclusion:
P-value = 0.436. Since the p-value is greater than 0.05, the given data is normal.
The Prediction Interval is the range in which a new response value is expected to fall. That is, it provides an interval of possible response values given a combination of predictor levels.
Exercise : For the given data, check if it is normal. If so, analyze your data using all or a few of the following charts:
Pareto Chart
Histogram
Dot Plot
Box Plot
Scatter Plot
Matrix Plot
Marginal Plot
Time-Series Plot
Run Chart
Solution : Here the response is Durability and the rest are the input variables. Let us do a Normality Test to find out if the data is normal. Since p > 0.05, the data is normal.
Solution : Though not mentioned, the most desirable value for Durability would be the highest value.
Let us draw a Dot plot to see the response. As seen using Brush, the lower values occur for the combinations below.
Solution : Let us see if a Box plot can give some information:
Strength brand gives lower Durability (also seen from the Dot plot)
Brand 4 gives more variation but also higher values
Brand 2 gives lower Durability values
Solution : We code the text variables so that we can treat them as Numeric: Composition (A = 1, B = 2) and Colour (White = 1, Red = 2, Black = 3).
Solution : Data > Code > Text to Numeric
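The coding step amounts to a lookup table per text column. A minimal Python sketch of the same idea follows; the code tables come from the slide, while the sample rows and the function name `code_text_to_numeric` are hypothetical.

```python
# Code tables taken from the slide text
COMPOSITION_CODES = {"A": 1, "B": 2}
COLOUR_CODES = {"White": 1, "Red": 2, "Black": 3}


def code_text_to_numeric(values, codes):
    """Replace each text value with its numeric code; unknown values raise KeyError."""
    return [codes[value] for value in values]


# Hypothetical rows, for illustration only (not the exercise data)
colour = ["Red", "Black", "White", "Black"]
ncolour = code_text_to_numeric(colour, COLOUR_CODES)
print(ncolour)
```

Note that the numeric codes are only labels: an order such as White < Red < Black is imposed by the coding and should be interpreted with care on a scatter plot.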
Solution : In this case, since we are interested only in the relationship of the Xs with y (Durability), we use a Scatter plot.
Solution : Conclusions
Durability is low (undesirable) when there is:
Less plastic fiber
Less porosity_inv
High temperature (beyond 30)
High Ncolour (Black)
Strength brand
Carpet type 2
Both compositions give mixed values.
Self-Practice Exercise : Analyze the data for the defect causes given in Charts_Exercise2.mtw, using the tools discussed.
Discuss the answers amongst your teams.
Solution : p > 0.05, so the data is random.
[Figure: chart with the major contributors marked]