Slide 1: 

Application of SPSS in Social Science Research
By Dr.R.RAVANAN, Reader in Statistics, Presidency College, Chennai – 600 005
E-mail: ravananstat@gmail.com
Mobile: 94442 21627

BASIC CONCEPTS : 

7-Dec-09 Dr.R.RAVANAN, Presidency College
- Population: the collection of all individuals, objects or items under study; its size is denoted by N.
- Sample: a part of the population; its size is denoted by n.
- Variable: a characteristic of an individual or object. Variables are either qualitative or quantitative.
- Parameter: a characteristic of the population.
- Statistic: a characteristic of the sample.

Chart on population, sample and statistical inference : 

1. Population too large
2. Sample drawn from the population
3. Collect data from the sample
4. Organise data
5. Analyse the organised data
6. Draw inference which is applicable to the population

Notations of Population and Sample : 


Organising a raw data set : 


Pictorial representation of a data set : 


Summarising a raw data set on a quantitative variable : 


Slide 8: 

Sampling Techniques

Determination of sample size : 

Three factors determine the sample size:
- the standard deviation (SD) of the population
- the acceptable level of sampling error
- the expected confidence level

Sample size: $n = (ZS/E)^2$, where
- $E = Z S_{\bar{x}}$ is the acceptable error,
- $S_{\bar{x}} = S/\sqrt{n}$ is the standard error (SE) of the mean,
- $S$ is the sample SD or an estimate of the population SD,
- $Z$ is the standardized value corresponding to the confidence level.

For example:
- Z = 1.96, S = 90, E = 6 gives n = 864
- Z = 1.96, S = 90, E = 12 gives n = 216

Doubling the acceptable error therefore reduces the required sample size to one-fourth of its original size, because n is proportional to $1/E^2$.
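The arithmetic in this example can be checked with a short Python sketch. The function name `sample_size` is an assumption made for illustration; the formula and the input values are those given on the slide.

```python
def sample_size(z, s, e):
    """Return n = (Z * S / E)^2, the sample-size formula from the slide."""
    return (z * s / e) ** 2


n_small_error = sample_size(1.96, 90, 6)   # acceptable error E = 6
n_large_error = sample_size(1.96, 90, 12)  # acceptable error doubled to E = 12

print(round(n_small_error), round(n_large_error))
print(n_small_error / n_large_error)  # how many times larger the first sample is
```

In practice the result is usually rounded up to the next whole respondent; the sketch rounds to the nearest integer only to match the figures quoted on the slide.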

Stages in Data Analysis : 

1. Editing
2. Coding
3. Data Entry (Keyboarding)
4. Error Checking and Verification
5. Data Analysis
   - Descriptive Analysis
   - Univariate Analysis
   - Bivariate Analysis
   - Multivariate Analysis
6. Interpretation

Statistical Inference : 


The Concept of P Value : 

Given the observed data set, the P value is the smallest significance level at which the null hypothesis is rejected (and the alternative is accepted).
- If the P value $\le \alpha$, reject H0; otherwise accept H0.
- If the P value $\le 0.01$, reject H0 at the 1% level of significance.
- If the P value lies between 0.01 and 0.05 (i.e. $0.01 <$ P value $\le 0.05$), reject H0 at the 5% level of significance.
- If the P value $> 0.05$, accept H0 at the 5% level of significance.
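A minimal Python sketch of this decision rule follows. The function names are hypothetical, and the sketch assumes the usual reading of the rule: reject H0 when the P value does not exceed the chosen level.

```python
def reject_h0(p_value, alpha):
    """General rule: reject H0 when the P value is at most alpha."""
    return p_value <= alpha


def p_value_decision(p_value):
    """Apply the 1% / 5% bands described on the slide."""
    if p_value <= 0.01:
        return "reject H0 at the 1% level"
    if p_value <= 0.05:
        return "reject H0 at the 5% level"
    return "accept H0 at the 5% level"


for p in (0.004, 0.03, 0.20):
    print(p, p_value_decision(p))
```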

Measurement Scales : 

The types of measurement scales are:
- Nominal scale
- Ordinal scale
- Interval scale
- Ratio scale

The Measurement Principles : 

Nominal: People or objects with the same scale value are the same on some attribute. The values of the scale have no 'numeric' meaning in the way that you usually think about numbers.

Ordinal: People or objects with a higher scale value have more of some attribute. The intervals between adjacent scale values are indeterminate. Scale assignment is by the property of "greater than," "equal to," or "less than."

Interval: Intervals between adjacent scale values are equal with respect to the attribute being measured. E.g., the difference between 8 and 9 is the same as the difference between 76 and 77.

Ratio: There is a rational zero point for the scale. Ratios are equivalent, e.g., the ratio of 2 to 1 is the same as the ratio of 8 to 4.

Examples of the Measurement Scales : 


Permissible Arithmetic Operations : 


Appropriate Statistics : 


Statistical Inference : 

There are two types of statistical inference: estimation of population parameters and hypothesis testing. Hypothesis testing is one of the most important tools for applying statistics to real-life problems. Most often, decisions concerning populations have to be made on the basis of sample information, and statistical tests are used in arriving at these decisions.

Five ingredients to statistical test : 

The five ingredients of a statistical test are:
- Null hypothesis
- Alternate hypothesis
- Level of significance
- Test statistic
- Interpretation

Slide 22: 

Steps in Hypothesis Testing
1. Identify the null hypothesis H0 and the alternate hypothesis H1.
2. Choose $\alpha$. The value should be small, usually less than 10%. It is important to consider the consequences of both types of errors.
3. Select the test statistic and determine its value from the sample data. This value is called the observed value of the test statistic.
4. Compare the observed value of the statistic to the critical value obtained for the chosen $\alpha$.

Types of Error : 


Slide 24: 

Use the key by answering the questions in the most relevant way.
1. Have you got more than two samples?
   No: go to 2
   Yes: go to 8
2. Have you got one or two samples?
   One: single sample t-test
   Two: go to 3
3. Are your data sets normally distributed (K-S test or Shapiro-Wilk)?
   No: go to 4
   Yes: go to 5
4. Do your data sets have any factor in common (dependence), i.e. location or individuals?
   No: Mann-Whitney U test
   Yes: Wilcoxon matched pairs
5. Do your data sets have any factor in common (dependence), i.e. location or individuals?
   No: go to 6
   Yes: paired sample t-test

Slide 25: 

6. Do your data sets have equal variances (F-test)?
   No: unequal variance t-test
   Yes: go to 7
7. Is n greater or less than 30?
   <30: equal variance t-test or ANOVA
   >30: z-test or ANOVA
8. Are your samples normally distributed and with equal variances?
   No: Kruskal-Wallis non-parametric ANOVA
   Yes: go to 9
9. Does your data involve one factor or two factors?
   One: One-way ANOVA (see also Multiple comparison tests)
   Two: Two-way ANOVA (see also Multiple comparison tests)
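The nine-question key can be sketched as a single Python function. The function name, its parameter names and the handling of exactly n = 30 (which the key leaves open) are assumptions of this sketch, not part of the slides.

```python
def choose_test(samples, normal, dependent=False, equal_variances=True,
                n=None, factors=1):
    """Walk the nine-question key and return the suggested test name."""
    if samples > 2:                                   # question 1 -> 8
        if not (normal and equal_variances):          # question 8
            return "Kruskal-Wallis non-parametric ANOVA"
        # question 9
        return "One-way ANOVA" if factors == 1 else "Two-way ANOVA"
    if samples == 1:                                  # question 2
        return "single sample t-test"
    if not normal:                                    # question 3 -> 4
        return "Wilcoxon matched pairs" if dependent else "Mann-Whitney U test"
    if dependent:                                     # question 5
        return "paired sample t-test"
    if not equal_variances:                           # question 6
        return "unequal variance t-test"
    if n is None:                                     # question 7 needs n
        raise ValueError("sample size n is required for question 7")
    # Assumption: the key does not cover n == 30; it is treated as large here.
    return "z-test or ANOVA" if n >= 30 else "equal variance t-test or ANOVA"


print(choose_test(samples=2, normal=True, n=5))
print(choose_test(samples=3, normal=False))
```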

A Classification of Multivariate Methods : 


Multivariate Analysis: Classification of Dependence Methods : 


Multivariate Analysis: Classification of Independence Methods : 


Test of Hypothesis : 

- Tests of hypotheses concerning mean(s)
- Tests of hypotheses concerning variance(s)
- Tests of hypotheses concerning proportion(s)

Small Sample Test : 

- Test based on Student's t distribution (W.S. Gosset)
- Test based on Snedecor's F distribution (R.A. Fisher)
- Test based on the chi-square distribution (Karl Pearson)

Type of Statistical Tests and its Characteristics : 


Example for Tests of Hypotheses concerning Two population means : 

Sample I: 110, 120, 123, 112, 125
Sample II: 120, 128, 133, 138, 129
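The slide gives only the two samples, so the following Python sketch is one possible way to work the example, not necessarily the author's. It assumes the samples are independent, that the population variances are equal (pooled two-sample t-test) and that the test is two-tailed at the 5% level; the critical value 2.306 is the tabulated t value for 8 degrees of freedom.

```python
import math
import statistics

sample_1 = [110, 120, 123, 112, 125]
sample_2 = [120, 128, 133, 138, 129]

n1, n2 = len(sample_1), len(sample_2)
mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
# statistics.variance uses the n - 1 denominator (sample variance)
var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)

# Pooled variance: assumes both populations share the same variance
pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / (n1 + n2 - 2)
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = (mean_1 - mean_2) / std_error
df = n1 + n2 - 2

T_CRITICAL = 2.306  # two-tailed t table value for df = 8, alpha = 0.05
reject = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject}")
```

Under these assumptions the observed statistic is about -2.75, which exceeds the critical value in absolute terms, so H0 (equal population means) is rejected at the 5% level.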

Tests of Hypotheses concerning proportion(s) : 

- One-tailed tests concerning a single proportion
- Two-tailed tests concerning a single proportion
- One-tailed tests concerning two proportions
- Two-tailed tests concerning two proportions

Tests of Hypotheses concerning Variance(s) : 

- One-tailed chi-square test concerning a single population variance
- Two-tailed chi-square test concerning a single population variance
- One-tailed F-test concerning equality of two population variances
- Two-tailed F-test concerning equality of two population variances

Chi-square test for checking independence of two categorized data : 

Consider two factors which may or may not influence the observed frequencies formed for the combinations of the different levels of the two factors.
H0: Factor A and factor B are independent
H1: Factor A and factor B are not independent
Objective: To check whether the null hypothesis is to be accepted, based on the value of the chi-square statistic, by placing the significance level $\alpha$ at the right tail of the chi-square distribution.
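As an illustration, the chi-square statistic for independence can be computed from a contingency table as sketched below. The 2 x 2 table of counts is hypothetical (it does not come from the slides), and 3.841 is the tabulated chi-square critical value for 1 degree of freedom at the 5% level.

```python
# Hypothetical observed counts: rows are levels of factor A,
# columns are levels of factor B.
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under H0 (independence of A and B)
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
CHI2_CRITICAL = 3.841  # chi-square table value for df = 1, alpha = 0.05
independence_rejected = chi_square > CHI2_CRITICAL

print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {independence_rejected}")
```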

Chi-square test for goodness of fit : 

The aim is to fit the data to the nearest distribution, one which represents the data meaningfully for future analysis. Such fitting of data to the nearest distribution is done using the goodness of fit test.
H0: The given data follow an assumed distribution
H1: The given data do not follow an assumed distribution
Objective: To check whether the null hypothesis is to be accepted, based on the value of the chi-square statistic, by placing the significance level $\alpha$ at the right tail of the chi-square distribution.

Comparing Multiple Population : 

- Comparing multiple population variances
- Comparing multiple population means

Comparing multiple population variances : 

For more than two populations, it is assumed that the probability distribution (i.e. histogram) of each population is approximately normal.
H0: All the population variances are equal
H1: At least two population variances differ
This test is called Bartlett's test.
Objective: To check whether the null hypothesis is to be accepted, based on the value of the chi-square statistic, by placing the significance level $\alpha$ at the right tail of the chi-square distribution.

Comparing multiple population means : 

For more than two populations, it is assumed that the probability distribution (i.e. histogram) of each population is approximately normal.
H0: All the population means are equal
H1: At least two population means differ
This test is called Analysis of Variance (ANOVA):
- Data from unrestricted (independent) samples: One-way ANOVA
- Data from block restricted samples: Two-way ANOVA
Objective: To check whether the null hypothesis is to be accepted, based on the value of F, by placing the significance level $\alpha$ at the right tail of the Snedecor F distribution.

Example for One Way ANOVA : 

School I: 45, 54, 35, 43, 48
School II: 54, 65, 67, 55, 52
School III: 87, 65, 75, 79, 67
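A one-way ANOVA on these three schools can be worked by hand as in the Python sketch below. The variable names are illustrative, the 5% level is an assumption, and 3.89 is the tabulated F critical value for (2, 12) degrees of freedom.

```python
schools = {
    "School I": [45, 54, 35, 43, 48],
    "School II": [54, 65, 67, 55, 52],
    "School III": [87, 65, 75, 79, 67],
}

all_scores = [x for scores in schools.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Between-groups sum of squares: group size times squared distance
# of each group mean from the grand mean
ss_between = sum(
    len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in schools.values()
)
# Within-groups sum of squares: squared distance of each score
# from its own group mean
ss_within = sum(
    (x - sum(s) / len(s)) ** 2 for s in schools.values() for x in s
)

df_between = len(schools) - 1
df_within = len(all_scores) - len(schools)
f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRITICAL = 3.89  # F table value for (2, 12) df, alpha = 0.05
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) df, "
      f"reject H0: {f_stat > F_CRITICAL}")
```

With these data the F ratio is about 18.65, well above the critical value, so the hypothesis of equal school means is rejected at the 5% level.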

Non-Parametric Tests : 

In some situations the data may be non-normal, and/or it may not be possible to estimate the parameter(s) of the data. The tests used in such situations are called non-parametric tests. Since these tests are based on data that are free from any assumed distribution and its parameters, they are known as non-parametric (NP) tests or distribution-free tests.

NP tests can be used even for nominal data (qualitative data such as greater or less) and for ordinal data such as ranked data. NP tests require less calculation, because there is no need to compute parameters.

List of Non-Parametric Tests : 

1. One-sample tests
   - One-sample sign test
   - Chi-square one-sample test
   - Kolmogorov-Smirnov test
2. Two related samples tests
   - Two-samples sign test
   - Wilcoxon matched-pairs signed-rank test
3. Two independent samples tests
   - Chi-square test for two independent samples
   - Mann-Whitney U test
   - Kolmogorov-Smirnov two-sample test

List of Non-Parametric Tests : 

4. K related samples tests
   - Friedman two-way analysis of variance by ranks
   - The Cochran Q test
5. K independent samples tests
   - Chi-square test for k independent samples
   - The extension of the median test
   - Kruskal-Wallis one-way analysis of variance by ranks

One sample sign test : 

This test is applied where a sample is taken from a population which has a continuous, symmetrical distribution and is known to be non-normal, so that the probability (p) of a sample value being less than the mean and the probability of it being more than the mean are both ½.

The test is classified into four categories:
- One-tailed one-sample sign test for small samples
- Two-tailed one-sample sign test for small samples
- One-tailed one-sample sign test for large samples
- Two-tailed one-sample sign test for large samples

Kolmogorov-smirnov test : 

It is similar to the chi-square test in that it tests the goodness of fit of a given set of data to an assumed distribution. The K-S test is more powerful for small samples, whereas the chi-square test is suited to large samples.
H0: The given data follow an assumed distribution
H1: The given data do not follow an assumed distribution
The K-S test is a one-tailed test. Hence, if the calculated value of D is more than the theoretical value of D for a given significance level, reject H0; otherwise accept H0.

Two samples sign test : 

The two-samples sign test is applied where two samples are taken from two populations which have continuous, symmetrical distributions and are known to be non-normal.

Modified sample value:
Zi = + if Xi > Yi
Zi = - if Xi < Yi
Zi = 0 if Xi = Yi

The test is classified into four categories:
- One-tailed two-sample sign test with binomial distribution
- Two-tailed two-sample sign test with binomial distribution
- One-tailed two-sample sign test with normal distribution
- Two-tailed two-sample sign test with normal distribution

The Wilcoxon Matched-pairs signed-ranks test : 

The Wilcoxon test is a most useful test for behavioural scientists.
1. Let di be the difference score for any matched pair.
2. Rank all the di without regard to sign.
3. Let T be the sum of the ranks with the less frequent sign.
4. Compute Z = [T – E(T)]/SD(T).

Mann-Whitney U Test : 

The Mann-Whitney U test is an alternative to the two-sample t-test. It is based on the ranks of the observations of the two samples put together. An alternate name for this test is the Rank-Sum Test.

Let $R_1$ be the sum of the ranks of the observations of the first sample, and $R_2$ the sum of the ranks of the observations of the second sample.

Objective: To check whether the two samples are drawn from different populations having the same distribution.

Compute $Z = [U - E(U)]/SD(U)$, where
$U = n_1 n_2 + n_1(n_1 + 1)/2 - R_1$ or
$U = n_1 n_2 + n_2(n_2 + 1)/2 - R_2$
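The rank sums and U can be computed as in the sketch below. Reusing Sample I and Sample II from the earlier two-means example is an assumption made for illustration. The slide does not give E(U) and SD(U); the sketch uses the standard large-sample values $E(U) = n_1 n_2/2$ and $SD(U) = \sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}$ without a tie correction, and with only five observations per sample the normal approximation is rough.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


sample_1 = [110, 120, 123, 112, 125]
sample_2 = [120, 128, 133, 138, 129]
n1, n2 = len(sample_1), len(sample_2)

# Rank the pooled observations, then split the ranks back by sample
ranks = average_ranks(sample_1 + sample_2)
r1 = sum(ranks[:n1])
r2 = sum(ranks[n1:])

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2

mean_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u1 - mean_u) / sd_u

print(f"R1 = {r1}, R2 = {r2}, U1 = {u1}, U2 = {u2}, Z = {z:.4f}")
```

The two U values always add up to $n_1 n_2$, which is a quick check on the ranking.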

Correlation and Regression Analysis : 

The chi-square test measures the association between two or more variables. It is applicable only when the data are on a nominal scale. Correlation and regression analysis are used for measuring the relationship between two variables measured on an interval or ratio scale.

Correlation Analysis : 

Correlation analysis is a statistical technique used to measure the magnitude of the linear relationship between two variables. It cannot be used in isolation to describe the relationship between variables, but it can be used along with regression analysis to determine the nature of the relationship between two variables. Thus correlation analysis can serve as a basis for further analysis.

Two prominent types of correlation coefficient are:
- Pearson product moment correlation coefficient
- Spearman's rank correlation coefficient

Testing the significance of the correlation coefficient:
- Type I: H0: $\rho = 0$ and H1: $\rho \ne 0$
- Type II: H0: $\rho = r$ and H1: $\rho \ne r$
- Type III: H0: $r_1 = r_2$ and H1: $r_1 \ne r_2$

Correlation Analysis : 

Example:
Marks in Mathematics: 89, 58, 78, 79, 86, 58
Marks in Statistics: 75, 79, 59, 78, 84, 65
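Both coefficients named earlier can be computed for these marks with the following sketch. The helper names are hypothetical. Spearman's coefficient is obtained here by applying Pearson's formula to average ranks, a standard way of handling the tied mark of 58 in Mathematics.

```python
import math


def pearson(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


maths = [89, 58, 78, 79, 86, 58]
stats = [75, 79, 59, 78, 84, 65]

r = pearson(maths, stats)
# Spearman's rank correlation: Pearson's formula applied to the ranks
rho = pearson(average_ranks(maths), average_ranks(stats))

print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
```

Both values come out near 0.25 and 0.23, indicating only a weak positive linear relationship between the two sets of marks.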

Regression Analysis : 

Regression analysis is used to assess the nature and closeness of the relationship between two or more variables. It evaluates the causal effect of one variable on another variable, and it is used to predict the variability in the dependent (or criterion) variable based on information about one or more independent (or predictor) variables.
- Two variables: Simple or Linear Regression Analysis
- More than two variables: Multiple Regression Analysis

Linear Regression Analysis : 

Linear regression: $Y = \alpha + \beta X$, where
- $Y$: dependent variable
- $X$: independent variable
- $\alpha$ and $\beta$: two constants called regression coefficients
- $\beta$: slope coefficient, i.e. the change in the value of $Y$ for a one-unit change in $X$
- $\alpha$: $Y$ intercept, i.e. the value of $Y$ when $X = 0$
- $R^2$: the strength of association, i.e. the degree to which the variation in $Y$ can be explained by $X$. If $R^2 = 0.10$, only 10% of the total variation in $Y$ can be explained by the variation in $X$.

Test of significance of Regression Equation : 

Linear regression: $Y = \alpha + \beta X$
The F test is used to test the significance of the linear relationship between the two variables Y and X.
H0: $\beta = 0$ (There is no linear relationship between Y and X)
H1: $\beta \ne 0$ (There is a linear relationship between Y and X)
Objective: To check whether the estimates from the regression model represent the real world data.

Example for Regression Analysis : 

School Climate: 25, 34, 55, 45, 56, 49, 65
Academic Achievement: 58, 62, 80, 75, 84, 72, 89
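A least-squares fit of these data can be sketched as follows. Treating School Climate as the independent variable X and Academic Achievement as the dependent variable Y is an assumption, since the slide does not say which is which. The names `alpha` and `beta` here are the intercept and slope of the regression line, and 6.61 is the tabulated F critical value for (1, 5) degrees of freedom at the 5% level.

```python
climate = [25, 34, 55, 45, 56, 49, 65]       # X, assumed independent variable
achievement = [58, 62, 80, 75, 84, 72, 89]   # Y, assumed dependent variable

n = len(climate)
mean_x = sum(climate) / n
mean_y = sum(achievement) / n

sxx = sum((x - mean_x) ** 2 for x in climate)
syy = sum((y - mean_y) ** 2 for y in achievement)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(climate, achievement))

beta = sxy / sxx                  # slope coefficient
alpha = mean_y - beta * mean_x    # Y intercept

ss_regression = beta * sxy        # variation in Y explained by X
ss_residual = syy - ss_regression
r_squared = ss_regression / syy

# F test of H0: slope = 0, with 1 and n - 2 degrees of freedom
f_stat = ss_regression / (ss_residual / (n - 2))
F_CRITICAL = 6.61  # F table value for (1, 5) df, alpha = 0.05

print(f"Y = {alpha:.3f} + {beta:.4f} X, R^2 = {r_squared:.4f}, F = {f_stat:.1f}")
print("reject H0:", f_stat > F_CRITICAL)
```

Under these assumptions about 96% of the variation in achievement is explained by school climate, and the F ratio far exceeds the critical value, so the linear relationship is significant at the 5% level.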

Multivariate Analysis : 

Multivariate analysis is defined as "all statistical techniques which simultaneously analyse more than two variables on a sample of observations". It helps the researcher in evaluating the relationship between multiple (more than two) variables simultaneously.

Multivariate techniques are broadly classified into two categories:
- Dependency techniques
- Independency techniques


Discriminant Analysis : 

Discriminant analysis aims at studying the effect of two or more predictor variables (independent variables) on a certain evaluation criterion. The evaluation criterion may consist of two or more groups:
- Two groups, such as good or bad, like or dislike, successful or unsuccessful, above expected level or below expected level
- Three groups, such as good, normal or poor

The analysis checks whether the predictor variables discriminate among the groups, and identifies the predictor variable which is more important than the other predictor variable(s). Such analysis is called discriminant analysis.

Discriminant Analysis : 

- Designing a discriminant function: $Y = aX_1 + bX_2$, where $Y$ is a linear composite representing the discriminant function, and $X_1$ and $X_2$ are the predictor variables (independent variables) which have an effect on the evaluation criterion of the problem of interest.
- Finding the discriminant ratio (K) and determining the variables which account for intergroup differences in terms of group means. This ratio is the maximum possible ratio between the 'variability between groups' and the 'variability within groups'.
- Finding the critical value which can be used to assign a new data set (i.e. a new combination of values of the predictor variables) to its appropriate group.
- Testing H0: The group means are equal in importance, against H1: The group means are not equal in importance, using the F test at a given significance level $\alpha$.

Factor Analysis : 

Factor analysis can be defined as a 'set of methods in which the observable or manifest responses of individuals on a set of variables are represented as functions of a small number of latent variables called factors'. Factor analysis helps the researcher to reduce the number of variables to be analyzed, thereby making the analysis easier.

For example, consider a market researcher at a credit card company who wants to evaluate the credit card usage and behaviour of customers, using various variables. The variables include age, gender, marital status, income level, education, employment status, credit history and family background. Analysis based on such a wide range of variables can be tedious and time consuming.

Using factor analysis, the researcher can reduce the large number of variables to a few dimensions, called factors, that summarize the available data. It aims at grouping the original input variables into the factors which underlie them. For example, age, gender and marital status can be combined under a factor called demographic characteristics. Income level, education and employment status can be combined under a factor called socio-economic status. Credit history and family background can be combined under a factor called background status.

Benefits of Factor Analysis : 

- It identifies hidden dimensions or constructs which may not be apparent from direct analysis.
- It identifies relationships between variables.
- It helps in data reduction.
- It helps the researcher to cluster the product and population being analyzed.

Terminology in Factor Analysis : 

Factor: A factor is an underlying construct or dimension that represents a set of observed variables. In the credit card company example, the demographic characteristics, socio-economic status and background status each represent a set of variables.

Factor Loadings: Factor loadings help in interpreting and labeling the factors. They measure how closely the variables in the factor are associated, and are also called factor-variable correlations: factor loadings are the correlation coefficients between the variables and the factors.

Eigen Values: Eigen values measure the variance in all the variables corresponding to the factor. They are calculated by adding the squares of the factor loadings of all the variables in the factor, and they help to explain the importance of the factor with respect to the variables. Generally, factors with eigen values of more than 1.0 are considered stable. Factors that have low eigen values (<1.0) may not explain the variance in the variables related to that factor.

Terminology in Factor Analysis : 

Communalities: Communalities, denoted by $h^2$, measure the proportion of variance in each variable explained by the factors extracted. They range from 0 to 1. A high communality value indicates that most of the variance in the variable is explained by the factors extracted from the factor analysis.

Total Variance explained: The total variance explained is the percentage of the total variance of the variables that is explained. It is calculated by adding the communality values of all the variables and dividing by the number of variables.

Factor Variance explained: The factor variance explained is the percentage of the total variance of the variables explained by a factor. It is calculated by adding the squared factor loadings of all the variables on that factor and dividing by the number of variables.
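The quantities defined above can all be derived from a table of factor loadings, as the sketch below shows. The loading values and the variable labels V1 to V4 are hypothetical and chosen only to illustrate the arithmetic.

```python
# Hypothetical loadings: each variable's correlation with factor 1 and factor 2
loadings = {
    "V1": (0.8, 0.1),
    "V2": (0.7, 0.2),
    "V3": (0.2, 0.9),
    "V4": (0.1, 0.6),
}
n_variables = len(loadings)
n_factors = 2

# Eigen value of a factor: sum of squared loadings down that factor's column
eigen_values = [
    sum(row[f] ** 2 for row in loadings.values()) for f in range(n_factors)
]

# Communality of a variable: sum of squared loadings across its row
communalities = {
    name: sum(value ** 2 for value in row) for name, row in loadings.items()
}

# Share of total variance explained by all factors together and by each factor
total_variance_explained = sum(communalities.values()) / n_variables
factor_variance_explained = [ev / n_variables for ev in eigen_values]

print("eigen values:", [round(ev, 2) for ev in eigen_values])
print("communalities:", {k: round(v, 2) for k, v in communalities.items()})
print("total variance explained:", round(total_variance_explained, 2))
print("factor variance explained:", [round(v, 3) for v in factor_variance_explained])
```

The eigen values and the communalities are row and column sums of the same squared-loading table, so both add up to the same total.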

Procedure followed for Factor Analysis : 

1. Define the problem
2. Construct the correlation matrix that measures the relationship between the factors and the variables
3. Select an appropriate factor analysis method
4. Determine the number of factors
5. Rotate the factors
6. Interpret the factors
7. Determine the factor scores

Cluster Analysis : 

Cluster analysis can be defined as a set of techniques used to classify objects into relatively homogeneous groups called clusters. It involves identifying similar objects and grouping them together. A cluster is a group of objects that display high correlation with each other and low correlation with objects in other clusters.

Procedure in Cluster Analysis : 

1. Defining the problem: first define the problem and decide upon the variables on which the objects are to be clustered.
2. Selection of similarity or distance measures: the similarity measure examines the proximity between the objects. Closer or similar objects are grouped together and the farther objects are ignored. There are three major methods to measure the similarity between objects:
   - Euclidean distance measures
   - Correlation coefficients
   - Association coefficients
3. Selection of clustering approach: there are two types of clustering approaches:
   - Hierarchical clustering approach
   - Non-hierarchical clustering approach
   The hierarchical clustering approach is either top-down or bottom-up. Prominent hierarchical clustering methods are single linkage, complete linkage, average linkage, Ward's method and the centroid method.

Procedure in Cluster Analysis : 

Non-hierarchical clustering approach: a cluster center is first determined, and all the objects that are within the specified distance from the cluster center are included in the cluster.
4. Deciding on the number of clusters to be selected.
5. Interpreting the clusters.
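The bottom-up hierarchical approach with a Euclidean distance measure can be sketched in a few lines of Python. The five two-dimensional points, the function names and the choice of single linkage with three clusters are all assumptions made for illustration; statistical packages offer the other linkage methods as well.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_linkage(points, k):
    """Bottom-up clustering: repeatedly merge the two closest clusters,
    where cluster distance is the smallest distance between any two members."""
    clusters = [[i] for i in range(len(points))]  # start with one object each
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    euclidean(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    return clusters


# Hypothetical objects measured on two variables
points = [(1.0, 1.0), (1.5, 1.2), (5.0, 5.0), (5.2, 4.8), (9.0, 9.0)]
result = single_linkage(points, k=3)
print(result)  # lists of point indices, one list per cluster
```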

Canonical Correlation Analysis (CCA) : 

CCA is a way of measuring the linear relationship between two multidimensional variables. It is an extension of multiple regression analysis (MRA): MRA analyses the linear relationship between a single dependent variable and multiple independent variables, whereas CCA analyses the linear relationship between multiple dependent variables and multiple independent variables.

For example, a social researcher wants to know the relationship between various work environment factors (such as work culture, HR policies, compensation structure and top management) and the employee behaviour elements they influence (employee productivity, job satisfaction, perception about the company).

The linear combination formed for each set of variables is called a canonical variable or canonical variate. CCA tries to maximize the correlation between the two canonical variables.

Canonical Correlation Analysis (CCA) : 

For example, let U represent the linear combination of work environment factors,
$U = a_1X_1 + a_2X_2 + a_3X_3 + a_4X_4$
and let V represent the linear combination of employee behaviour factors,
$V = b_1Y_1 + b_2Y_2 + b_3Y_3 + b_4Y_4$
The coefficients of each canonical variable are called canonical coefficients.

To interpret the canonical analysis, the researcher examines the relative magnitude and the sign of the several weights defining each equation and sees whether a meaningful interpretation can be given. Being a complex statistical tool that requires a great investment of effort and computing resources, CCA has not gained as much popularity as statistical tools like multiple regression.

Multivariate Analysis of Variance (MANOVA) : 

MANOVA examines the relationship between several dependent variables and several independent variables. It examines whether the dependent variables differ with respect to the independent variables. For example, an industrial buyer wants to know whether the products from Company A, Company B and Company C differ in terms of various parameters (set by the company) such as quality, customer support, pricing and reliability.

The difference between ANOVA and MANOVA is that ANOVA deals with problems containing one dependent variable and several independent variables, whereas MANOVA deals with problems containing several dependent variables and several independent variables. Another major difference is that the ANOVA test ignores the interrelationship between the variables, which leads to biased results. MANOVA takes this into account by testing the mean difference between groups on two or more dependent variables simultaneously.

