STATISTICS FOR ECONOMICS AND BUSINESS: The course I loved to hate… (S.B.)
STATISTICS FOR ECONOMICS AND BUSINESS: The goals
The key aim is to provide you with basic skills in multivariate data analysis. In particular, we focus on techniques useful for analyzing and synthesizing data sets with many variables and/or many observations.
Great attention is devoted to applications. You will learn to identify a proper multivariate technique for a given problem, to analyze the data with the statistical software SAS, to interpret results and to formulate the conclusions of the statistical analysis. We will (try to) refer to datasets relevant to your studies. We briefly present the course. A document with a more detailed description of rules and criteria has already been uploaded to Learning space.
STATISTICS FOR ECONOMICS AND BUSINESS: The tools
1. Lectures (Theory)
PowerPoint slides online before each lesson
2. Lab classes (Applications)
Familiarize yourself with the statistical software SAS, interpret results.
Extended solutions online after each lesson
Word documents with detailed descriptions of SAS programs
4. Tutor: Chiara Castellano, CLEMIT grad student (twice a week)
5. Discussion list (LS or specific; please avoid personal email)
6. SAS installed on your laptop (see the Library for details)
7. Textbooks (see the Library for details; my slides should be sufficient, but if you do not attend the lessons, referring to the textbooks is recommended)
STATISTICS FOR ECONOMICS AND BUSINESS: Changes and Enhancements
1. Introduction of graded assignments/group work
Some students experienced problems because they postponed studying. Many students asked for (graded) incentives for day-by-day study. An assessment method specific to attending students has been introduced and is strongly recommended.
2. Some variations in the organization of the lessons
STATISTICS FOR ECONOMICS AND BUSINESS: Assessment Methods
For attending students the course grade is based on:
The analysis of a real data set (PC-lab session, 4 hours). Here the focus is on the proper use of statistical techniques and on the adequacy of the economic conclusions drawn from the results obtained. Documents with SAS procedures can be used during the exam (no other material is allowed).
A written exam concerning the methodological issues discussed during the course (content of the theoretical slides).
The two exams will be graded separately (max grades = 21 and 6 respectively).
2 Assignments – group work
Lessons (at least 2) are dedicated to the discussion of the 2 assignments. All group members must be present at the discussion. In these lessons one person picked at random from each group will illustrate (part of) the results obtained (material may be consulted). If the selected person answers reasonably, the group's assignment will be graded (0-2 for each assignment). Otherwise, the grade is 0 for all group members.
Non-attending students (those who did not hand in both assignments): extended practical and theoretical exams (max grades = 23 and 8 respectively)
STATISTICS FOR ECONOMICS AND BUSINESS: Prerequisites
1. Univariate Descriptive Statistics. Summary measures (mean, median, quartiles, percentiles, variance, standard deviation). Graphical tools (histogram, box-plot). Extreme values
2. Bivariate Descriptive Statistics. Contingency table, joint, marginal and conditional distributions, measures of association. Conditional means and variances. Scatterplots, covariance, correlation coefficient
3. Inference: random sample, estimators (point and interval) of the mean and of the variance. Hypothesis testing: notion of p-value.
Multivariate Data Analysis: Techniques to analyze/synthesize data sets with many variables and/or many observations.
MOTIVATION
Multivariate Data Analysis – Motivation: Example 1. Innovation and Research in Europe (Source: Eurostat)
Multivariate Data Analysis – Motivation: Example 1 (continued). Innovation and Research in Europe.
For the sake of simplicity, we limit attention to a few observations and a few variables, transformed so that all variables have the same unit of measurement (we will show later how we obtain this result). How can we study the relationships among all the variables to understand the main tendencies in the data, i.e. whether there are groups of variables acting in the same or in the opposite direction?
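A common way to bring variables onto the same scale is standardization (z-scores). The sketch below assumes this is the kind of transformation meant here, which the slides do not state. It uses Python only for illustration (the course itself works in SAS), and the variable names and values are invented.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    if s == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(x - m) / s for x in values]


# Hypothetical values for two variables measured in different units.
gerd = [1.0, 2.0, 3.0, 4.0]              # e.g. a percentage of GDP
internet_acc = [40.0, 50.0, 60.0, 70.0]  # e.g. a percentage of households

z_gerd = standardize(gerd)
z_internet = standardize(internet_acc)
```

After the transformation both variables have mean 0 and standard deviation 1, so their lines can be drawn on the same plot and compared directly.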
Multivariate Data Analysis – Motivation: Example 1 (continued). Innovation and Research in Europe (subset)
2) Obtain a line plot for VARIABLES
How can we study the relationships among all the variables? A line is associated with each variable. We can observe groups of vars with similar tendencies with respect to some variables, for example the orange-red ones, the green ones or the blue ones. These three groups of vars show different tendencies.
Multivariate Data Analysis – Motivation: Example 1 (continued). Innovation and Research in Europe (subset)
How can we combine the information provided by all the vars to compare innovation/research performance for each country? Should we consider the means for the previously observed groups OF VARIABLES?
Are they sufficient to explain ALL the vars?
Should we consider the 3 means, one for each group, and compare obs on the basis of them? Which is the most important index/mean? Should the 3 indices have the same weight when comparing variables? What if we want a single index? Is it possible, and how much information do we lose?
Group 1: GERD, GERD_industry, Internet_Acc, EPO, Educ_Exp, E_gov_avail
Group 2: ST_grad, HT_Exports
Group 3: GERD_govern
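The "one mean per group" idea can be sketched as follows. This is only an illustration of the naive approach questioned above, written in Python rather than SAS. The country and its values are invented, and the equal weights in the single index are an arbitrary assumption, which is exactly the subjectivity the questions point at.

```python
from statistics import mean

# Groups of variables as listed above.
GROUPS = {
    "group1": ["GERD", "GERD_industry", "Internet_Acc", "EPO",
               "Educ_Exp", "E_gov_avail"],
    "group2": ["ST_grad", "HT_Exports"],
    "group3": ["GERD_govern"],
}


def group_indices(row, groups=GROUPS):
    """Return one index per group: the mean of the row's variables in it."""
    return {name: mean(row[v] for v in variables)
            for name, variables in groups.items()}


def single_index(indices, weights=None):
    """Weighted mean of the group indices (equal weights by default)."""
    if weights is None:
        weights = {name: 1.0 for name in indices}
    total = sum(weights.values())
    return sum(indices[n] * weights[n] for n in indices) / total


# Hypothetical standardized values for one invented country.
country_a = {
    "GERD": 1.0, "GERD_industry": 0.5, "Internet_Acc": 1.5,
    "EPO": 1.0, "Educ_Exp": 0.5, "E_gov_avail": 1.5,
    "ST_grad": -1.0, "HT_Exports": 0.0,
    "GERD_govern": 0.5,
}

idx = group_indices(country_a)  # three indices, one per group
overall = single_index(idx)     # one number per country
```

Collapsing nine variables into three means, and then into one number, discards the variation inside each group. Quantifying that loss is what the techniques introduced next are for.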
Multivariate Data Analysis – Motivation: Example 1 (continued). Innovation and Research in Europe.
Things become complicated when we consider more vars/obs. FINDING GROUPS OF VARIABLES WITH SIMILAR PATTERNS IS DIFFICULT
How can we study the relationships among all the variables?
Multivariate Data Analysis – Motivation – Vars: High number of (numerical) variables
Analyzing the relationships among variables
Synthesizing the variables
Principal Component Analysis
Factor Analysis
Multivariate Data Analysis – Motivation: Example 1 (continued). Innovation and Research in Europe (subset)
How can we describe the main tendencies of European countries with respect to innovation? Are there countries with similar characteristics? Which are the main patterns/profiles in this data set?
Obtain a line plot FOR OBSERVATIONS
A line is associated with each observation. We can observe groups of obs with similar tendencies (for example the orange-red ones). Tendencies are similar only with respect to some vars. Which vars should be given most weight? Who is “close” to whom? How can we describe similarity or dissimilarity between countries in a simple way?
Multivariate Data Analysis – Motivation: Example 1 (continued). Innovation and Research in Europe (subset)
Sometimes the grouping is obtained on the basis of a priori knowledge. In this case, for example, we can group by region.
How can we identify groups of cases (countries) with similar characteristics? Grouping obs according to the region is not a good idea: countries in the same region show different patterns.
Multivariate Data Analysis – Motivation: Example 1 (continued). Innovation and Research in Europe.
How can we describe the main tendencies of European countries with respect to innovation? Things become complicated when we consider more vars/obs. FINDING GROUPS OF OBSERVATIONS WITH SIMILAR PATTERNS IS DIFFICULT
Multivariate Data Analysis – Motivation: High number of observations (either numerical or categorical)
Describing cases
Analysis of similarity/dissimilarity between cases
Identification of the main tendencies (groups of cases) in a database
Finding groups: Cluster Analysis
Visualizing differences: Factor Analysis/Multidimensional Scaling
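The question of who is “close” to whom can be made concrete with a distance between the profiles of two cases. The sketch below assumes Euclidean distance on standardized variables, which is one common choice and not necessarily the one adopted later in the course. The three profiles are invented, and Python is used only for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8

# Hypothetical standardized profiles (three variables) for three cases.
profiles = {
    "A": [1.0, 0.0, 2.0],
    "B": [1.0, 0.0, 3.0],
    "C": [-2.0, 4.0, 2.0],
}


def distance_matrix(profiles):
    """Return the Euclidean distance for every ordered pair of cases."""
    names = sorted(profiles)
    return {(a, b): dist(profiles[a], profiles[b])
            for a in names for b in names}


def nearest(name, profiles):
    """Return the case whose profile is closest to the given one."""
    others = [n for n in profiles if n != name]
    return min(others, key=lambda n: dist(profiles[name], profiles[n]))


d = distance_matrix(profiles)
closest_to_a = nearest("A", profiles)
```

A matrix of pairwise distances like `d` is the typical starting point both for finding groups of cases and for drawing a map in which similar cases lie close together.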
Multivariate Data Analysis – Motivation: Example 2. Information about projects financed by the EU in 1995-1996
Multivariate Data Analysis: Example 2 (continued). Projects financed by the EU in 1995-1996 (partial input)
Is there an association between the country, the type of organization and the topic? Are there organizations/countries specialized in particular topics?
If there is association, what is it due to? Who is attracted by what?
Multivariate Data Analysis – Motivation: Categorical variables (two or more) with many values
Describing the association between categorical variables, i.e., understanding the main attraction/repulsion forces between categories
Identification of profiles of categories (i.e., typical combinations of categories)
Correspondence Analysis (Simple and Multiple)
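For two categorical variables, “attraction” and “repulsion” can be read from the ratio between each observed count and the count expected under independence: a ratio above 1 suggests attraction, below 1 repulsion. The sketch below is a minimal illustration with invented counts, not the EU project data. It assumes no row or column of the table is empty, and it uses Python rather than SAS.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_tot = {r: sum(cols.values()) for r, cols in table.items()}
    col_tot = {}
    for cols in table.values():
        for c, n in cols.items():
            col_tot[c] = col_tot.get(c, 0) + n
    total = sum(row_tot.values())
    return {r: {c: row_tot[r] * col_tot[c] / total for c in cols}
            for r, cols in table.items()}


def attraction_ratios(table):
    """Observed / expected for each cell (>1 attraction, <1 repulsion)."""
    exp = expected_counts(table)
    return {r: {c: n / exp[r][c] for c, n in cols.items()}
            for r, cols in table.items()}


def chi_square(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    exp = expected_counts(table)
    return sum((n - exp[r][c]) ** 2 / exp[r][c]
               for r, cols in table.items() for c, n in cols.items())


# Hypothetical counts: type of organization (rows) by topic (columns).
projects = {
    "University": {"Energy": 30, "Health": 10},
    "Company": {"Energy": 10, "Health": 30},
}

ratios = attraction_ratios(projects)
stat = chi_square(projects)
```

With many categories a table of ratios like this quickly becomes hard to read, which is the motivation for summarizing it graphically.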
Multivariate Data Analysis: When dealing with many vars and/or obs it may be difficult to
Describe, analyze and synthesize obs taking into account all the vars, identifying “typical” cases or tendencies in OBS
Study the relationships among vars and/or synthesize them jointly
Grouping vars and/or obs according to some “natural” or somehow “intuitive” rules (e.g., the mean for the variables, the region or the wealth of countries, and so on)
These approaches:
Are subjective
Reproduce what we already know about the data and do not help us learn more about them
Sometimes cannot be applied (no natural grouping available, or difficulty in identifying similar patterns)
Multivariate Data Analysis: The aim of Multivariate Statistical Techniques is to
Extract the information contained in a given data set by simplifying and summarizing observations and/or variables using
DATA-DRIVEN TOOLS
The tool – i.e., the compression/simplification/synthesis of data – used to make information available depends upon the aim of the analysis and on the nature of the variables taken into account