Slide1 : Factor Analysis Theory and Practice
Choose the number of factors
Evaluate a factor model
Interpret factors (rotation)
Evaluate the stability of the factor model
Obtain and analyze the factor plots. Comment on the position of the obs, possibly also taking into account categorical grouping variables.
Slide2 : The data we will consider
Example 1. Variables related to the attractiveness of EU Regions (attention limited to 5 countries)
We are interested in analyzing the attractiveness of European Regions. This latent concept is supposed to be related to the following variables.
GDP_PER_CAPITA: Per capita Gross Domestic Product
DENSITY: Nr of inhabitants / area of the region (km2)
ROAD_AREA: Km of roads (divided by the area of the region)
STUDENT_POP_5_29: students (% of population aged 5-29)
ISCED3_STUDENTS: students at the 2nd educational level (% of students)
SATURATION_MKT: size of local market compared to the potential market (%)
REAL_G_RATE_GDP: Growth rate of GDP
GFCF_GDP: Gross fixed capital formation - net new investment by enterprises in the domestic economy in fixed capital (as a % of GDP)
RD_GDP: R&D expenditures (as a % of GDP)
PAT_RD: Nr of patents (divided by the R&D expenditures)
DENS_LOCAL_UNIT: Nr of local enterprises (divided by the area of the region)
WAGES_GDP: Wages in the manufacturing sector (as a % of GDP)
EMPL_LOCAL_UNITS: Nr of employees working in local enterprises (average)
UNEMPL_RATE: Unemployment Rate (calculated on the population aged over 15)
Factor Analysis, Preliminary Analysis : Identify outliers (p-value < 0.05) and assign them a null weight.
In this way, we obtain factors without taking outliers into account. Nevertheless, factor scores can also be obtained for outliers.
Factor Analysis, Principal Components Method : Using the standard eigenvalue > 1 rule, we should select 4 factors.
Nevertheless, in factor analysis the number of factors should be estimated.
In particular, the standard rule may suggest retaining too many factors. This rule is in fact related to the ability of the PCs to explain the observed variables.
In factor analysis, instead, we are prepared to observe variables which are only weakly related to the latent factors.
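A minimal sketch of the eigenvalue > 1 rule, assuming NumPy and a synthetic data set (the data, the seed and the function names are illustrative assumptions, not the EU regions data used in these slides):

```python
import numpy as np

def eigenvalues_of_correlation(X):
    """Eigenvalues of the correlation matrix of X (rows = obs, columns = vars), largest first."""
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)   # ascending order; R is symmetric
    return eigvals[::-1]

def kaiser_count(eigvals):
    """Number of eigenvalues strictly greater than 1 (the eigenvalue > 1 rule)."""
    return int(np.sum(eigvals > 1.0))

# Synthetic data (hypothetical): 6 variables driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
n = 500
f = rng.standard_normal((n, 2))
load = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                 [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
X = f @ load.T + 0.5 * rng.standard_normal((n, 6))

eigvals = eigenvalues_of_correlation(X)
print(np.round(eigvals, 3))
print("factors retained by the eigenvalue > 1 rule:", kaiser_count(eigvals))
```

The eigenvalues sum to the number of variables (the trace of the correlation matrix), so the threshold 1 is simply the average eigenvalue.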
FA, Choosing the number of factors – Scree plot : SCREE PLOT: It is based upon the consideration that the variance of the factors levels off when the factors are mainly measuring random error (technical factors versus meaningful ones). The scree plot is a plot of the eigenvalues against their factor numbers, aimed at detecting this “levelling off”. The scree plot typically shows a distinct break between the steep slope of the larger factors and the gradual trailing off of the rest of the factors. The typical shape is therefore similar to a hockey stick. The name “scree” comes from the resemblance of such a plot to the scree of rock debris at the foot of a mountain.
The nr of factors selected according to this criterion corresponds to the eigenvalue number immediately to the left of the point where the scree begins, called the elbow (some authors also count the elbow point). This criterion may be inadequate when there are no predominant factors.
Nevertheless, if there is an “elbow”, the hypothesis of underlying latent factors is plausible. If the eigenvalues decrease gradually, the factors are only transformations of the variables and do not suggest the existence of a strong latent structure.
FA, Choosing the number of factors – Scree plot : In our example, we have an indication of 3 or 4 factors, depending on whether we take into account the elbow point or not.
FA, Choosing the number of factors – Broken Stick model : BROKEN STICK MODEL: Suppose we have a stick of length 1 which is broken at random into p pieces (p is the nr of manifest variables). It can be shown that the expected length of the k-th longest segment is:
$b_k = \frac{1}{p} \sum_{i=k}^{p} \frac{1}{i}$
We are interested in reproducing the trace of R. With factor analysis we are splitting the trace into p pieces, the eigenvalues. If the factors (PCs) are significant, they should explain a proportion of the trace higher than the one expected by chance.
Hence, if the proportion of total variance accounted for by the k-th factor is higher than the expected length of the k-th broken stick segment, then that factor should be retained.
In the Excel file broken stick model.xls, the proportions under the randomness hypothesis (broken stick proportions) and the eigenvalues under the randomness hypothesis (broken stick lengths) are reported for a number of manifest variables, p, between 2 and 60. For a factor to be retained, the observed proportion and eigenvalue should be higher than these.
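The broken stick proportions can also be computed directly. The sketch below assumes NumPy; the function names and the example eigenvalues are hypothetical, and stopping at the first factor that fails the comparison is an assumption about how the rule is applied:

```python
import numpy as np

def broken_stick(p):
    """Expected proportions of a unit stick broken at random into p pieces, longest first."""
    inv = 1.0 / np.arange(1, p + 1)
    # k-th value = (1/p) * sum_{i=k}^{p} 1/i, obtained with a reversed cumulative sum
    return np.cumsum(inv[::-1])[::-1] / p

def broken_stick_count(eigvals):
    """Number of leading factors whose share of the trace exceeds the broken stick share."""
    eigvals = np.asarray(eigvals, dtype=float)
    observed = eigvals / eigvals.sum()
    expected = broken_stick(len(eigvals))
    k = 0
    for obs_share, random_share in zip(observed, expected):
        if obs_share <= random_share:
            break          # assumption: stop at the first factor that fails
        k += 1
    return k

p = 14                                   # nr of manifest variables in Example 1
proportions = broken_stick(p)            # broken stick proportions
lengths = p * proportions                # eigenvalues under the randomness hypothesis
print(np.round(proportions, 4))
print(np.round(lengths, 4))

# Hypothetical eigenvalues of a 4-variable correlation matrix
print(broken_stick_count([2.4, 1.0, 0.4, 0.2]))
```

Multiplying the proportions by p gives the broken stick lengths, which are directly comparable with the eigenvalues of the correlation matrix.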
FA, Choosing the number of factors – Broken Stick model : This rule suggests the existence of 2 factors (or 3 factors if we consider the difference between the observed and the random proportion small enough)
(we will return to this later, when we discuss rotation).
FA, % of variance explained by different models : The factor model with 2 factors leaves 4 variables unexplained; the model with 3 factors leaves 2 variables unexplained. One variable evidently resists aggregation. This means that it is only partially related to the others.
FA, Choosing the number of factors : The number of factors to consider is not clearly indicated by the different rules we are taking into account. These rules only provide guidelines for choosing the number of factors.
Remember: The aim of factor analysis is not to explain the variances of the manifest variables. In fact, it may happen that some manifest variables are not related to the latent factors. Some manifest vars may not be related in the supposed way to the others or to the latent concepts we are trying to describe. In this sense, we should consider a small number of factors, and we should be concerned with their meaningfulness and their ability to explain correlations. With reference to our example, we should analyze the 2/3 factors solutions.
Nevertheless, sometimes factor analysis is used as an alternative to PCA to obtain some interpretable syntheses of the manifest variables. In this case, we will be interested in extracting factors by referring to the eigenvalue > 1 rule or, also, to the proportion of variance explained globally or for each variable. In this situation, the number of factors taken into account will probably be higher than the number extracted when we are estimating a factor model. With reference to our example, we should analyze the 4/5/6 factors solutions and we could also increase the number of factors further, to explain all the vars in a satisfactory way.
2/3 Factors model – Loadings of the first solution (PCs) : [Tables: Factor Pattern, 2 factors; Factor Pattern, 3 factors]
The factors extracted with the PC method are the standardized PCs. The loadings are the correlations between manifest vars and PCs. Notice that the loadings of the PCs do not change with the nr of extracted PCs. The % of explained variance presented before gives us information about how well the manifest variables are explained. We know (slide 9) that some vars are not well explained by the 2 or 3 factors model.
Besides the explanatory power, in factor analysis we are mainly concerned with the interpretation of factors. We should consider whether the model is meaningful, i.e., whether it leads to the identification of interpretable latent structures.
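As a rough illustration of how the PC loadings and the % of variance explained for each variable are obtained, here is a sketch assuming NumPy and a small hypothetical correlation matrix (not the one of Example 1):

```python
import numpy as np

def pc_loadings(R):
    """Loadings of all the PCs: eigenvectors scaled by sqrt(eigenvalue), largest eigenvalue first."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None)), eigvals

def communalities(loadings, k):
    """Share of the variance of each variable explained by the first k factors."""
    return np.sum(loadings[:, :k] ** 2, axis=1)

# Hypothetical correlation matrix: the 4th variable is only weakly related to the others
R = np.array([
    [1.0, 0.7, 0.6, 0.1],
    [0.7, 1.0, 0.5, 0.1],
    [0.6, 0.5, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])
L, eigvals = pc_loadings(R)
for k in (1, 2):
    print(k, "factor(s):", np.round(communalities(L, k), 3))
```

Because the loadings of the first k PCs are just the first k columns of the same matrix, extracting one more PC adds a column and leaves the previous ones untouched.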
Rotation of factors – Motivation : The PCs are extracted in decreasing order of importance. Thus the 1st PC, the most important one, is usually a general factor, i.e., it is correlated with many vars and hence is difficult to interpret.
Factors, instead, are not ordered according to their importance. This means that the first factors need not be characterized by a higher explanatory power (variance).
Moreover, a rotation of the initial solution does not affect the validity of the model. This characteristic of the factor model is often used to obtain a solution which is possibly easier to interpret than the initial one.
To ease the interpretation of the factors, i.e., the identification of the latent concepts, the solution is rotated in such a way that each variable is related to as few factors as possible, or that each factor is related to as few vars as possible. In this way the manifest variables can be grouped according to the factor they are correlated with.
If the PC method is used to extract factors, the initial solution gives us the standardized PCs. If we rotate this solution, the new factors are no longer the PCs.
Rotation of factors – Reasoning :
Loadings of the initial factors
     Factor 1  Factor 2
x1     0.5       0.5
x2     0.8       0.8
x3    -0.7       0.7
x4    -0.5      -0.5
Loadings of the rotated factors
     Factor 1  Factor 2
x1     0         0.6
x2     0         0.9
x3    -0.9       0
x4     0        -0.9
[Figure: Correlation Plot]
Rotation of factors : Orthogonal rotations
Loadings: correlations between manifest variables and the initial solution
Loadings of the rotated factors
Varimax Criterion
Minimizes the number of manifest vars having a high correlation with a given factor. For each column (factor) we look for a solution having some loadings close to 1 and the others close to 0.
Quartimax Criterion
Minimizes the nr of factors having a high correlation with a given variable. Same reasoning as Varimax, but by row.
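The two criteria can be illustrated on a two-factor solution with a brute-force search over the rotation angle. This is only a sketch assuming NumPy: statistical packages use iterative algorithms that work for any number of factors (and often normalize the rows first), and the loadings and function names below are hypothetical.

```python
import numpy as np

def rotation(theta):
    """2 x 2 orthogonal rotation matrix for an angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def varimax_criterion(L):
    """Sum over the factors (columns) of the variance of the squared loadings."""
    return float(np.sum(np.var(L ** 2, axis=0)))

def quartimax_criterion(L):
    """Sum of the fourth powers of all the loadings (simplicity by row)."""
    return float(np.sum(L ** 4))

def rotate_by_grid(L, criterion, n_angles=901):
    """Brute-force search of the angle in [0, 90] degrees that maximizes `criterion`."""
    angles = np.linspace(0.0, np.pi / 2, n_angles)
    values = [criterion(L @ rotation(t)) for t in angles]
    best = angles[int(np.argmax(values))]
    return L @ rotation(best), best

# Hypothetical loadings: a simple structure hidden by a 30-degree rotation
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.9], [0.0, 0.8]])
initial = simple @ rotation(np.deg2rad(30.0))

rotated_v, angle_v = rotate_by_grid(initial, varimax_criterion)
rotated_q, angle_q = rotate_by_grid(initial, quartimax_criterion)
print(np.round(initial, 2))
print(np.round(rotated_v, 2))
```

In this artificial example both criteria recover the same simple structure (up to the order and sign of the factors); on real data the two rotations generally differ.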
Rotation of factors : We will usually refer to the Varimax criterion since it eases the interpretation of factors.
The rotation of factors
UNCHANGED: the explanatory power of the factor model remains the same (% of explained variance, reconstruction of the correlation matrix – see later – and so on).
CHANGED: the variance explained by each factor (the % of explained variance is redistributed among the factors, so the first factor is “less important” and the other factors are more important than before). Of course, the loadings also change.
IMPORTANT:
The loadings of the rotated factors differ depending on the number of factors initially extracted. This means that the rotated factors change with the number of factors (is this the same with principal components?)
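What stays the same and what changes under an orthogonal rotation can be checked numerically. The sketch below assumes NumPy; the loadings are hypothetical and the rotation matrix is an arbitrary orthogonal matrix, not one chosen by Varimax:

```python
import numpy as np

# Hypothetical loadings of 5 variables on 3 orthogonal factors
L = np.array([
    [0.8,  0.3,  0.1],
    [0.7,  0.4, -0.2],
    [0.6, -0.5,  0.2],
    [0.5, -0.6, -0.1],
    [0.4,  0.1,  0.7],
])

# Any orthogonal matrix T is a valid rotation; here it comes from a QR decomposition
rng = np.random.default_rng(1)
T, _ = np.linalg.qr(rng.standard_normal((3, 3)))
L_rot = L @ T

# UNCHANGED: communalities (row sums of squares) and reproduced correlations
same_communalities = np.allclose(np.sum(L ** 2, axis=1), np.sum(L_rot ** 2, axis=1))
same_reproduced_corr = np.allclose(L @ L.T, L_rot @ L_rot.T)

# CHANGED: variance explained by each factor (column sums of squares); the total is the same
var_before = np.sum(L ** 2, axis=0)
var_after = np.sum(L_rot ** 2, axis=0)

print(same_communalities, same_reproduced_corr)
print(np.round(var_before, 3), np.round(var_after, 3))
```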
2 Factors model – Rotation : [Tables: Factor Pattern – Initial solution; Rotated Factor Pattern]
The sum of the explained variance (coinciding with the sum of squared loadings) does not change (it coincides with the % in slide 9). The only difference is that the second factor now explains a slightly higher % of variance and the first one a lower %.
3 Factors model – Rotation : [Table: Rotated Factor Pattern]
Observe that the first 2 rotated factors do not coincide with the two rotated factors of the 2 factors model. Notice that, after the rotation, the variances explained by the 3 factors are significant also according to the broken stick model.
By adding one factor we get a general improvement in the % of explained variance and, also, fewer vars left unexplained. We decide to consider 3 factors.
If we are interested in using factor analysis as a reduction technique, we should increase the nr of factors, in order to improve the performance of the factor model wrt all the manifest vars.
[Table: Variance Explained by Factors – Unrotated solution, Rotated solution, Broken stick]
3 Factors model – Robustness to extraction method : We may now be interested in evaluating whether the estimated factor model is stable or depends upon the method (principal components method / principal factors method) selected to extract factors. Let us extract factors using the PF method (prior estimates of communality = smc, squared multiple correlation coefficient). Observe that Saturation_mkt and real_g_rate_gdp are characterized by a very low prior communality estimate. It is worth considering again the proportion of variance explained by the 2/3 factors models.
Consider the % of variance explained by the factors extracted with the PC method. The 2 factors model leaves unexplained some vars (dens_local_unit) which are related to the others. This supports our choice of the 3 factors model.
3 Factors model – Robustness to extraction method : Number of factors
The reduced correlation matrix does not have the properties of a true correlation matrix. Hence, there are usually some negative eigenvalues. In this situation, the number of factors is selected by considering:
1. Factors with eigenvalues higher than the average (criterion similar to the eigenvalue > 1 rule). In this case, 4.
2. The nr of factors explaining 100% of the variance. In this case, 6.
3. Factors with positive eigenvalues. In this case, 8.
These criteria usually lead to the selection of too many factors.
Thus, it is often preferred to choose the number of factors suggested by the criteria used for the factors extracted with the PCs method (3 factors).
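The three counts above can be sketched as follows, assuming NumPy and synthetic data (the data, the seed and the function names are hypothetical; the reduced correlation matrix is taken to be the correlation matrix with the squared multiple correlations on the diagonal):

```python
import numpy as np

def smc(R):
    """Squared multiple correlation of each variable with all the others: 1 - 1 / diag(R^-1)."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

def reduced_eigenvalues(R):
    """Eigenvalues (largest first) of the correlation matrix with SMC priors on the diagonal."""
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc(R))
    return np.linalg.eigvalsh(R_reduced)[::-1]

def candidate_numbers_of_factors(eigvals):
    """Counts for the three criteria: above average, reaching 100%, positive eigenvalues."""
    total = eigvals.sum()                                  # sum of the prior communalities
    above_average = int(np.sum(eigvals > eigvals.mean()))
    cumulative = np.cumsum(eigvals) / total
    reach_100 = int(np.argmax(cumulative >= 1.0 - 1e-12)) + 1
    positive = int(np.sum(eigvals > 0))
    return above_average, reach_100, positive

# Synthetic data (hypothetical): 8 variables, 2 latent factors plus noise
rng = np.random.default_rng(2)
n = 200
factors = rng.standard_normal((n, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                     [0.0, 0.9], [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
X = factors @ loadings.T + 0.6 * rng.standard_normal((n, 8))
R = np.corrcoef(X, rowvar=False)

eigvals = reduced_eigenvalues(R)
above_average, reach_100, positive = candidate_numbers_of_factors(eigvals)
print(np.round(eigvals, 3))
print(above_average, reach_100, positive)
```

Because some eigenvalues are negative, the cumulative proportion exceeds 100% before coming back to exactly 100% at the last eigenvalue, which is why the second criterion counts the factors needed to reach 100% for the first time.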
3 Factors model – % of variance explained - Comparison : Observe moreover that for some variables (those relative to students) the explanation is better, whilst for all the others it is worse. Notice also that for the critical variables, saturation_mkt, rd_gdp and real_g_rate_gdp, the performance is much worse. Remember that these vars were characterized by low initial estimates of communalities.
The reason for these results is that the principal factors method is less concerned with the explanation of variances (as the principal components method is) and more with the explanation of correlations. Thus, the isolated variables often remain unexplained, due to the low importance of their correlations with the other variables. Observe that the % of explained variance is lower for the factors extracted with the principal factors method.
3 Factors model – Rotated factors patterns - Comparison : [Tables: Rotated Factor Pattern, principal components method; Rotated Factor Pattern, principal factors method]
From the loadings you can appreciate the stability of the factor model. The strongest loadings remain unchanged. Instead, some variables are less correlated with the factors. These variables are the isolated/critical ones. As we said before, these variables are weakly correlated with the others, and the principal factors method emphasizes this aspect more than the principal components method. We will now proceed with the factors obtained with the PCs method, but in the next slides we will also present some results about the performance of the principal factors method, to highlight the main features of the two approaches.
3 Factors model – PCs method – The correlation matrix : Remember that the factor model attempts to reproduce the observed correlation matrix as closely as possible. More precisely, we have
Observed correlation matrix R
Estimated correlation matrix
On the diagonal of the estimated correlation matrix, we find the estimated communalities = 1 – uniqueness.
To evaluate the performance of a factor model we considered the % of explained variance (the reconstruction of the trace/communalities). We now consider the matrix of residual correlations, i.e., the observed correlation matrix minus the estimated one. On the diagonal of this matrix we have the uniqueness, the portion of unexplained variance. The off-diagonal elements of the matrix are the residual correlations.
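A minimal sketch of the residual correlation matrix for a 2 factors solution extracted with the PC method, assuming NumPy and a hypothetical correlation matrix (the estimated correlation matrix is taken to be the product of the loading matrix with its transpose):

```python
import numpy as np

def residual_correlations(R, loadings):
    """Observed minus estimated correlations; the diagonal holds the uniquenesses."""
    return R - loadings @ loadings.T

# Hypothetical correlation matrix of 4 manifest variables
R = np.array([
    [1.0, 0.7, 0.6, 0.1],
    [0.7, 1.0, 0.5, 0.1],
    [0.6, 0.5, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])

# PC-method loadings of the first 2 factors
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

residual = residual_correlations(R, loadings)
uniqueness = np.diag(residual)
print(np.round(residual, 3))
```

A positive off-diagonal entry means that the model under-estimates the corresponding correlation, a negative one that it over-estimates it.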
3 Factors model – PCs method – Residual correlation matrix : [Table: Residual Correlations With Uniqueness on the Diagonal]
A residual > 0 indicates that the correlation between two variables is under-estimated by the factor model. Instead, a residual lower than 0 indicates over-estimation.
For example, Saturation_mkt is not strongly connected to the other vars. Nevertheless, it is correlated with the 3rd factor, which is also related to density, dens_local_unit and road area. Notice from the residual matrix that the resulting correlation coefficients between this var and the mentioned ones are over-estimated. On the other hand, the labour market variables (13-14), related to the 2nd factor, are correlated with density, but this correlation is not captured by the model.
Also observe that the two vars with high specificity do not have strong correlations with the other vars.
3 Factors model – PCs method – Residual correlation matrix : When the number of manifest vars is high, it may be difficult to read the residual matrix carefully. To ease the analysis, it is worth considering a synthesis of the residuals for each var. In this way, we can at least understand which are the most “problematic” vars.
The synthesis usually taken into account is the root mean square residual of the off-diagonal elements for each variable (the diagonal contains the uniqueness and is not taken into account). Of course, this synthesis does not give us information about the sign of the residuals. Notice that when using the principal factors method, the residual correlations (synthesis) decrease. Usually the principal components method performs better with respect to the explained variance (lower diagonal elements, i.e., lower uniqueness and higher % of explained variance), whilst the principal factors method performs better wrt the reconstruction of the correlations.
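The synthesis for each variable can be sketched as below, assuming NumPy; the residual matrix is hypothetical and the divisor p - 1 (the number of off-diagonal elements in a row) is an assumption about how the mean is taken:

```python
import numpy as np

def rms_offdiag_by_variable(residual):
    """Root mean square of the off-diagonal residuals in each row (diagonal excluded)."""
    p = residual.shape[0]
    off = residual - np.diag(np.diag(residual))
    return np.sqrt(np.sum(off ** 2, axis=1) / (p - 1))

def rms_offdiag_overall(residual):
    """Root mean square of all the off-diagonal residuals."""
    p = residual.shape[0]
    off = residual - np.diag(np.diag(residual))
    return float(np.sqrt(np.sum(off ** 2) / (p * (p - 1))))

# Hypothetical residual matrix with the uniquenesses on the diagonal
residual = np.array([
    [0.3,  0.1, -0.1],
    [0.1,  0.4,  0.2],
    [-0.1, 0.2,  0.5],
])
rms = rms_offdiag_by_variable(residual)
print(np.round(rms, 4), round(rms_offdiag_overall(residual), 4))
```

Variables with the largest values are the ones whose correlations are reproduced worst by the model.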
3 Factors model – PCs method – Conclusions : From the previous analysis we can draw the following conclusions.
1. A 3 factors model seems appropriate.
2. The extracted factors are almost stable and their interpretation does not change as the extraction method changes. Nevertheless, some vars are not explained in a satisfactory way by the model. This is due to the weak relationship between these variables and the others. These variables are then useful to enrich the factor labeling/interpretation. We can consider the obtained factors, but we should be aware that we are not explaining all the variables: we are estimating latent factors.
3. Since here we are interested in estimating/comparing attractiveness, and the manifest vars are of interest only because we thought they could be related to attractiveness, we go on with the 3 factors model.
BUT:
4. If we are interested in extracting syntheses explaining all the manifest vars in a satisfactory way, then we should consider a higher nr of factors. In this case, we will probably find some factors dedicated to the explanation of the vars resisting aggregation. In this sense, these syntheses cannot be considered as “latent factors”.
5. If our intent is to consider well defined factors/syntheses, we could evaluate the opportunity to exclude the isolated variables from the factor analysis. Of course, if we are interested in the information contained in these variables, we should consider them together with the factors. This means that we can replace the correlated vars with the factors synthesizing them, whereas the information contained in the isolated vars cannot be synthesized in a suitable way.
3 Factors model – PCs method – Interpretation : [Table: Rotated Factor Pattern]
1st factor: related to education of the population (human capital / potential), infrastructures, richness, RD expenditures, healthy labour market. Weakly related to the growth rate of GDP and to investment in fixed capital. This is the Basic attractiveness of the region.
2nd factor: related to vars describing big firms, with a good RD output and a low investment in fixed capital (innovation of product more than of process?). The factor appears to be related to the presence of mature, established firms which have already passed the initial growth phase. Big established innovating firms.
3rd factor: related to density (wrt population, employees, market). Agglomeration/Saturation.
3 Factors model – PCs method – Performance wrt obs : Factor analysis aims at identifying the latent factors underlying the considered set of manifest vars. Nevertheless, once we have estimated the factor model, we may be interested in evaluating the behaviour of the obs with respect to the factors. As mentioned before, for each obs we can estimate the factor scores.
As in PCA, we obtain factorial maps and analyze the position of the obs on the map.
Also in this case, it may be important to identify the observations which dominate the factors and those which are not well explained, whose position on the map should not be commented on. We will consider the cosines only when the factors are extracted using the principal components method. We flag as critical the observations with a low cumulative cosine (threshold 0.15).
We also define dominating obs for each factor on the basis of quartiles.
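A possible way to compute the cumulative cosine, sketched under the assumption that it is the usual PCA quality of representation (squared PC coordinates on the retained axes divided by the squared distance of the standardized observation from the centroid). The data and function name are hypothetical, and the quartile-based definition of dominating obs is not reproduced because the slides do not detail it:

```python
import numpy as np

def cumulative_cos2(X, k):
    """Quality of representation of each observation on the first k principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardized variables
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    scores = Z @ eigvecs[:, order]                 # PC coordinates of the observations
    dist2 = np.sum(Z ** 2, axis=1)                 # squared distance from the centroid
    return np.sum(scores[:, :k] ** 2, axis=1) / dist2

# Synthetic data (hypothetical): 50 observations, 5 correlated variables
rng = np.random.default_rng(3)
common = rng.standard_normal((50, 1))
X = common @ np.array([[0.9, 0.8, 0.7, 0.3, 0.1]]) + 0.6 * rng.standard_normal((50, 5))

cos2 = cumulative_cos2(X, 2)
critical = cos2 < 0.15                             # threshold used in the slides
print(np.round(cos2[:5], 3), int(critical.sum()))
```

Observations flagged as critical lie mostly outside the plane of the retained factors, so their position on the factorial map says little about them.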
3 Factors model –Plots in the factors space – outliers : Preliminary plots suggest the opportunity to remove outliers (red observations) from the plots to avoid distortions.
[Plots: Factors 2 - 1; Factors 3 - 1]
3 Factors model –Plots in the factors space – quality/cosines : We can see here that UK regions dominate the first factor (basic attractiveness), opposed to Spain and Italy (some regions) and Germany (some regions).
Instead, DE dominates the 2nd factor (big established innovating firms) and is opposed to Spain and Italy (some regions) and UK (some regions).
France is not well represented in the factorial space, and there are some German and Italian regions which are well represented on the 3rd factor.
[Plot: Factors 2 - 1]
Types Legend  | H2    H1   ND     H3    CR       H1H2    H2H3  H1H3
--------------+-----------------------------------------------------
Symbol Colors | blue  red  green  cyan  magenta  orange  gold  lilac
3 Factors model –Plots in the factors space – quality/cosines : The 3rd factor (agglomeration/saturation) emphasizes a contrast between a limited number of regions. This is probably the reason why the broken stick model suggested retaining a lower number of factors.
It is worth analyzing here in more detail which are the dominating obs (here the country effect is not strong, thus we have to refer directly to the observations).
[Plot: Factors 3 - 1]
Types Legend  | H2    H1   ND     H3    CR       H1H2    H2H3  H1H3
--------------+-----------------------------------------------------
Symbol Colors | blue  red  green  cyan  magenta  orange  gold  lilac
3 Factors model – PCs method – The countries : It is evident that the position of the observations on the map is strictly related to the country. UK is characterized by high basic attractiveness and is opposed, with respect to this factor, to all the other countries, but especially to Spain and Italy.
Germany is characterized by the presence of big established innovating firms and is opposed, wrt this factor, to all the other countries, but in particular to Spain and Italy.
France is in a central position, which means that it is not strongly characterized by the factors.
[Plot: Factors 2 - 1]
Types Legend  | DE    ES   FR     IT    UK
--------------+------------------------------
Symbol Colors | blue  red  green  cyan  magenta
3 Factors model – PCs method – The countries : Along the 3rd factor (vertical axis) the impact of the country is less clear. As we saw before, this dimension opposes some regions of Italy/UK to other regions of the same country.
In this case, it would be completely misleading to propose a synthesis of the factors to describe the attractiveness of the considered countries.
[Plot: Factors 3 - 1]
Types Legend  | DE    ES   FR     IT    UK
--------------+------------------------------
Symbol Colors | blue  red  green  cyan  magenta