larsen jsm2003


Comparison of Alternative Latent Class Clusterings of Survey Data: 

Michael D. Larsen
University of Chicago / Iowa State University

Outline: 

Survey and variables
Latent class models
Comparing clusterings
Some comparisons
Conclusions and future plans

Survey: 

1997 Survey of Doctoral Recipients
  NSF survey every 2 years
  1 of 3 surveys in SESTAT database
Respondents
  PhDs 1990-1996
  Physical (n=2216) and biological (n=1019) sciences, engineering (n=516)
  Work in higher educational institutions

Variables: 

Demographics: Sex, Race, Ethnicity, Age, etc.
  %F: biology (49%), physical (33%), eng. (23%)
Several sets on career preparation
  Limitations on career path job searches
  Work activities
  Job search resources (which used?)
  Adequacy of PhD program career preparation
Assorted other questions (e.g., postdoc?)

One set of variables example: 

Adequacy of career preparation
Very adequate vs. somewhat or not adequate
11 areas (2^11 table)
Biology: 3 significant differences, F vs. M
  Communication (F>M), z = 2.73
  Ethics (F>M), z = 2.48
  Computer (M>F), z = -2.58
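The slide does not say which test produced the z values. One common choice for comparing the share of "very adequate" responses between two groups is the pooled two-proportion z statistic, sketched below. The function name `two_proportion_z` and the counts in the usage line are hypothetical and are not taken from the survey.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 == p2.

    x1, x2: number of 'very adequate' responses in each group.
    n1, n2: group sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts for illustration only (not survey data).
z = two_proportion_z(60, 100, 40, 100)
print(round(z, 2))  # 2.83
```

A positive z here means the first group rates the area "very adequate" more often, which matches the sign convention of the F>M and M>F entries if females are taken as the first group.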

Why cluster?: 

Interest in clusters themselves
Are there identifiable groups?
Are clusters stable over time?
Are the clusters related to demographic subpopulations?
How do outcomes vary across clusters?

Latent Class Models: 

G latent classes (subpopulations)
K categorical variables define a contingency table; each person falls in one cell of the table.
The observed pattern of responses in the table is a mixture of patterns from the latent classes.
Response probabilities on each variable are (conditionally) independent within each class (probabilities differ across classes).
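In symbols, the mixture and conditional independence assumptions can be written as follows. The notation is an assumption introduced here for illustration: \pi_g is the probability of class g and y_k is the response on variable k.

```latex
P(y_1, \dots, y_K)
  = \sum_{g=1}^{G} \pi_g \prod_{k=1}^{K} P(y_k \mid g),
\qquad \sum_{g=1}^{G} \pi_g = 1 .
```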

Latent Class Models, cont.: 

P(response pattern) = sum over classes of [ P(class) P(response pattern | class) ]
EM algorithm (Dempster, Laird, Rubin 1977)
Compute P(class | response pattern).
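A minimal sketch of how EM could fit such a model and return P(class | response pattern), assuming binary (0/1) items. It is an illustration under that assumption, not the code used for the talk; the names `posterior` and `fit_latent_class`, the random starting values, and the fixed iteration count are all hypothetical choices.

```python
import random

def posterior(pattern, pi, theta):
    """P(class | response pattern) by Bayes' rule for one 0/1 pattern.

    pi[g] is P(class g); theta[g][k] is P(item k = 1 | class g).
    """
    joint = []
    for g in range(len(pi)):
        lik = pi[g]
        for k, y in enumerate(pattern):
            lik *= theta[g][k] if y == 1 else 1.0 - theta[g][k]
        joint.append(lik)
    total = sum(joint)  # this is P(response pattern) under the mixture
    return [j / total for j in joint]

def fit_latent_class(data, G, n_iter=100, seed=0):
    """EM for a G-class latent class model with binary items."""
    rng = random.Random(seed)
    n, K = len(data), len(data[0])
    pi = [1.0 / G] * G
    # Random starting values (an assumption; other starts are possible).
    theta = [[rng.uniform(0.25, 0.75) for _ in range(K)] for _ in range(G)]
    for _ in range(n_iter):
        # E-step: class membership probabilities for every respondent.
        post = [posterior(row, pi, theta) for row in data]
        # M-step: weighted proportions using those probabilities.
        for g in range(G):
            weight = sum(p[g] for p in post)
            pi[g] = weight / n
            for k in range(K):
                num = sum(p[g] * row[k] for p, row in zip(post, data))
                est = num / max(weight, 1e-12)
                # Keep estimates strictly inside (0, 1).
                theta[g][k] = min(max(est, 1e-6), 1.0 - 1e-6)
    return pi, theta
```

After fitting, `posterior(pattern, pi, theta)` gives the class membership probabilities for any response pattern, and assigning each respondent to the most probable class yields a clustering. EM can converge to different local solutions from different starts, so several seeds are usually worth trying.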

Comparing clusterings: 

Different sets of variables will group respondents differently.
Cross tabulations
Adjusted Rand Index (ARI)
  Rand Index = number of pairs of respondents placed in the same cluster by both clusterings
  ARI = (Rand - Expected)/(Max - Expected), where the expected value assumes a hypergeometric distribution
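The ARI formula can be sketched in a few lines, following the Hubert and Arabie (1985) form in which the index counts pairs placed together by both clusterings. The function name `adjusted_rand_index` and the handling of the degenerate case are assumptions made for this illustration.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI = (Index - Expected) / (Max - Expected) for two clusterings."""
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))  # cross tabulation
    rows = Counter(labels_a)                  # cluster sizes, clustering A
    cols = Counter(labels_b)                  # cluster sizes, clustering B
    # Pairs placed in the same cluster by both clusterings.
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    # Expected index under the hypergeometric null with fixed margins.
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every respondent in one cluster
    return (index - expected) / (maximum - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

The ARI depends only on the cross tabulation, so relabeling the clusters (as in the usage line) does not change its value.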

Calibrating the ARI (or other): 

Simulation
  Generate 1000 samples from the hypergeometric distribution, which corresponds to the null of no association
  Compute the ARI for each of the 1000 samples
  Report the number of samples with ARI >= the observed ARI
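One way to carry out this simulation is to shuffle one set of cluster labels against the other: this keeps both sets of cluster sizes fixed, so the resulting cross tabulations follow the hypergeometric null of no association. The sketch below takes that route as an assumption; the slides do not say how the samples were generated. The names `ari` and `ari_tail_area` are hypothetical, and the tail area is returned as a proportion of simulated samples.

```python
import random
from collections import Counter
from math import comb

def ari(labels_a, labels_b):
    """Adjusted Rand Index in the Hubert and Arabie form."""
    n = len(labels_a)
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        return 1.0  # degenerate case
    return (index - expected) / (maximum - expected)

def ari_tail_area(labels_a, labels_b, n_sim=1000, seed=0):
    """Observed ARI and the share of null samples with ARI >= observed."""
    rng = random.Random(seed)
    observed = ari(labels_a, labels_b)
    shuffled = list(labels_b)
    count = 0
    for _ in range(n_sim):
        # Shuffling breaks any association but preserves cluster sizes.
        rng.shuffle(shuffled)
        if ari(labels_a, shuffled) >= observed:
            count += 1
    return observed, count / n_sim
```

With large n, the null distribution of the ARI is concentrated tightly around 0, which is why a small observed ARI can still fall far in the tail.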

A comparison: 

Biology, Adequacy of Career Preparation
  Communication: ARI = 0.002, tail = 0.015
  Ethics: ARI = 0.039, tail = 0.039
  Computer: ARI = 0.002, tail = 0.021
4 latent classes (interesting patterns)
  ARI value is lower, tail area is larger

Comments: 

1. ARI values are not large (not near 1) for tables with large n.
2. Simulated tail areas are similar to P-values from standard tests.
3. Small ARI values can be significant, in the same way that small log odds (near 0) can be significant for large n.
4. Latent classes fit better than simple classifications, but the ARI doesn't increase.

More on comment 4.: 

Two classes (females, males) and CI
vs.
Four latent classes (based on BCI) and CI
Latter fits (much) better.
ARI not larger than the largest on individual variables.

Future plans: 

1. Repeat on next waves (1999, 2001)
2. Additional comparison methods:
   Diversity measures
   Slight modification of ARI
   Machine Learning, Stats, Discovery, 2003, Marina Meila, U of Washington
3. Missing data (DK, RF, Missing)

References: 

Larsen, Statistics in Transition, 2003
Larsen, submitted to "Retaining Women in Early Academic SMET Careers," 2002, under revision
Hubert and Arabie, 1985, J. of Classification
NSF, EIA-0089930, ITWF

Contact Information: 

Mike Larsen, U of Chicago, Statistics
larsen@galton.uchicago.edu
http://galton.uchicago.edu/~larsen/jsm03
Email for contact at Iowa State University, Statistics