Applying Ideal Point IRT Models to Score Single Stimulus and Pairwise Preference Personality Items
Stephen Stark (USF), Oleksandr S. Chernyshenko (UC, NZ), Fritz Drasgow (UIUC)
Added: August 02, 2007. License: All Rights Reserved.

Overview:
'Problems' with current personality assessment procedures
The case for ideal point response process assumptions in personality
Ideal point IRT models for single statement and pairwise preference items
Score comparability study

Personality Scale Construction Today:
Rooted in Classical Test Theory (CTT) and Common Factor Theory (CFT)
Uses single stimulus format, fixed length scales and total scores in all analyses and interpretations
Existing inventories
  Are static
  Contain a large number of relatively short scales

Problem # 1:
Current scales worked well for research purposes, where the interest is to 'understand the relationship' between constructs
But these measures are not well suited for adaptive formats or feedback purposes
  Item parameters are scale dependent
  Item difficulties do not directly correspond to item content, because of reverse scoring
  Scales are too short to have good precision
More flexible test construction technology is needed

Problem # 2:
CTT and CFT make dominance response process assumption
  This has been 'adopted' from cognitive ability testing
To satisfy constraints of the dominance assumption
  Reverse scoring of negative items is introduced
  Neutral or extreme items are deleted from item pools because they have low item-total correlations (loadings)
This results in depleted item pools and scales with properties more suitable for scholarship exams

Slide 6: Dominance Response Process and Personality Items (MBR, 2001; JAP, 2006)
Person endorses item if her standing on the latent trait, theta, is more extreme than that of the item.
Only appropriate for moderately positive/negative items (e.g., 'I like/dislike parties')
[Figure label: Person]

Slide 7: Ideal Point Process: A More Flexible Alternative?
Person endorses item if her standing on the latent trait, theta, is near that of the item.
Example item: 'My social skills are about average.'
Disagree either because:
  Too introverted (uncomfortable talking to people)
  Too extraverted (great skills)
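The contrast between the dominance and ideal point response processes on the two slides above can be sketched numerically. This is a minimal illustration, not the authors' code: it assumes a two-parameter logistic curve as the dominance model and the dichotomous form of the GGUM as the ideal point model. The function names and the parameter values (location 0, alpha = 1, tau = -1) are hypothetical and chosen only for illustration.

```python
import math


def dominance_irf(theta, location, discrimination=1.0):
    """Two-parameter logistic IRF: P(agree) rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - location)))


def ideal_point_irf(theta, location, alpha=1.0, tau=-1.0):
    """Dichotomous GGUM IRF: P(agree) is highest when theta is near location.

    alpha (discrimination) and tau (threshold) are illustrative values,
    not estimates from the study.
    """
    d = theta - location
    # Two subjective responses map onto 'agree' (from below and from above).
    agree = math.exp(alpha * (d - tau)) + math.exp(alpha * (2.0 * d - tau))
    # Two subjective responses map onto 'disagree'.
    disagree = 1.0 + math.exp(alpha * 3.0 * d)
    return agree / (agree + disagree)


# A neutral item such as 'My social skills are about average.' placed at 0.
for theta in (-3.0, 0.0, 3.0):
    print(theta,
          round(dominance_irf(theta, 0.0), 3),
          round(ideal_point_irf(theta, 0.0), 3))
```

Under the dominance curve the very extraverted respondent is the most likely to agree with the neutral item. Under the ideal point curve both the very introverted and the very extraverted respondents are unlikely to agree, which matches the 'disagree either because' reasoning on Slide 7.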

Ideal Point Process and Personality (JAP, 2006; Psych Assessment, in press):
Ideal point IRT models provided better fit to a wider variety of personality items than dominance IRT models
Many nonmonotonic, but highly discriminating items have been found
30% more items were retained in item pools
More items are available for scale construction

Conclusions and Further Basic Research:
Ideal point process offers numerous advantages for improving current measures
More research is needed
  Only a few ideal point models are available; more flexibility is needed
  Item and person parameter estimation must be improved (APM, 2005)
  Responses to adaptive scales may be more complicated than we think
Note that this research carries limited applied value, because traditional items are easily FAKED

Single Stimulus Response Format:
Items consist of individual statements
  I get along well with others. (A+)
  I try to be the best at everything I do. (C+)
  I insult people. (A-)
  My peers call me 'absent minded.' (C-)
Agree/Disagree or Likert type (SD, D, N, A, SA) response options are used
In each case, socially desirable response is obvious.

How to Deal With Faking?:
Social Desirability (SD) scales often used to 'detect' and 'correct' for faking
  Adjustments made to content scale scores
  Little effect on validity
Correcting for faking using SD scores is problematic, because…
  SD scales may function differently across testing situations (JAP, 2001)
Need to develop fake-resistant items

Search for Fake-Resistant Formats:
Empirically keyed, nontransparent items
  But problems with construct and face validity data
Biodata or situational judgments
  Do not measure personality directly
  Can be easily faked as soon as respondents are told personality is being assessed
Forced-choice (FC) items
  Halo and other biases are reduced (Borman et al., 2001)
  Intuitively, should reduce faking (Jackson et al., 2000)

Unidimensional Pairwise Preference Format:
Create items by pairing stimuli that are on the same dimension, but representing different locations on the trait continuum
Sociability item:
  I talk a lot. (+3)
  My social skills are about average. (0)
Respondent chooses statement that is 'More Like Me'
Navy Computer Adaptive Personality Scales (NCAPS) uses this format

Multidimensional Pairwise Preference Format:
Create items by pairing stimuli that are similar in desirability, but representing different dimensions
Positive item:
  I get along well with others. (A+)
  I set very high standards for myself. (C+)
Negative item:
  I insult people. (A-)
  I work just enough to pass my classes. (C-)
Variation of this approach is the tetrad format (Army AIM or SHL’s OPQ-32-i)

Scoring Forced Choice Measures:
Traditional scoring of FC items is problematic
  Unidimensional FC scale scores have bi-modal distributions
  Multidimensional FC scores are ipsative
    Inter-individual comparisons not possible
    Scale scores correlate negatively (even facets of Big 5)
Scoring lacks a formal psychometric model
  Difficult to evaluate scoring accuracy
  Does not provide insight about item construction
  Not usable for adaptive testing

Are Forced Choice Scores Equivalent to Traditional Scores?:
FC measures are gaining popularity
But direct comparisons of traditional FC and SS scores not possible
  'Score inflations' can only be evaluated within measures
  Correlations between measures are low
Before evaluating FC measures in operational settings:
  Scores must be normative
  Under honest conditions, FC and SS scores should be the same

Response Format Study (in review):
Used advances in IRT to obtain normative scores for Order, Self Control and Sociability
  36-item Single Stimulus measure
  36-pair Unidimensional Pairwise Preference measure
  36-pair Multidimensional Pairwise Preference measure
All scores were estimated using IRT
All items administered under honest conditions (N=602 for self reports and N=110 for observers)

IRT Model for Single Stimulus Items:
Generalized Graded Unfolding Model (GGUM; Roberts et al., 1998)
GGUM fit personality items well (Chernyshenko, 2002)
No reverse scoring needed

Example: 'Ideal Point IRT' Order Scale

IRT Model for Scoring Unidimensional Pairwise Preferences (Stark & Drasgow, 2002):
Zinnes and Griggs (1974) Probabilistic Unfolding Model (ZG model)
Idea: Respondent has ideal point representing his/her perception of typical behavior (trait level)
Task: On each trial, respondent chooses the statement that better describes him/her

Equation for ZG Item Response Functions

Slide 22:
IRF for Stimulus-Pair j = 17, k = 18 (μ17 = 5.6, μ18 = 3.8)

IRT Model for Scoring Multidimensional Pairwise Preferences (Stark, 2002; Stark, Chernyshenko, & Drasgow, 2005):
Respondent evaluates each stimulus (personality statement) separately and makes independent decisions about endorsement.
Stimuli may be on different dimensions.
Single stimulus response probabilities P{0} and P{1} computed using a unidimensional ideal point model for 'traditional' items (GGUM)
  1 = Agree
  0 = Disagree
Refer to new pairwise preference model as MDPP

Model Notation

Normative Score Recovery:
Roberts et al. (2000) and Stark (1998, 2002) showed in simulation studies:
  Accurate normative scores could be recovered for GGUM, ZG and MDPP models
  10 items or pairs per dimension are sufficient to obtain reasonable estimates
But no empirical study has compared scores from these 3 formats, even under 'honest' conditions

Results for Conscientiousness Facets:
Correlations = reliability
Positive correlation for MDPP facet scores.

Results for Order and Sociability:
Correlations = reliability

Criterion Validities:
Criterion validities are comparable

Conclusions:
Under honest conditions, MDPP, ZG, and SS versions of the questionnaire provided equivalent measurement and can be viewed as alternate forms
Moving toward FC formats did not affect the validity of personality scores.
Observing a positive correlation between Order and Self Control MDPP scales provided empirical evidence for normative scoring

Current Research:
Results of this study speak in favor of using ZG and MDPP IRT models for scoring FC scales
Having IRT models makes transition to adaptive testing easy
Adaptive format may offer additional benefit of fake resistance (see NCAPS presentations for recent IMTA talks)
Current studies:
  How to best pair stimuli?
  How many unidimensional pairings needed?
  Will increasing # of dimensions lead to more fake resistant scores?
  Can we better detect faking using forced choice than traditional format?
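The equations for the ZG and MDPP item response functions named in the slides did not survive in this transcript. The sketch below shows how such pairwise preference probabilities are typically computed. It assumes the forms commonly reported in the IRT literature for these two models, and it should be checked against Stark and Drasgow (2002) and Stark, Chernyshenko, and Drasgow (2005) before use. All function names, the dictionary layout for item parameters, and the GGUM parameter values are hypothetical. Only the stimulus locations 5.6 and 3.8 come from the stimulus pair shown on the slide.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def zg_prefer_s(theta, mu_s, mu_t):
    """Assumed ZG form: probability of choosing stimulus s over stimulus t."""
    a = (2.0 * theta - mu_s - mu_t) / math.sqrt(3.0)
    b = mu_s - mu_t
    pa, pb = norm_cdf(a), norm_cdf(b)
    return 1.0 - pa - pb + 2.0 * pa * pb


def ggum_agree(theta, delta, alpha=1.0, tau=-1.0):
    """Dichotomous GGUM: P{1}, the probability of endorsing one statement."""
    d = theta - delta
    agree = math.exp(alpha * (d - tau)) + math.exp(alpha * (2.0 * d - tau))
    disagree = 1.0 + math.exp(alpha * 3.0 * d)
    return agree / (agree + disagree)


def mdpp_prefer_s(theta_s, theta_t, item_s, item_t):
    """Assumed MDPP form: independent endorsement decisions for s and t.

    theta_s and theta_t are the trait levels on the dimensions measured by
    s and t. item_s and item_t are dicts of GGUM parameters (hypothetical
    layout: keys 'delta', 'alpha', 'tau').
    """
    ps1 = ggum_agree(theta_s, **item_s)
    pt1 = ggum_agree(theta_t, **item_t)
    endorse_s_only = ps1 * (1.0 - pt1)
    endorse_t_only = (1.0 - ps1) * pt1
    return endorse_s_only / (endorse_s_only + endorse_t_only)


# Stimulus pair from the slide: locations 5.6 and 3.8 on the same dimension.
for theta in (2.0, 4.7, 7.0):
    print(theta, round(zg_prefer_s(theta, 5.6, 3.8), 3))

# Hypothetical multidimensional pair, e.g. an A+ statement and a C+ statement.
agree_item = {"delta": 1.5, "alpha": 1.0, "tau": -1.0}
consc_item = {"delta": 1.5, "alpha": 1.0, "tau": -1.0}
print(round(mdpp_prefer_s(1.5, -1.0, agree_item, consc_item), 3))
```

In both functions the preference probability is 0.5 when the respondent is equally close to the two statements, and it moves toward the statement nearer the respondent's trait level. That property is what allows normative trait scores to be estimated from forced-choice responses.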