PROMS2012

Presentation Description

PROMS 2012, Jiaxing, China

Presentation Transcript

Can difficulty of items be guessed intelligently without degrading CAT results?: 

Tetsuo Kimura (Waseda University / Niigata Seiryo University)
Keizo Nagaoka (Waseda University)
PROMS 2012, Jiaxing, China, August 6-9, 2012

Outline: 

Backgrounds & Previous Studies
- Prerequisites for CAT implementation
- Construction of item banks for CAT for EFL learners
- Moodle UCAT module based on Linacre’s UCAT
Current Study
- Research Question
- Research Design
- Result & Conclusions

Prerequisites for CAT implementation: 

CAT is greedy! CAT likes a big pool!

Construction of item bank: 

Pretesting
Item analysis & elimination of misfit
More pretests with new items and anchored items
Item bank: calibrated items, anchored items

PowerPoint Presentation: 

[Figure: Equation Design. An items-by-persons layout in which Tests A to D each combine new items with anchor items.]

Types of items used in the study: 

All the items were adopted from the Eiken Test, Grade Pre-1 to Grade 3, with the permission of the Society for Testing English Proficiency (STEP).
- Listening comprehension (Lng)
- Reading comprehension (Rdg)
- Vocabulary and grammar (Vgm)
- Listening comprehension with dialogue (Dlg)
- Listening comprehension with monologue (Mlg)

Construction of item bank (Vgm): 

Item = items per test form; Anchor = anchor items per test form; New and Misfit are totals per pretest.

Pretest      Form   N    Item  Anchor  New  Misfit
1st pretest  2008A  222  20            80   1
             2008B       20
             2008C       20
             2008D       20
2nd pretest  2009A  292  32    16      64   3
             2009B  258  32    16
             2009C  268  32    16
             2009D  252  32    16
3rd pretest  2010A  284  26    6       120  2
             2010B  256  26    6
             2010C  224  26    6
             2010D  304  26    6
             2010E  295  26    6
             2010F  212  26    6

Total test-takers: 2867
Total items: 258

Construction of item bank (Dlg): 

Item = items per test form; Anchor = anchor items per test form; New and Misfit are totals per pretest.

Pretest      Form   N    Item  Anchor  New  Misfit
1st pretest  2008A  157  12            47   1
             2008B       12
             2008C       12
             2008D       11
2nd pretest  2009A  297  16    7       38   1
             2009B  275  16    6
             2009C  283  16    6
             2009D  290  16    7
3rd pretest  2010A  263  29    6       94
             2010B  321  29    6
             2010C  310  30    6
             2010D  142  30    6

Total test-takers: 2867
Total items: 177

Construction of item bank (Mlg): 

Item = items per test form; Anchor = anchor items per test form; New and Misfit are totals per pretest.

Pretest      Form   N    Item  Anchor  New  Misfit
1st pretest  2008A  119  8             35   0
             2008B       9
             2008C       9
             2008D       9
2nd pretest  2009A  277  16    9       26   1
             2009B  274  16    10
             2009C  282  16    10
             2009D  286  16    9
3rd pretest  2010A  198  24    6       70   0
             2010B  257  24    6
             2010C  221  23    6
             2010D  221  23    6

Total test-takers: 2338
Total items: 130

PowerPoint Presentation: 

Construction of item bank: Common Person Linking
Dlg & Mlg → Lng
r = 0.86, Mlg = Dlg × 1.18 + 0.06
r = 0.89, Dlg = Mlg × 0.85 + 0.05

Construction of item bank: 

Vgm    N    AVG    SD
G1.5   73   1.57   0.84
G2     69   0.52   0.81
G2.5   67   -0.47  0.91
G3     49   -1.41  0.80
Total  258  0.19   1.37

Lng    N    AVG    SD
G1.5   44   1.26   1.42
G2     109  0.77   1.11
G2.5   75   0.35   1.05
G3     80   -0.90  1.33
Total  308  0.30   1.43

UCAT (Useful Computer Adaptive Test): 

UCAT addressed major problems in CAT:
- Pretesting new items for the item bank
- Adding new test items to the item bank
- Recalibrating the bank

UCAT: CAT with Item Bank Recalibration (Linacre, 1987): 

- The difficulty level of the new items can be guessed intelligently without degrading the resulting ability estimates.
- The degradation of measures by poor item calibration is further diminished by the self-correcting nature of CAT.
- Poor calibration of a few items is not deleterious to Rasch measurement (Wright and Douglas, 1975; Yao, 1991).

UCAT: CAT with Item Bank Recalibration (Linacre, 1987): 

Existing items can be recalibrated with minimal impact on previous test-taker measures. This is especially important when the item difficulty calibrations are derived from non-CAT sources, or when there is concern that part of the item bank has become public knowledge.

UCAT: CAT with Item Bank Recalibration (Linacre, 1987): 

The CAT test developer or the CAT administrator can choose to have the difficulties of the items in the bank recalibrated at any point, based on the responses of those to whom the items have been administered so far. As part of the recalibration procedure, all test-takers are remeasured based on their original responses and the revised item difficulties.

UCAT: CAT with Item Bank Recalibration (Linacre, 1987): 

The final revised item calibrations are computed in such a way as to maintain unchanged the mean of the ability estimates of those who have already taken the test. This minimizes the effect of the recalibration on any previously reported test results.
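One way to keep the mean ability unchanged can be sketched in Python. This is an illustration only: the transcript does not show how UCAT performs the computation, and the function and argument names are hypothetical. The sketch relies on the fact that the Rasch model depends only on the difference between ability and difficulty, so adding the same constant to every ability and every difficulty leaves all success probabilities unchanged.

```python
from statistics import mean


def anchor_to_previous_mean(old_abilities, new_abilities, new_difficulties):
    """Shift a recalibrated solution so the mean ability matches the old mean.

    old_abilities: ability estimates (logits) reported before recalibration
    new_abilities: ability estimates (logits) after remeasurement
    new_difficulties: recalibrated item difficulties (logits)
    Returns (shifted_abilities, shifted_difficulties).
    """
    # Constant that moves the new mean ability back onto the old one.
    shift = mean(old_abilities) - mean(new_abilities)
    # Apply the same shift to persons and items, so every (ability - difficulty)
    # difference, and therefore every Rasch probability, is preserved.
    shifted_abilities = [b + shift for b in new_abilities]
    shifted_difficulties = [d + shift for d in new_difficulties]
    return shifted_abilities, shifted_difficulties
```

After the shift, individual abilities may still differ from their earlier values; only their mean is held fixed.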

Moodle UCAT Module beta ver.: 

Development Status
CAT setting window
- Ending conditions
- Logit to unit conversion (Unit = Logit × 10 + 100)
- Logit bias
CAT administration window
- Set item difficulty individually or category by category
- Set student’s ability individually or as a whole
- Administer CAT and provide result individually
- Retrieve CAT processes and results
Under development for Ver. 1, to be released in late August
- Recalibration of item difficulty & estimation of ability
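The logit-to-unit conversion listed above (Unit = Logit × 10 + 100) can be illustrated with a minimal Python sketch. The function names are hypothetical and are not taken from the Moodle UCAT module. Scaling a standard error by the factor 10 without the offset is an assumption that follows from the conversion being linear.

```python
def logit_to_unit(logit: float) -> float:
    """Convert a measure in logits to the reporting unit: Unit = Logit * 10 + 100."""
    return logit * 10 + 100


def unit_to_logit(unit: float) -> float:
    """Inverse conversion, from the reporting unit back to logits."""
    return (unit - 100) / 10


def se_logit_to_unit(se_logit: float) -> float:
    """Assumption: a standard error is rescaled by the factor 10 only (no offset)."""
    return se_logit * 10
```

For example, an ability of 0.0 logit corresponds to 100 units, and a difficulty of 115 units corresponds to 1.5 logits.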

CAT Algorithm: Initial Ability Estimation: 

UCAT
Lower Limit (LL) = AVG(D) - (0.5 + 0.5 * RND)
Upper Limit (UL) = LL + 1
B0 = AVG(D) - 0.5 * RND
AVG(D): average item difficulty
RND: random value between 0 & 1
B0: initial ability

Moodle UCAT
Assign each student’s initial ability in the CAT administration window, either one by one (based on other test results or an intelligent guess) or for all students as a whole.
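The UCAT starting values can be sketched in Python as follows. The function name and the `rng` parameter are hypothetical. The sketch also assumes that each occurrence of RND is a separate random draw, which the slide does not state.

```python
import random
from statistics import mean


def ucat_initial_state(difficulties, rng=random):
    """Return (LL, UL, B0) in logits, following the UCAT formulas on the slide.

    difficulties: item difficulties of the bank, in logits
    rng: object with a random() method returning a value in [0, 1)
    """
    avg_d = mean(difficulties)               # AVG(D)
    ll = avg_d - (0.5 + 0.5 * rng.random())  # LL = AVG(D) - (0.5 + 0.5 * RND)
    ul = ll + 1.0                            # UL = LL + 1
    b0 = avg_d - 0.5 * rng.random()          # B0 = AVG(D) - 0.5 * RND (separate draw, assumed)
    return ll, ul, b0
```

Under these formulas the first item window is always one logit wide and starts between 0.5 and 1.0 logit below the average item difficulty, while the initial ability lies within 0.5 logit below that average.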

CAT Algorithm: Ability (B) Estimation: 

UCAT / Moodle UCAT
[Equation: ability (B) estimation; formula not preserved]
Terms defined on the slide:
- the number of successes
- the probability of success of a student of ability Bm on the i-th administered item of difficulty Di
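The terms on this slide (number of successes, probability of success at ability Bm on an item of difficulty Di) match the standard Newton-Raphson maximum-likelihood step for the dichotomous Rasch model. The Python sketch below implements that standard step. It is an assumption that the slide shows this exact form, and the function names are hypothetical.

```python
import math


def rasch_p(ability, difficulty):
    """Rasch probability of success for a given ability and item difficulty (logits)."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def update_ability(b_m, difficulties, successes):
    """One Newton-Raphson step: B_next = B_m + (R - sum(P_i)) / sum(P_i * (1 - P_i)).

    b_m: current ability estimate (logits)
    difficulties: difficulties of the items administered so far (logits)
    successes: number of correct answers so far (R)
    """
    probs = [rasch_p(b_m, d) for d in difficulties]
    expected_score = sum(probs)                  # expected number of successes
    information = sum(p * (1 - p) for p in probs)  # model variance of the score
    return b_m + (successes - expected_score) / information
```

If the observed number of successes equals the expected score, the estimate does not move; more successes than expected raise it, fewer lower it.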

CAT Algorithm: Standard Error (SE) Estimation: 

UCAT / Moodle UCAT
[Equation: standard error (SE) estimation; formula not preserved]
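For the Rasch model, the usual standard error of an ability estimate is the reciprocal square root of the test information, SE = 1 / sqrt(sum of P_i * (1 - P_i)). The Python sketch below uses that standard formula. It is an assumption that the slide shows the same expression, and the function names are hypothetical.

```python
import math


def rasch_p(ability, difficulty):
    """Rasch probability of success for a given ability and item difficulty (logits)."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def standard_error(ability, difficulties):
    """Standard error (logits) of an ability estimate after the listed items."""
    # Each item contributes P * (1 - P) of information; at most 0.25 when P = 0.5.
    information = sum(
        rasch_p(ability, d) * (1 - rasch_p(ability, d)) for d in difficulties
    )
    return 1.0 / math.sqrt(information)
```

With 16 items whose difficulty equals the test-taker's ability, the information is 16 × 0.25 = 4 and the function returns 0.5 logit, which is 5 units at 10 units per logit. Items that are off target contribute less information, so the standard error for a 16-item test cannot fall below this value.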

CAT Algorithm: Item Selection : 

UCAT / Moodle UCAT
The next item is selected randomly from the items whose difficulty lies between LL and UL.
- LL: ability estimate if the next (m-th) answer is wrong
- UL: ability estimate if the next answer is correct
If no item is found between LL and UL, the closest item is used.
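The selection rule can be sketched in Python as follows. The function name and the data layout (a dict from item id to difficulty) are hypothetical. Two details are assumptions that the slide does not spell out: already administered items are excluded, and "closest" is taken to mean closest to the nearer bound of the LL to UL window.

```python
import random


def select_next_item(bank, administered, ll, ul, rng=random):
    """Pick the next item id, or None if every item has been administered.

    bank: dict mapping item id -> difficulty in logits
    administered: set of item ids already given to this test-taker
    ll, ul: lower and upper limits of the target difficulty window (logits)
    rng: object with a choice() method, e.g. random or random.Random(seed)
    """
    remaining = {i: d for i, d in bank.items() if i not in administered}
    if not remaining:
        return None

    # Preferred case: choose at random among items inside the window.
    in_window = [i for i, d in remaining.items() if ll <= d <= ul]
    if in_window:
        return rng.choice(in_window)

    # Fallback (assumed reading of "closest"): smallest distance to the window.
    def distance_to_window(item_id):
        d = remaining[item_id]
        return ll - d if d < ll else d - ul

    return min(remaining, key=distance_to_window)
```

Choosing at random inside the window, rather than always taking the best-targeted item, spreads exposure across items of similar difficulty.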

CAT Algorithm: Ending Condition: 

UCAT / Moodle UCAT
- Prescribed number of items
- Prescribed SE
- Both number of items and SE
- All items
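The four ending conditions can be combined in a single check, sketched here in Python. The function and parameter names are hypothetical. Reading "both number of items and SE" as "stop only when both criteria are met" is an assumption; the slide does not say how the two are combined.

```python
def should_stop(n_administered, se, bank_size, max_items=None, target_se=None):
    """Decide whether the CAT should end.

    n_administered: number of items given so far
    se: current standard error of the ability estimate
    bank_size: number of items in the bank
    max_items: prescribed number of items, or None if not used
    target_se: prescribed standard error, or None if not used
    """
    # "All items": nothing is left to administer.
    if n_administered >= bank_size:
        return True

    criteria = []
    if max_items is not None:
        criteria.append(n_administered >= max_items)
    if target_se is not None:
        criteria.append(se <= target_se)

    # With no criterion set, the test runs until the bank is exhausted.
    # With both set, both must hold (assumed interpretation).
    return bool(criteria) and all(criteria)
```

A 16-item test corresponds to `max_items=16` with `target_se=None`.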

CAT Algorithm: Item Selection (logit bias) : 

Moodle UCAT
LL and UL can be adjusted by entering a logit value in the Logit bias box in the CAT setting window.
- A positive logit value decreases the chance of answering correctly.
- A negative logit value increases the chance of answering correctly.
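The effect of the logit bias can be illustrated with a short Python sketch. The function names are hypothetical, and adding the bias to both limits is an assumed reading of "adjusted by adding logit value". The second function uses the Rasch model to show why a positive bias lowers the chance of a correct answer: the selected items become harder relative to the test-taker's ability.

```python
import math


def apply_logit_bias(ll, ul, bias):
    """Shift the item selection window by the logit bias (assumed: added to both limits)."""
    return ll + bias, ul + bias


def success_probability_at_offset(bias):
    """Rasch success probability on an item that is `bias` logits harder than the ability."""
    return 1.0 / (1.0 + math.exp(bias))
```

With a bias of 0 the targeted success probability is 0.5; a bias of +1 logit lowers it to roughly 0.27, and a bias of -1 logit raises it to roughly 0.73.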

Current Study: 

CAT 1: item difficulty was determined grade by grade intelligently
CAT 2: item difficulty was determined item by item based on the results of pretesting

Difficulty in Unit
Eiken grade  Vgm  Lng
Pre 1st      115  113
2nd          105  108
Pre 2nd      95   104
3rd          85   91

CAT conditions
- Initial ability estimate: 0.0 logit (100 unit)
- Ending condition: number of items (16 items)
- Logit bias: 0

Current Study: 

Research Question
- Are the results of CAT Part 1 and Part 2 consistent?
- Can difficulty of items be guessed intelligently without degrading CAT results?
Method
Comparing the results of two CATs administered to the same test takers (59 Japanese freshmen of an engineering department) on Moodle UCAT

Results of CAT1 & CAT2 (Vgm): 

[Figure: Comparison of Ability Estimate (in Unit), CAT1 vs. CAT2]

Ability Estimate (in Unit)
     CAT1   CAT2
AVG  108.1  106.6
SD   7.6    8.8
MAX  125    119
MIN  87     90

Standard Error (in Unit)
     CAT1  CAT2
AVG  5.32  5.39
SD   0.24  0.33
MAX  5.87  6.38
MIN  5.05  5.05

Results of CAT1 & CAT2 (Lng): 

[Figure: Comparison of Ability Estimate (in Unit), CAT1 vs. CAT2]

Ability Estimate (in Unit)
     CAT1   CAT2
AVG  109.5  113.8
SD   6.0    11.8
MAX  117    131
MIN  101    99

Standard Error (in Unit)
     CAT1  CAT2
AVG  5.23  5.38
SD   0.33  0.30
MAX  6.00  6.00
MIN  5.00  5.00

r = .55, N = 13

Case Study: 

Conclusions & Limitation
- Item difficulty guessed intelligently will NOT degrade the resulting ability estimates, as long as the guess is good enough.
- Sample size is very small.

PowerPoint Presentation: 

Happy CAT For Everyone

REFERENCES: 

Kimura, T. (2009). Construction of a Moodle-based placement test and possibility of a Moodle-based computer adaptive test. ARELE 20, 161-169.
Kimura, T. & Nagaoka, K. (2010a). Towards the construction of item banks for Moodle-based in-house computer adaptive English tests. Pacific Rim Objective Measurement Symposium 2010, Kuala Lumpur.
Kimura, T. & Nagaoka, K. (2010b). Toward the construction of Moodle-based in-house computer adaptive test 1: Improvement of item banks. JSET 26, 343-344.
Kimura, T. & Nagaoka, K. (2011). Psychological aspects of CAT: How test-takers feel about CAT. IACAT Conference 2011, Pacific Grove, CA.
Kimura, T. & Nagaoka, K. (2011). Toward the construction of Moodle-based in-house computer adaptive test 2: Consolidation of item banks. JSET 27.
Linacre, J.M. (1987). UCAT: a BASIC computer-adaptive testing program. MESA Psychometric Laboratory. (ERIC ED 280 895).
Linacre, J.M. (2006). Computer Adaptive Tests, Standard Errors and Stopping Rules. Rasch Measurement Transactions 20:2, p.1062.
Wright, B.D. and Douglas, G. (1975). Best test design and self-tailored testing. MESA Memorandum No. 19. Department of Education, Univ. of Chicago.
Yao, T. (1991). CAT with a poorly calibrated item bank. Rasch Measurement Transactions 5:2, p.141.

Thank you for listening.: 

Tetsuo Kimura (tetsuo.kmr@gmail.com)
Acknowledgements: A part of the present study was supported by a Grant-in-Aid for Scientific Research for 2010-2012 (No. 22520590) from the Japan Society for the Promotion of Science.

PowerPoint Presentation: 

Vocabulary & grammar (Vgm), STEP Grade 3, 2008 Summer

PowerPoint Presentation: 

Listening with dialogue (Dlg), STEP Grade 3, 2008 Summer

PowerPoint Presentation: 

Listening with monologue (Mlg), STEP Grade 3, 2008 Summer