RELIABILITY
By: Abayon, Yule Rachel

Reliability has to do with the quality of measurement. In its everyday sense, reliability is the "consistency" or "repeatability" of your measures. Before we can define reliability precisely, we have to lay the groundwork.

Reliability is the degree of consistency between two measures of the same thing (Mehrens and Lehmann, 1987). It is also described as the measure of how stable, dependable, trustworthy, and consistent a test is in measuring the same thing each time (Worthen et al., 1993).

Reliability is one of the most important elements of test quality. It has to do with the consistency, or reproducibility, of an examinee's performance on the test. If a highly reliable test were given to the same examinee on two occasions, it would yield similar scores both times. A test with poor reliability, on the other hand, might produce very different scores for the examinee across the two administrations. If a test yields inconsistent scores, it may be unethical to take any substantive action on the basis of the test.

TYPES OF RELIABILITY
Split-Half Reliability
Inter-Rater Reliability
Internal Consistency
Parallel Forms Reliability
Test-Retest Reliability

SPLIT-HALF RELIABILITY
In split-half reliability, we randomly divide all items that purport to measure the same construct into two sets. We administer the entire instrument to a sample of people and calculate the total score for each randomly divided half. The split-half reliability estimate is the correlation between these two sets of half-test totals.

INTER-RATER RELIABILITY
The other methods described here are intended to be used for objective tests. When a test includes performance tasks, or other items that need to be scored by human raters, the reliability of those raters must also be estimated.

This reliability method asks the question, "If multiple raters scored a single examinee's performance, would the examinee receive the same score?"
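A sketch of one way to put a number on that question for two raters: Cohen's kappa compares how often the raters give an examinee the same score with how often they would agree by chance, given how often each rater uses each score. The slides do not name a statistic, so kappa is a swapped-in choice here, and the function name `cohens_kappa` and the rubric scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each scored the same examinees.

    rater_a[i] and rater_b[i] are the scores the two raters gave examinee i.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of examinees")
    n = len(rater_a)

    # Observed agreement: proportion of examinees given the same score by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: probability that both raters pick the same score if each
    # assigned scores independently at their own observed rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_chance = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)

    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters used one identical score")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical rubric scores (1-4) from two raters for six examinees.
rater_1 = [3, 2, 4, 4, 1, 2]
rater_2 = [3, 2, 4, 3, 1, 2]
print(round(cohens_kappa(rater_1, rater_2), 3))  # prints 0.778
```

Kappa is 1 for perfect agreement and about 0 when agreement is no better than chance. Here the raters agree on five of six examinees (about 0.83 raw agreement), and kappa discounts part of that as agreement chance alone would produce.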
Inter-rater reliability provides a measure of the dependability or consistency of scores that might be expected across raters.

INTERNAL CONSISTENCY
The internal consistency measure of reliability is frequently used for norm-referenced tests (NRTs). This method has the advantage that it can be carried out using a single form given at a single administration.

The internal consistency method estimates how well the set of items on a test correlate with one another; that is, how similar the items on a test form are to one another.

PARALLEL FORMS RELIABILITY
Parallel forms are alternate versions of a test, all constructed to match the test blueprint and to be similar in average item difficulty. Parallel forms reliability is estimated by administering both forms of the exam to the same group of examinees. While the time between the two administrations should be short, it needs to be long enough that examinees' scores are not affected by fatigue. The examinees' scores on the two forms are then correlated to determine how similarly the two forms function.

This reliability estimate is a measure of how consistent examinees' scores can be expected to be across test forms.

TEST-RETEST RELIABILITY
To estimate test-retest reliability, you must administer a test form to a single group of examinees on two separate occasions. Typically, the two administrations are only a few days or a few weeks apart; the time should be short enough that the examinees' skills in the area being assessed have not changed through additional learning.
The relationship between the examinees' scores from the two different administrations is estimated, through statistical correlation, to determine how similar the scores are. This type of reliability demonstrates the extent to which a test is able to produce stable, consistent scores across time.
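The correlation step can be sketched, under assumptions, as a Pearson correlation between each examinee's score on the first and second occasions. The slides say only "statistical correlation", so Pearson's r, the name `pearson_r`, and the sample scores are assumptions for illustration. The same function would serve the parallel forms method, with the two lists holding each examinee's scores on the two forms.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two scores each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Cross-products and squared deviations from each occasion's mean.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("scores must vary on both occasions")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical scores for five examinees who took the same form twice, a few weeks apart.
first_occasion = [52, 61, 45, 70, 58]
second_occasion = [55, 60, 47, 72, 56]
print(round(pearson_r(first_occasion, second_occasion), 2))  # prints 0.97
```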
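Returning to the split-half procedure from the earlier section, it can be sketched as follows. Items are split at random (with a fixed seed so the split is reproducible), each examinee's total on each half is computed, and the two sets of half totals are correlated with Pearson's r. Because each half is only half as long as the full test, the sketch also applies the Spearman-Brown formula, 2r / (1 + r), a common adjustment that the slides do not mention. All function names, the seed, and the data are hypothetical; the adjustment assumes halves of equal length, so the example uses an even number of items.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("half-test totals must vary across examinees")
    return s_xy / math.sqrt(s_xx * s_yy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores, seed=0):
    """Randomly split items into two halves and correlate the half-test totals.

    Returns (half-test correlation, Spearman-Brown full-length estimate).
    """
    n_items = len(item_scores[0])
    item_indices = list(range(n_items))
    random.Random(seed).shuffle(item_indices)  # fixed seed gives a reproducible split
    half_a = item_indices[: n_items // 2]
    half_b = item_indices[n_items // 2 :]
    # Each examinee's total score on each half.
    totals_a = [sum(row[i] for i in half_a) for row in item_scores]
    totals_b = [sum(row[i] for i in half_b) for row in item_scores]
    r_half = pearson_r(totals_a, totals_b)
    return r_half, spearman_brown(r_half)


# Hypothetical right/wrong (1/0) responses: five examinees by six items.
responses = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
]
r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, Spearman-Brown estimate = {r_full:.2f}")
```

A different seed gives a different split and usually a somewhat different estimate, which is one known weakness of a single random split.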
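For the internal consistency method described earlier, the slides do not name a coefficient. Cronbach's alpha is a widely used internal consistency estimate that, like the method described, needs only one form given at one administration; the sketch below swaps it in as an assumption. Rows are examinees and columns are items; the function name `cronbach_alpha` and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of examinee rows, each holding one score per item."""
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("every examinee needs scores on the same two or more items")
    # Sample variance of each item (column) across examinees.
    item_variances = [variance(column) for column in zip(*item_scores)]
    # Sample variance of the examinees' total scores.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores must vary across examinees")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical right/wrong (1/0) responses: four examinees by three items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(round(cronbach_alpha(responses), 2))  # prints 0.75
```

Alpha rises as the items covary more strongly relative to their individual variances; if each examinee received the same score on every item (with scores still varying across examinees), alpha would be 1.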