Psychometrics: Validity

Concerned with what the test measures and how well it does so.

Slide 2: How well is the test measuring the domain of knowledge that it is supposed to measure?
Validity is concerned with systematic error--something specific about the test consistently distorts what it measures.
This is unlike reliability, which is concerned with random error.

Example of Systematic Error

The factor of language within a test.
The test is supposed to measure coordination, but there is a big language component because the person has to read and understand the instructions. Therefore, the systematic error is that language is being tested along with coordination.

Slide 4: Sometimes this happens with educational tests when the professor uses unusually difficult language, thereby reducing the validity of the test.
To improve the validity of a test, you try to reduce the systematic errors.

Slide 5: Note that you can have good reliability and poor validity.
For example, a test can measure something consistently, but not be accurate.
However, it does no good to have a test that is reliable but has poor validity.

Slide 6: Conversely, you don't want a test with good validity and poor reliability.
This type of test would measure a given trait, but could not be counted upon to measure it consistently.

Types of Validity

The types of validity studies done depend on the purpose of the test.
The four types of validity are: face, content, criterion-related, and construct.

Content Validity

This type is mostly concerned with achievement tests.
It is used specifically when a test is attempting to measure a defined domain.
Content validity indicates whether or not the test items adequately represent that domain.

Slide 9: Content validation begins during the development of a test.
Usually a Table of Specifications is built in order to ensure that the entire domain is represented by the items in a test.

Table of Specifications: Miller Assessment for Preschoolers

Slide 11: The table of specifications compares subtests or specific items to the behavioral domain being tested.
In the previous slide we see three behavioral domains and four subtests.
The x’s in the boxes tell you that those domains are being tested.
In the end, all domains should be represented in the test.

Slide 12: Another method of content validation is to use experts in the field.
The test is sent out to experts who review the test and the domains to be evaluated.
This is used in conjunction with the table of specifications.

When is it Appropriate to do Content Validation?

Appropriate for:
Tests related to occupation: employment, classification, job tasks.
Not appropriate where there is no specific domain of knowledge to be tested.

Criterion-Related Validity

Indicates the effectiveness of a test in predicting an individual's performance on specific activities.
Performance is checked against a criterion
Criterion: a direct and independent measure of what the test is designed to predict.
Example: for a test of vocational aptitude, the criterion might be job performance.

Slide 15: There are two types of criterion-related validity: concurrent and predictive. The two are differentiated by the time period between the test and the criterion.
Concurrent: Short time period between test and criterion.
Predictive: Long time period between test and criterion.

Concurrent Validity

Example: A test is developed to identify individuals with tactile hypersensitivity.
Using concurrent validity, the goal would be to see how well the test can identify who has tactile hypersensitivity.
Two groups of subjects are needed: one group known to have tactile hypersensitivity, and one group known to have normal tactile function.

Slide 17: The test is then given to both groups of people.
If the test has concurrent validity, it will accurately identify those who have tactile hypersensitivity and those who are normal.
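This check can be sketched in a few lines. The data below are entirely hypothetical (the group labels and sample size are invented for illustration); the point is simply that the classification rate is the proportion of subjects the test labels correctly.

```python
# Hypothetical data: known group membership vs. the new test's classification.
known_status = ["hyper", "hyper", "hyper", "hyper", "normal",
                "normal", "normal", "normal", "normal", "normal"]
test_result  = ["hyper", "hyper", "hyper", "normal", "normal",
                "normal", "normal", "normal", "normal", "hyper"]

# Classification rate: proportion of subjects the test labels correctly.
correct = sum(k == t for k, t in zip(known_status, test_result))
classification_rate = correct / len(known_status)
print(f"Classification rate: {classification_rate:.0%}")  # → Classification rate: 80%
```

Here the test misclassifies two of ten subjects (one false negative, one false positive), giving an 80% rate--below the roughly 90% one would hope for.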
Look for a high classification rate: perhaps 90% or so.

Slide 18: Another method of concurrent validity:
One group of subjects takes the newly developed test, and also takes a test that is established in the field.
The results on these two tests are compared.
Hopefully the new test will be as accurate as the old test.

Example Using the MAP and DDST

What Does This All Mean?

You can see that about 70% of the kids were classified as normal by both tests.
However, 22% that were classified as normal by the DDST were classified as questionable by the MAP.
3% of the kids who were questionable on the DDST were normal on the MAP.

Is the MAP a Better Test Than the DDST?

It appears that the MAP identified 24% more kids who potentially had problems (22% in the yellow, questionable category and 2% in the red, abnormal category).
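The 24% figure is just the sum of the two MAP categories for children the DDST had passed as normal; a quick check of the arithmetic:

```python
# Percentages reported in the MAP/DDST comparison.
normal_on_both   = 0.70  # classified normal by both tests
map_questionable = 0.22  # normal on DDST, questionable (yellow) on MAP
map_abnormal     = 0.02  # normal on DDST, abnormal (red) on MAP

# Children the MAP flagged that the DDST passed as normal:
extra_flagged = map_questionable + map_abnormal
print(f"Extra children flagged by the MAP: {extra_flagged:.0%}")  # → 24%
```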
However, predictive studies are needed when these kids reach school age--to see if the MAP was accurate.

Predictive Validity

Similar to concurrent validity, but with a longer time period between testing and measurement of the criterion.
Need a group of people who can be studied long term. This is called a longitudinal study.
Everyone in the group is given the test and scores are tabulated.

Slide 23: Then you wait a period of time--usually months or years, but it could be shorter depending upon what is being measured.
After a period of time, the criterion is measured. In the example used previously: after time, do the children in the group develop tactile hypersensitivity?

Slide 24: The test results from earlier are then compared to each child's hypersensitivity status.
Did the test accurately predict which kids would develop hypersensitivity?
If so, then the test has predictive validity.

Slide 25: Predictive validity takes a long time to establish, and a test may have predictive validity studies going on for years.
In the meantime, concurrent validity studies help establish that the test does indeed measure a specific criterion.

Construct Validity

Construct: an unobservable trait that is known to exist.
Examples: IQ, motivation, self-esteem, motor planning, anxiety
How can we be sure a test measures these constructs if we can't directly measure them?

Slide 27: Ways to assess construct validity include:
Developmental changes
Correlations with other tests
Factor analysis
Convergent and discriminant validation

Developmental Changes

Employs the idea of age differentiation.
Useful for any test that is developmental in nature.
Since abilities increase with age (during childhood), it is logical that test scores will also increase with age.
This is one measure of construct validity, but it is not conclusive.

Slide 29: In other words, determining construct validity by developmental changes alone is not sufficient.
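As a minimal sketch of age differentiation, one can correlate age with test score in a norming sample. The ages and scores below are invented for illustration; a developmental test should show a strong positive correlation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical norming sample: ages in months and raw test scores.
ages   = [30, 36, 42, 48, 54, 60]
scores = [12, 18, 16, 24, 26, 33]

r = pearson(ages, scores)
print(f"Age-score correlation: r = {r:.2f}")  # a strong positive r is expected
```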
There also need to be other measures of construct validity.

Correlations with Other Tests

Correlations between the new test and established tests help to establish construct validity.
The idea is that if the correlation is relatively high, the new test measures the same traits as the established test.
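A minimal sketch of this comparison, assuming hypothetical scores for the same children on a new test and an established test of the same construct:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

# Hypothetical scores for the same children on the new and established tests.
new_test = [55, 61, 47, 72, 66, 58, 80, 49]
old_test = [60, 55, 52, 68, 58, 66, 71, 57]

r = pearson(new_test, old_test)
print(f"r = {r:.2f}")  # → r = 0.78
```

A moderate-to-high value like this supports construct validity without suggesting the new test is redundant.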
Want moderate as opposed to very high correlations. (If correlations are very high, you have to wonder whether the new test is really necessary.)

Factor Analysis

FA is a multivariate statistical technique used to group multiple variables into a few factors.
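A rough sketch of the idea, using an eigendecomposition of a correlation matrix as a stand-in for a full factor extraction. The subtest names and every number in the matrix are invented for illustration; two motor subtests and two verbal subtests are made to correlate within their own pair, so two factors emerge.

```python
import numpy as np

# Made-up correlation matrix among four subtests: two motor, two verbal.
R = np.array([
    [1.00, 0.75, 0.20, 0.15],   # fine motor
    [0.75, 1.00, 0.18, 0.22],   # gross motor
    [0.20, 0.18, 1.00, 0.70],   # vocabulary
    [0.15, 0.22, 0.70, 1.00],   # verbal reasoning
])

# Principal-axis-style extraction: eigendecomposition of R.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Kaiser criterion: retain factors with eigenvalue > 1.
n_factors = int(np.sum(eigvals > 1.0))
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
print(f"{n_factors} factors retained")     # → 2 factors retained
```

The two retained factors correspond to the motor and verbal clusters--exactly the kind of grouping a FA study looks for.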
In doing FA you hope to find clusters of variables that can be identified as new factors.

Example: Factor Analysis

In the standardization of the Miller Assessment for Preschoolers (MAP), over 1,000 children were tested using that assessment.
The FA study looked at the interrelationships between the various subtests and came up with six primary factors.

Factor Analysis of the MAP

Convergent and Discriminant Validation

The idea is that a test should correlate highly with other similar tests, and
the test should correlate low with tests that are very dissimilar.

Example

A newly developed test of motor coordination should correlate highly with other tests of motor coordination.
It should also have low correlations with tests that measure attitudes.
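A minimal sketch of both checks at once, with invented scores for the same subjects on three instruments (a new coordination test, an established coordination test, and an attitude scale):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

# Hypothetical scores for the same subjects on three instruments.
coordination_new = [14, 20, 11, 25, 18, 22, 9, 16]   # the new test
coordination_est = [15, 19, 12, 24, 20, 21, 10, 17]  # similar, established test
attitude_scale   = [20, 27, 15, 22, 31, 18, 24, 26]  # dissimilar test

r_convergent   = pearson(coordination_new, coordination_est)  # should be high
r_discriminant = pearson(coordination_new, attitude_scale)    # should be near zero
print(f"convergent r = {r_convergent:.2f}, "
      f"discriminant r = {r_discriminant:.2f}")
```

A high convergent correlation together with a near-zero discriminant correlation is the pattern that supports construct validity.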