VALIDITY AND RELIABILITY

EP490asa

Leonard Fretzin

 

A valid assessment should reflect actual knowledge or performance, not just test-taking skill or memorized equations and facts.  A valid assessment should not require knowledge or skills that are irrelevant to what is actually being assessed.  For a test to be fair, its content and performance expectations should reflect knowledge and, in certain subjects such as the humanities, values and experiences that are common to all students.  Assessments must, of course, also be as free as possible of cultural, ethnic, and gender bias.

The validity of an assessment is the extent to which it measures what it was intended or designed to measure.  Validation is the process of accumulating evidence that supports the appropriateness of the inferences drawn from student responses for the specified assessment uses.  Validity refers to the degree to which evidence supports the claim that the interpretations of test results are correct and that the manner in which these interpretations are used is appropriate and meaningful.

The three types of evidence that support the validity of an assessment instrument are content-related, construct-related, and criterion-related evidence.  Content-related evidence refers to the extent to which a student's responses to a given test question reflect that student's knowledge of the content area being assessed.  If the responses do not reflect the student's knowledge, the validity of the overall assessment is weakened.

Content-related evidence is also concerned with the extent to which the assessment questions adequately sample the content.  If the content that needs to be assessed is not adequately covered by the assessment instrument, validity suffers as well.

Construct-related evidence is evidence that the test measures only what it was designed to measure.  When I give a test on a unit of chemistry, such as density, I am careful that the test is not measuring reading skills or computational skills above and beyond those needed for understanding density.

 

Reliability refers to the stability of a test's results over time.  If a test is reliable, we would expect a student to attain the same score regardless of when the student completed the assessment.  If the assessment instrument is not reliable, a student's score may vary because of factors unrelated to the purpose of the assessment.  A test or other assessment that is not reliable is usually not valid either.
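
One common way to put a number on this kind of stability is to give the same test twice and correlate the two sets of scores.  The sketch below assumes a Pearson correlation is an acceptable estimate; the function name and the scores are hypothetical and are not taken from my own classes.

```python
from math import sqrt

def retest_reliability(first_scores, second_scores):
    """Pearson correlation between two sittings of the same test.

    Values near 1.0 suggest stable scores; values near 0 suggest
    that scores depend heavily on when the test was taken.
    """
    if len(first_scores) != len(second_scores) or len(first_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 students")
    n = len(first_scores)
    mean_1 = sum(first_scores) / n
    mean_2 = sum(second_scores) / n
    # Deviation of each student's score from the class mean.
    dev_1 = [x - mean_1 for x in first_scores]
    dev_2 = [y - mean_2 for y in second_scores]
    covariance = sum(a * b for a, b in zip(dev_1, dev_2))
    spread = sqrt(sum(a * a for a in dev_1) * sum(b * b for b in dev_2))
    if spread == 0:
        raise ValueError("scores have no variation; correlation is undefined")
    return covariance / spread

# Hypothetical scores for five students on two sittings of a density test.
monday = [72, 85, 90, 64, 78]
friday = [70, 88, 91, 60, 80]
r = retest_reliability(monday, friday)
print(round(r, 2))  # about 0.99 for these made-up scores
```

A high value here says only that the scores are consistent from one sitting to the next.  It does not show that the test measures the right thing, which is a question of validity.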

The second factor in determining an assessment's reliability is called rater reliability and refers to the consistency of scores when other teachers do the grading.  Since the tests I use for chemistry are mostly Scantron multiple-choice items and problems with a single correct solution, this aspect of reliability is of little concern for my assessments.
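
For assessments that are not machine scored, rater reliability can be checked by having two teachers grade the same papers and comparing the results.  The sketch below assumes two measures that are commonly used for this, simple percent agreement and Cohen's kappa, which corrects for agreement expected by chance.  The function names and rubric scores are hypothetical.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Fraction of papers on which both raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length score lists")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same score
    # if each assigned scores independently at their own observed rates.
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("both raters used a single identical score; kappa is undefined")
    return (observed - expected) / (1 - expected)

# Hypothetical 1-4 rubric scores from two teachers on eight papers.
teacher_a = [4, 3, 4, 2, 3, 4, 1, 2]
teacher_b = [4, 3, 3, 2, 3, 4, 1, 3]
agreement = percent_agreement(teacher_a, teacher_b)
kappa = cohens_kappa(teacher_a, teacher_b)
print(agreement)        # 0.75
print(round(kappa, 2))  # 0.66
```

Kappa comes out lower than raw agreement because some matching scores would occur even if the two teachers graded at random.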

Good assessment requires minimizing factors that could lead to misinterpretation of results. Three criteria for meeting this requirement are reliability, validity, and fairness.

Reliability is defined as "an indication of the consistency of scores across evaluators or over time."  An assessment is considered reliable when the same results occur regardless of when the assessment occurs or who does the scoring.