In statistical terms, validity is defined as the proportion of the true variance that is relevant for the purposes of the examination. Here, ‘relevant’ refers to the variance attributable to the variable or characteristic that the test is meant to measure. There are, however, several types of validity.
Generally, the validity of a test is defined in one of two ways:
- The relationship between its scores and some external criterion measure, or,
- The extent to which the test measures a hypothetical specific underlying trait or “construct”.
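The first of these definitions, the relationship between test scores and an external criterion, is commonly summarized as a correlation coefficient. The following is a minimal sketch, assuming a Pearson correlation is used as the validity coefficient. The scores and the function name `validity_coefficient` are hypothetical, invented only for illustration.

```python
import math


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and an external criterion.

    Both arguments are equal-length sequences of numbers, one pair per person.
    """
    if len(test_scores) != len(criterion_scores) or len(test_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_x = sum(test_scores) / len(test_scores)
    mean_y = sum(criterion_scores) / len(criterion_scores)

    # Sums of cross-products and of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(test_scores, criterion_scores))
    ss_x = sum((x - mean_x) ** 2 for x in test_scores)
    ss_y = sum((y - mean_y) ** 2 for y in criterion_scores)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")

    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data: eight people, each with a test score and a criterion score.
test_scores = [12, 15, 9, 20, 17, 11, 14, 18]
criterion_scores = [48, 55, 40, 70, 62, 45, 50, 66]

r = validity_coefficient(test_scores, criterion_scores)
print(f"criterion validity coefficient: {r:.3f}")
```

A value near 1 or -1 indicates a strong linear relationship with the criterion, and a value near 0 indicates little or none. The coefficient says nothing about the second definition, which concerns the underlying construct.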
Validity in psychometric terms
In psychometric terms, validity is a concept that has gone through a long evolutionary process. At first, validity was approached from a narrow, empirical position, summarized in the statement that “a test is valid for what it correlates with” (Muñiz, 1996).
Now, validity is understood as a global evaluative judgment of the degree to which the empirical evidence and the theoretical assumptions support the sufficiency and appropriateness of the interpretations made, not only of the items, but also of the way people respond and of the context of the evaluation.
So, what is validated is not the test itself. What is validated are the inferences made from it. This has two consequences:
- The person responsible for the validity of a test is not only its constructor, but also the user.
- The validity of a test is not established once and for all. It is the result of an accumulation of evidence and theoretical assumptions in an evolving, continuous process. This includes all the experimental, statistical and philosophical means by which scientific hypotheses and theories are evaluated.
In this context, the concept of validity refers to the adequacy, meaning and utility of the specific inferences made from the test scores. The validation of a test is the process of accumulating evidence to support such inferences. Thus, validity is a unitary concept. Although evidence can be accumulated in many ways, validity always refers to the degree to which that evidence supports the inferences made from the scores.
Types of evidence
In 1954, a committee chaired by L. J. Cronbach, acting on behalf of the American Psychological Association (APA), distinguished four types of validity. These are:
- Content validity
- Predictive validity
- Concurrent validity
- Construct validity
It is currently agreed, from the scientific point of view, that the only admissible validity is construct validity (Messick, 1995).
Validity and its aspects
Within the study of validity, the evidence is related to six aspects:
- Content (the relevance and representativeness of the test).
- Substantive (the theoretical reasons for the observed consistency of the answers).
- Structural (the internal configuration of the test and its dimensionality).
- Generalizability (the degree to which the inferences made from the test can be generalized to other populations, situations or tasks).
- External (the relations of the test with other tests and constructs).
- Consequential (the ethical and social consequences of the test).
Thus, within this unified view of validity, the other types can be understood as strategies for gathering evidence. As mentioned previously, these are content validity, predictive validity, concurrent validity and construct validity.
Validity types: content validity
This type of validity answers the following question: are the items that make up the test really a representative sample of the content or behavioral domain that interests us?
To be clear, a behavioral domain or field is a hypothetical grouping of all the possible items that cover a particular psychological area. For example, a vocabulary test should be an adequate sample of the domain of possible items in this area.
In this sense, content validity is a “measure” of the adequacy of sampling. The word “measure” is in quotes because this type of validity consists of a series of estimates or opinions. These estimates do not provide a quantitative index of validity.
This type of validity is associated above all with performance tests (a math test, a history test, and so on). To determine it, the test questions are systematically compared with the postulated content domain.
For example, suppose we have a list of 500 words that we hope the students in a course can spell correctly. Their performance on these words matters only as a test of their ability to spell those 500 words. A test drawn from that list, however, will have content validity only insofar as it provides an adequate sample of the 500 words it represents.
If we selected only easy words, only difficult words, or only words that represent certain types of spelling mistakes, we would be likely to obtain very low content validity.
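The sampling idea in the 500-word example can be sketched in code. This is only an illustration under stated assumptions: the word list, the three difficulty levels and the helper names (`build_domain`, `stratified_sample`, `level_shares`) are hypothetical, and stratifying by difficulty is just one possible way to aim for a representative sample. It does not turn content validity into a quantitative index. It only contrasts a sample that mirrors the domain with one that does not.

```python
import random
from collections import Counter


def build_domain(n_words=500, seed=0):
    """Hypothetical domain: placeholder words, each tagged with a difficulty level."""
    rng = random.Random(seed)
    levels = ["easy", "medium", "hard"]
    return [(f"word_{i}", rng.choice(levels)) for i in range(n_words)]


def stratified_sample(domain, n_items, seed=0):
    """Draw items so each difficulty level keeps roughly its share of the domain.

    Because of rounding, the sample may contain one item more or fewer than n_items.
    """
    rng = random.Random(seed)
    by_level = {}
    for word, level in domain:
        by_level.setdefault(level, []).append(word)

    sample = []
    for level, words in by_level.items():
        k = round(n_items * len(words) / len(domain))
        sample.extend((word, level) for word in rng.sample(words, k))
    return sample


def level_shares(items):
    """Proportion of items at each difficulty level."""
    counts = Counter(level for _, level in items)
    return {level: count / len(items) for level, count in counts.items()}


domain = build_domain()
representative = stratified_sample(domain, n_items=50)
# A biased test: the first 50 easy words only.
only_easy = [item for item in domain if item[1] == "easy"][:50]

print("domain:        ", level_shares(domain))
print("representative:", level_shares(representative))
print("only easy:     ", level_shares(only_easy))
```

The representative sample reproduces the difficulty mix of the domain, while the easy-only sample covers a single level. In practice, judging content validity still relies on expert comparison of the items with the domain.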
Conclusion: what is the usefulness of content validity?
Consequently, the key aspect of content validity is the sampling of the items. In other words, content validity concerns whether the sample of items in a test is representative of the universe or behavioral domain it is supposed to represent.
Thus, content validity is the type of validity that is linked to the test itself and to what it intends to measure. For example, it lets us judge whether the sample of test items is representative of the domain of mathematics that we want to evaluate. It is, therefore, an important concept both in statistics and in the use of psychological or performance tests.