An examiner who administers and scores the same test numerous times without deviating from the procedure, in order to reduce the possibility of measurement error, exemplifies standardization. Norms are the scores of a representative sample of the population against which an examiner compares an individual’s scores; while norms allow comparisons of a person’s performance across different tests, they do not provide an ultimate standard of performance. A psychological test is regarded as objective when it is administered, scored, and interpreted independently of the examiner’s subjective judgment.
The SAT and GRE are examples of maximum performance tests, as they provide information about a person’s best possible performance, while the MMPI-2 and PAI are typical performance tests, providing information about a person’s usual behavior. Power tests assess the difficulty level an examinee can attain (e.g., Information from the WAIS), speed tests assess the examinee’s response rate (e.g., Digit Symbol from the WAIS), and mastery tests determine whether an individual can attain a specified level of acceptable performance (e.g., a test of reading skills).
A ceiling effect occurs when an instrument cannot take on a value higher than some limit because the measure does not include enough difficult items; as a result, all high-achieving examinees obtain similar scores (the test is too easy). Conversely, a floor effect occurs when an instrument cannot take on a value lower than some limit because the measure does not include enough easy items, so all low-achieving examinees obtain similar scores (the test is too hard).
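The ceiling effect can be sketched numerically: if latent ability exceeds the test's maximum possible score, all high-ability examinees are capped at that maximum. This is a minimal illustration with invented values (the ability numbers and 10-item maximum are assumptions, not data from any real test).

```python
# Hypothetical illustration of a ceiling effect: the test has only
# easy items, so no one can score above max_score regardless of ability.
max_score = 10  # assumed: a 10-item test scored 1 point per item

true_ability = [4, 6, 8, 11, 14, 17]           # latent ability, same scale
observed = [min(a, max_score) for a in true_ability]

print(observed)  # -> [4, 6, 8, 10, 10, 10]; top examinees are indistinguishable
```

The three most able examinees all receive the same score of 10, so the test cannot discriminate among them; a floor effect is the mirror image, with low-ability examinees clustered at the minimum.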
In contrast to normative measures, ipsative measures use individuals themselves as their own frame of reference, requiring the examinee to compare two or more desirable options and choose the one that is most preferred. Reliability is the consistency of a test, or the degree to which a test provides the same results under the same conditions; validity refers to the degree to which a test measures what it claims to measure. A perfectly reliable test would yield every examinee’s true score each time it was administered, indicating the examinee’s actual standing on whatever the test measures; however, no test is perfectly reliable, because of measurement error, which is random and can be caused by environmental noise, the examinee’s mood on the day of testing, and any number of other factors.
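The true-score-plus-error idea can be sketched as a simulation, under the classical-test-theory assumption that each observed score is the true score plus random error. The true score of 100 and error standard deviation of 3.0 are assumed values for illustration only.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def administer(true_score, error_sd=3.0):
    # Classical test theory: observed score = true score + random error.
    # error_sd is an assumed magnitude of measurement error.
    return true_score + random.gauss(0, error_sd)

# Repeated administrations to the same examinee scatter around the true
# score; with error_sd = 0 the test would be perfectly reliable and
# return the true score every time.
scores = [administer(100) for _ in range(5)]
print(scores)
```

Averaging many administrations pulls the mean observed score toward the true score, which is why random (rather than systematic) error lowers reliability without biasing scores in one direction.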
The most commonly used methods of estimating a test’s reliability use a correlation coefficient, referred to as the reliability coefficient, ranging in value from 0.0 to +1.0, where coefficients closer to 0.0 indicate less reliability and values closer to +1.0 indicate greater reliability. Unlike other correlation coefficients, the reliability coefficient is not squared to determine the proportion of variability; rather, it is interpreted directly. A researcher who administers the same instrument to the same group of college students on two separate occasions and then correlates scores from the first and second administrations is attempting to obtain test-retest reliability (the “coefficient of stability”). The test-retest coefficient is not recommended for a test that measures attributes that are unstable over time (e.g., mood); in such cases, low coefficients would likely reflect the instability of the attribute rather than the unreliability of the test.