Reply to Chris Cole on Norming High-Range Tests

Dean Inada


To implement Chris's suggestion [Cole, "How to Protect High-Range Tests," Noesis 155 (2001): 8-9] of tests consisting of a small subset of a larger set of questions, we'll want a better method of norming the tests than simply ranking people by the number of questions they answer correctly, since one person may be asked harder questions than another. I suggest a method that estimates, for each question, the probability of getting it right or wrong as a function of a person's percentile rank in the population. A person's rank is in turn estimated by multiplying the generally increasing functions for the problems gotten right by the generally decreasing functions for the problems gotten wrong. The estimates can be bootstrapped by starting with a straight line of slope 1 as the initial estimate of the probability of getting a problem right and a line of slope -1 as the probability of getting it wrong; this first step is equivalent to just ranking each person by the proportion of correct answers.
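The equivalence claimed for the bootstrap step can be checked numerically. The following is only a minimal sketch, assuming ranks are scaled to run from 0 to 1 and evaluated on a grid; the function names and the grid size are my own illustrative choices, not part of the original method. With the linear starting curves, the product for someone with k right and m wrong is r^k (1 - r)^m, which peaks at r = k / (k + m), the proportion correct.

```python
# Sketch: with the linear starting curves p_right(r) = r and
# p_wrong(r) = 1 - r, the most likely rank is the proportion correct.
# Ranks are scaled to (0, 1); the grid size is an arbitrary assumption.

def rank_likelihood(n_right, n_wrong, grid_size=1000):
    """Multiply the starting curves over a grid of candidate ranks."""
    ranks = [(i + 0.5) / grid_size for i in range(grid_size)]
    likelihood = [r ** n_right * (1.0 - r) ** n_wrong for r in ranks]
    return ranks, likelihood

def most_likely_rank(n_right, n_wrong, grid_size=1000):
    """Return the grid rank at which the product is largest."""
    ranks, likelihood = rank_likelihood(n_right, n_wrong, grid_size)
    best = max(range(grid_size), key=lambda i: likelihood[i])
    return ranks[best]

for right, wrong in [(12, 36), (24, 24), (36, 12)]:
    # The printed rank should sit within one grid step of right / 48.
    print(right, wrong, most_likely_rank(right, wrong))
```

Later iterations would replace the straight lines with curves re-estimated from the data, at which point the ordering can depart from the simple proportion correct.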

To test this method, I took the existing Mega Test data on scores vs. problems missed from

http://www.eskimo.com/~miyaguch/megadata/item_ana.html

and iterated the estimates. When the estimates stopped changing from one iteration to the next, I had functions like:


[Figure: wpeA.jpg]


Multiplying the appropriate functions for a given person's answers gives a distribution for that person's estimated percentile rank within the population. The product of a bunch of increasing and decreasing functions tends to look like a Gaussian, as you might expect. The more questions we ask, the more functions we have to multiply, and the narrower the distribution becomes.


[Figure: wpe7.jpg]


(But even using all 48 questions, the distribution still seems wide if you want to distinguish test takers at the 99.9999th percentile.)
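The narrowing can be illustrated with made-up curves. This is a sketch under stated assumptions: the logistic shape, the steepness of 10, the evenly spaced difficulties, and the idealized test taker who gets exactly the problems below rank 0.6 right are all hypothetical stand-ins, not the functions fitted from the Mega Test data.

```python
import math

def item_curve(rank, difficulty, steepness=10.0):
    """Hypothetical logistic probability of a right answer at a rank in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * (rank - difficulty)))

def rank_distribution(difficulties, answers, grid_size=500):
    """Multiply right/wrong curves over a rank grid and normalize the product."""
    ranks = [(i + 0.5) / grid_size for i in range(grid_size)]
    dist = []
    for r in ranks:
        product = 1.0
        for difficulty, right in zip(difficulties, answers):
            p = item_curve(r, difficulty)
            product *= p if right else (1.0 - p)
        dist.append(product)
    total = sum(dist)
    return ranks, [x / total for x in dist]

def spread(ranks, dist):
    """Standard deviation of the normalized rank distribution."""
    mean = sum(r * w for r, w in zip(ranks, dist))
    variance = sum(w * (r - mean) ** 2 for r, w in zip(ranks, dist))
    return math.sqrt(variance)

def width_for(n_questions, true_rank=0.6):
    """Width of the distribution for an idealized taker asked n questions."""
    difficulties = [(j + 0.5) / n_questions for j in range(n_questions)]
    answers = [d < true_rank for d in difficulties]  # assumed answer pattern
    ranks, dist = rank_distribution(difficulties, answers)
    return spread(ranks, dist)

for n in (4, 12, 48):
    print(n, width_for(n))
```

With these assumed curves the width shrinks as questions are added, but how fast it shrinks depends entirely on how steep the real curves are near the person's rank.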

Problems with a function that is flat at the high end, such as #1, are not very useful for discriminating among the high ranks, and problems with a function that is flat at the low end, such as #36, are not very useful for discriminating among the low ranks. By choosing to ask problems whose greatest slope lies within the range of our current estimated rank distribution, we can narrow down the rank estimate more efficiently.
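One way to turn this selection rule into a procedure is to score each unasked problem by its slope averaged over the current rank distribution and ask the highest-scoring one. The sketch below assumes hypothetical logistic curves and a Gaussian-shaped current estimate; `pick_next_item`, `bump`, and the difficulty values are illustrative names and numbers, and the averaged-slope score is my own choice of criterion rather than anything specified above.

```python
import math

def item_curve(rank, difficulty, steepness=10.0):
    """Hypothetical logistic probability of a right answer at a rank in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * (rank - difficulty)))

def item_slope(rank, difficulty, steepness=10.0):
    """Derivative of the logistic curve with respect to rank."""
    p = item_curve(rank, difficulty, steepness)
    return steepness * p * (1.0 - p)

def bump(center, width, grid_size=200):
    """Normalized Gaussian-shaped rank distribution on a grid over (0, 1)."""
    ranks = [(i + 0.5) / grid_size for i in range(grid_size)]
    weights = [math.exp(-0.5 * ((r - center) / width) ** 2) for r in ranks]
    total = sum(weights)
    return ranks, [w / total for w in weights]

def pick_next_item(difficulties, ranks, dist, asked=()):
    """Index of the unasked problem with the largest distribution-weighted slope."""
    best_index, best_score = None, -1.0
    for index, difficulty in enumerate(difficulties):
        if index in asked:
            continue
        score = sum(w * item_slope(r, difficulty) for r, w in zip(ranks, dist))
        if score > best_score:
            best_index, best_score = index, score
    return best_index

difficulties = [0.1, 0.5, 0.9, 0.99]         # made-up problem difficulties
ranks, dist = bump(center=0.9, width=0.05)   # current estimate: a high rank
print(pick_next_item(difficulties, ranks, dist))
```

A problem like #1, flat at the high end, would score near zero for a high-ranked test taker under this rule and so would not be asked.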

Multiplying the probabilities incorrectly assumes that the questions are independent of one another, but that may not be too bad, since much of the correlation between questions is due to both being correlated with percentile rank within the sample population.
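To see how much correlation a shared dependence on rank can produce by itself, here is a small sketch. It assumes two questions that are exactly independent once rank is fixed, with ranks uniform on (0, 1); the curves passed in are illustrative, not the fitted ones. For two straight-line curves p(r) = r, the population correlation works out to 1/3 even though nothing links the questions except rank.

```python
import math

def population_correlation(curve_a, curve_b, grid_size=10000):
    """Correlation between two right/wrong outcomes across the population,
    assuming the outcomes are independent given rank and rank is uniform."""
    ranks = [(i + 0.5) / grid_size for i in range(grid_size)]
    p_a = sum(curve_a(r) for r in ranks) / grid_size
    p_b = sum(curve_b(r) for r in ranks) / grid_size
    # Given rank, P(both right) is the product of the two curves.
    p_ab = sum(curve_a(r) * curve_b(r) for r in ranks) / grid_size
    covariance = p_ab - p_a * p_b
    return covariance / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))

def linear(r):
    """Straight-line curve: probability of a right answer equals rank."""
    return r

def flat(r):
    """Curve that does not depend on rank at all."""
    return 0.5

print(population_correlation(linear, linear))
print(population_correlation(linear, flat))
```

Any correlation in the real data beyond what the fitted curves predict in this way would be the part that the independence assumption actually gets wrong.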

Clustering the questions by relative correlation (x tries to be closer to y than to z if x is more closely correlated with y than with z), we find one axis roughly proportional to problem difficulty, with the verbal analogies tending to cluster in one quadrant.

[Figure: wpe9.jpg]