
# Re: st: van der Waerden transformation

From: Austin Nichols
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: van der Waerden transformation
Date: Fri, 13 Apr 2012 12:18:11 -0400

```Maarten--
A complete answer requires complete exposition of IRT, but the quick
answer is yes, more or less.
If you think underlying "achievement" is normally distributed, and you
used a reasonably well-designed test, you should convert the scores
back into a normal distribution as done via more sophisticated methods
on virtually every standardized test; the measure of latent
"achievement" is typically called theta.
Given that tests do not uniformly cover the difficulty space, there
will be skew and other nonnormality in scores, but a perfect test
(where the definition of perfect depends on what the test is to be
used for) might show a uniform distribution in percent correct from
zero to 100, which one could then turn back into a normal distribution
easily enough.  The distances then might give a reasonable measure of
how much harder it is to go from 98 to 99 than from 49 to 50 on this
hypothetical perfect test.
I have argued in print elsewhere that "achievement" is not normally
distributed, but let's leave that aside for now...  as no more
objectionable than assumptions in many -xt- commands on normality of
e.g. random effects/coefs.

On Fri, Apr 13, 2012 at 3:33 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
> On Thu, Apr 12, 2012 at 7:01 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Maarten--
>
> Why would you want to make up distances between ranks in test scores?
> I can see why many of these do not have a natural unit, so some form
> of standardization is called for, but that does not mean that they
> should be forced into a normal/Gaussian distribution. If you find
> considerable skewness in your raw scores, would the forced-to-be-normal
> variable really be a better representation of what you found?
>
> -- Maarten
>
>>
>> On Thu, Apr 12, 2012 at 12:42 PM, Maarten Buis <maartenlbuis@gmail.com> wrote:
>>> On Thu, Apr 12, 2012 at 6:11 PM, Scott Merryman wrote:
>>>> Isn't the van der Waerden transformation just inverse_normal(rank/(N +1)) ?
>>>
>>> That sounds like an awful idea. That way you are just "inventing"
>>> distances between ranks that have nothing to do with what you
>>> observed. If you (generally speaking, not Scott specifically) really
>>> want to get rid of the skewness that badly, then just use the
>>> percentile rank and be honest about the fact that you have thrown away
>>> the information on the distances between the ranks rather than making
>>> those distances up. In general, I would _not_ try to get rid of the
>>> skewness, but rather use it. If it is a dependent variable that might
>>> suggest a -glm- with maybe a log link function. If it is an
>>> independent variable it might suggest a non-linear effect possibly to
>>> be modeled with splines (see: -mkspline-).
>>>
>>> I would be interested to hear if someone knows of an application where
>>> this transformation would make some sense. I cannot imagine one, but
>>> that may just be due to my lack of imagination.
>>>
>>> -- Maarten
```
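For readers who want to see the formula Scott quotes, inverse_normal(rank/(N + 1)), worked through on numbers, here is a minimal sketch in Python rather than Stata. The function names `average_ranks` and `van_der_waerden` and the sample scores are made up for illustration, and giving tied scores the mean of their ranks is an assumption; the thread does not say how ties should be handled.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run [start, end] over all observations tied with order[start].
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of the 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def van_der_waerden(values):
    """Map each value to the standard normal quantile at rank / (N + 1)."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(r / (n + 1)) for r in average_ranks(values)]


# Hypothetical test scores, not data from the thread.
scores = [35, 50, 50, 72, 98]
for raw, z in zip(scores, van_der_waerden(scores)):
    print(raw, round(z, 3))
```

Because the result depends only on the ranks, any order-preserving change to the raw scores yields the same transformed values. That is the property Maarten objects to: the spacing between observations comes from the normal quantiles, not from the observed scores.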