
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



Re: st: Multilevel Models with Skewed Outcome using -runmlwin-


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Multilevel Models with Skewed Outcome using -runmlwin-
Date   Wed, 5 Sep 2012 21:13:58 +0100

Concentrating on the initial part, which I know most about:

Negative skew will not be corrected by ln or sqrt; in fact it will be
worsened. (The standard case in which ln is an appropriate
transformation is the lognormal distribution, which is always
positively skewed.)
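The point about ln and sqrt can be checked numerically. The following is a minimal Python sketch under these assumptions:

* The five-value data set is a hypothetical toy example, not the poster's scores.
* Skewness uses the moment definition m3 / m2^1.5 with divisor n. I believe this is what -summarize, detail- reports.
* All values are positive, because ln(0) is undefined. Real scores that include 0 would need something like ln(x + 1).

```python
import math

def moment_skewness(xs):
    """Moment-based skewness m3 / m2**1.5, using divisor n for both moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical toy scores with a long left tail (not the poster's data).
scores = [1, 4, 9, 9, 9]

raw = moment_skewness(scores)
sqrt_t = moment_skewness([math.sqrt(x) for x in scores])
log_t = moment_skewness([math.log(x) for x in scores])

# Both concave transformations push the skewness further below zero.
print(f"raw  {raw:.2f}")     # raw  -0.65
print(f"sqrt {sqrt_t:.2f}")  # sqrt -0.84
print(f"ln   {log_t:.2f}")   # ln   -1.08
```

Both ln and sqrt compress the upper end of the scale more than the lower end. That drags the long left tail out further relative to the bulk of the data.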

A transformation sometimes recommended for negatively skewed variables
is the square, but that doesn't seem especially suitable here. It would
stretch the tail so that ..., 13, 14, 15 become 169, 196, 225, which
might symmetrize the distribution to some extent, but would probably
be inappropriate for this kind of data. Most simply of all, the
skewness here is slight (about -0.57).
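The arithmetic of squaring the upper tail, and its effect on skewness, can be sketched in Python. This again assumes hypothetical toy data and the moment definition of skewness (divisor n).

```python
def moment_skewness(xs):
    """Moment-based skewness m3 / m2**1.5, using divisor n for both moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Squaring the top of the 0..15 scale: gaps of 1 become gaps of 27 and 29.
tail = [13, 14, 15]
squared_tail = [x ** 2 for x in tail]
gaps = [b - a for a, b in zip(squared_tail, squared_tail[1:])]
print(squared_tail, gaps)  # [169, 196, 225] [27, 29]

# Hypothetical toy data with a long left tail (not the poster's data).
scores = [1, 4, 9, 9, 9]
squared = [x ** 2 for x in scores]
raw_skew = moment_skewness(scores)
sq_skew = moment_skewness(squared)
print(f"raw     {raw_skew:.2f}")  # raw     -0.65
print(f"squared {sq_skew:.2f}")   # squared -0.46
```

Squaring is convex, so it moves the skewness towards zero. The price is a scale (squared test points) that is hard to interpret for scores like these.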

However, in some ways a more important misconception is that the
marginal distribution needs to be normal. In most models, what matters
is (at most) that the _conditional_ distribution, meaning the
distribution conditional on the predictors, be of a certain kind
(e.g. normal), and even then this is a relatively unimportant
assumption.
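The marginal/conditional distinction can be illustrated with a small simulation. This is a sketch under hypothetical assumptions:

* A binary predictor g has 80% of cases in group 1.
* The outcome is y = 10*g plus standard normal error.
* The seed is fixed.

The marginal distribution of y is strongly negatively skewed. Yet the errors, and hence the distribution of y conditional on g, are normal by construction.

```python
import random

def moment_skewness(xs):
    """Moment-based skewness m3 / m2**1.5, using divisor n for both moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(2012)
n = 5000

# Hypothetical model: g = 1 with probability 0.8, y = 10*g + N(0, 1) error.
g = [1 if rng.random() < 0.8 else 0 for _ in range(n)]
y = [10 * gi + rng.gauss(0, 1) for gi in g]

# Marginal distribution of y: a two-group mixture with a long left tail
# (theoretical skewness about -1.37).
marginal = moment_skewness(y)

# Conditional on g: residuals from the group means, which is exactly the
# least-squares fit when the only predictor is binary.
means = {
    k: sum(yi for yi, gi in zip(y, g) if gi == k) / g.count(k)
    for k in (0, 1)
}
residuals = [yi - means[gi] for yi, gi in zip(y, g)]
conditional = moment_skewness(residuals)

print(f"marginal skewness    {marginal:.2f}")
print(f"conditional skewness {conditional:.2f}")
```

Checking the raw outcome would flag a "problem" here that the model does not have. The residuals are the place to look.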

Whoever is advising you seems to be missing the distinction between
marginal and conditional distributions.

Nick

On Wed, Sep 5, 2012 at 8:23 PM, Emmott, Emily <emily.emmott.10@ucl.ac.uk> wrote:

>  I am currently using a large dataset with information on children's school test scores from multiple occasions (i.e., tests nested in children). The test scores were collected as integers ranging from 0 to 15.
>
>  The problem I have is that the scores are negatively skewed:
>
> . sum TestScore, de
>
>                          Test Score
> -------------------------------------------------------------
>       Percentiles      Smallest
>  1%            2              0
>  5%            5              0
> 10%            6              0       Obs                7619
> 25%            8              0       Sum of Wgt.        7619
>
> 50%           11                      Mean           10.28376
>                         Largest       Std. Dev.      2.985888
> 75%           13             15
> 90%           14             15       Variance       8.915529
> 95%           14             15       Skewness      -.5706405
> 99%           15             15       Kurtosis       3.080436
>
>
>  I have tried transforming the scores (ln, sqrt, etc.), but none of these transformations makes them normal.
>
>  Now, I have been advised that in multilevel models, skewness is less problematic, and as long as the outcome does not display excess kurtosis I should be ok to carry out a multilevel normal regression model. However, I have not been able to find any papers to support this, so I was not sure if I could fully trust the advice.
>
>  I am currently using the -runmlwin- command, and estimate the models using MCMC estimation.
>
>  I have tried two methods. The first simply ignores the skew and runs a multilevel normal regression. The second categorises the test scores into 4 categories and runs a multilevel ordered regression model (both using -runmlwin- and MCMC).
>
>  In both cases the results are very similar: the direction of the effects, and whether each predictor is significant at the p < 0.05 level, are practically the same across both methods. This made me think it might be ok to keep the analysis as a multilevel normal regression, as the standard errors do not seem to be inflated in the normal regression (I'd read somewhere that skewed outcomes inflate standard errors and thus Type I error?).
>
>  So, my question is, is it ok in my situation to carry out a multilevel normal regression on the test scores despite the skew?
>
>  Furthermore, I have an additional question: does using MCMC estimation make a difference? I suspected that MCMC might produce relatively accurate estimates despite the skewed outcome.
>
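The question does not say how the 0 to 15 scores were collapsed into 4 categories for the ordered model. The Python sketch below assumes hypothetical cut points at the 25th, 50th and 75th percentiles reported in the -summarize, detail- output (8, 11 and 13). Each cut point is treated as the inclusive upper bound of its category.

```python
import bisect

# Hypothetical cut points: the 25th, 50th and 75th percentiles from the
# -summarize, detail- output (8, 11, 13). The actual cut points are not stated.
CUTS = [8, 11, 13]

def categorise(score: int) -> int:
    """Map an integer test score in 0..15 to an ordered category 1..4.

    Each cut point is the inclusive upper bound of its category:
    0-8 -> 1, 9-11 -> 2, 12-13 -> 3, 14-15 -> 4.
    """
    if not 0 <= score <= 15:
        raise ValueError(f"score out of range 0..15: {score}")
    # bisect_left counts how many cut points lie strictly below the score.
    return bisect.bisect_left(CUTS, score) + 1

print([categorise(s) for s in range(16)])
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
```

In Stata itself, -recode TestScore (0/8=1) (9/11=2) (12/13=3) (14/15=4), generate(TestCat)- should give the same banding; TestCat is a hypothetical variable name. Collapsing 16 possible values into 4 discards the variation within each band. The ordered model therefore works with a coarser version of the scores than the normal model does.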


