

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Multilevel Models with Skewed Outcome using -runmlwin-
Date: Wed, 5 Sep 2012 21:13:58 +0100

Concentrating on the initial part, which I know most about:

Negative skew will not be corrected by ln or sqrt; in fact it will be worsened. (The standard case in which ln is an appropriate transformation is the lognormal distribution, which is always positively skewed.)

A transformation sometimes recommended for negatively skewed variables is the square, but that doesn't seem especially suitable here. It would stretch the tail so that ..., 13, 14, 15 become ..., 169, 196, 225, which might symmetrize the distribution to some extent, but would probably be inappropriate for this kind of data.

Most simply of all, the skewness is slight.

However, in some ways a more important misconception is that the marginal distribution need be normal. In most models, what is important is (at most) that the _conditional_ distribution be of a certain kind (e.g. normal), meaning conditional on the predictors. Even then, this is a relatively unimportant assumption. Whoever is advising you seems to be missing the distinction between marginal and conditional distributions.

Nick

On Wed, Sep 5, 2012 at 8:23 PM, Emmott, Emily <emily.emmott.10@ucl.ac.uk> wrote:
> I am currently using a large dataset with information on children's school test scores from multiple occasions (i.e., tests nested in children). The test scores were collected as integers which range from 0 to 15.
>
> The problem I have is that the scores are negatively skewed:
>
> . sum TestScore, de
>
>                          Test Score
> -------------------------------------------------------------
>       Percentiles      Smallest
>  1%            2              0
>  5%            5              0
> 10%            6              0       Obs                7619
> 25%            8              0       Sum of Wgt.        7619
>
> 50%           11                      Mean           10.28376
>                         Largest       Std. Dev.      2.985888
> 75%           13             15
> 90%           14             15       Variance       8.915529
> 95%           14             15       Skewness      -.5706405
> 99%           15             15       Kurtosis       3.080436
>
> I have tried transforming the scores (ln, sqrt etc), but none seem to transform it into normality.
>
> Now, I have been advised that in multilevel models, skewness is less problematic, and as long as the outcome does not display excess kurtosis I should be ok to carry out a multilevel normal regression model. However, I have not been able to find any papers to support this, so I was not sure if I could fully trust the advice.
>
> I am currently using the -runmlwin- command, and estimate the models using MCMC estimation.
>
> I have tried two methods: the first was simply to ignore the skew and run a multilevel normal regression; the second was to categorise the Test Scores into 4 categories and run a multilevel ordered regression model. (Both using runmlwin & MCMC.)
>
> In both cases the results are very similar, where the direction of the effects & whether the predictor is significant at the p<0.5 level are practically the same across both methods, which made me think maybe it was ok to keep the analysis as a multilevel normal regression, as the standard errors do not seem to be inflated in the normal regression (I'd read somewhere that skewed outcomes increase standard error, thus Type I error?).
>
> So, my question is, is it ok in my situation to carry out a multilevel normal regression on the test scores despite the skew?
>
> Furthermore, I had an additional question: does using MCMC estimation make a difference? I suspected that MCMC may produce relatively accurate estimates despite the skewed outcome.
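The point about ln, sqrt and the square can be checked numerically. The sketch below is only illustrative: it does not use the poster's data, which are not available here. It assumes a stand-in score distribution, Binomial(15, 0.7), chosen because it takes integer values 0 to 15 and is negatively skewed, and it uses ln(y + 1) rather than ln(y) because scores of 0 occur. It is written in Python rather than Stata so that it is self-contained. Skewness is the moment-based coefficient m3 / m2^1.5, computed exactly from the probability mass function.

```python
from math import comb, log, sqrt


def binomial_pmf(n, p):
    """Return {k: P(Y = k)} for Y ~ Binomial(n, p)."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def skewness(pmf, transform=lambda y: y):
    """Moment skewness m3 / m2**1.5 of transform(Y), computed exactly from the pmf."""
    values = [(transform(k), prob) for k, prob in pmf.items()]
    mean = sum(prob * v for v, prob in values)
    m2 = sum(prob * (v - mean) ** 2 for v, prob in values)
    m3 = sum(prob * (v - mean) ** 3 for v, prob in values)
    return m3 / m2**1.5


# Hypothetical stand-in for the test scores (an assumption, not the real data).
pmf = binomial_pmf(15, 0.7)

results = {
    "raw score": skewness(pmf),
    "sqrt(y)": skewness(pmf, sqrt),
    "ln(y + 1)": skewness(pmf, lambda y: log(y + 1)),
    "y squared": skewness(pmf, lambda y: y * y),
}

for name, value in results.items():
    print(f"{name:>10}: {value:+.3f}")
```

Under these assumptions the sqrt and ln rows should be more negative than the raw score, and the squared row should be less negative, which is the direction described in the reply. How far the square moves the skewness depends on the actual distribution.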
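The distinction between marginal and conditional distributions can also be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of the poster's data: the binary predictor `x`, the coefficients 6 and 5, and the error standard deviation 1.5 are all hypothetical, and the multilevel structure and -runmlwin- are ignored entirely. The outcome is normal conditional on `x`, yet its marginal distribution is clearly negatively skewed because most observations fall in the higher-mean group.

```python
import random
from statistics import fmean


def skewness(values):
    """Moment skewness m3 / m2**1.5 of a list of numbers."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2**1.5


rng = random.Random(2012)  # fixed seed so the run is reproducible
n = 20000

# Hypothetical binary predictor: 70% of observations have x = 1.
x = [1 if rng.random() < 0.7 else 0 for _ in range(n)]

# Outcome is exactly normal given x: y = 6 + 5x + e, with e ~ N(0, 1.5^2).
y = [6 + 5 * xi + rng.gauss(0, 1.5) for xi in x]

# With one binary predictor, the least-squares fit is just the two group
# means, so the residuals are deviations from the mean of each group.
group_mean = {g: fmean([yi for xi, yi in zip(x, y) if xi == g]) for g in (0, 1)}
residuals = [yi - group_mean[xi] for xi, yi in zip(x, y)]

marginal_skew = skewness(y)
conditional_skew = skewness(residuals)

print(f"skewness of y (marginal):       {marginal_skew:+.3f}")
print(f"skewness of residuals (given x): {conditional_skew:+.3f}")
```

In this setup the marginal skewness should be clearly negative while the residual skewness should be close to zero, so a skewed summary of the outcome alone says little about whether the model's conditional assumption is reasonable. Checking residuals from the fitted model is the more relevant diagnostic.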

