
From: "Karl-Oskar Lindgren" <Karl-Oskar.Lindgren@statsvet.uu.se>
To: statalist@hsphsun2.harvard.edu
Subject: st: Simulate a skewed variable in stata, sample vs. population skewness
Date: Mon, 07 Dec 2009 11:06:26 +0100

Dear listusers,

I have a question that I guess is partly statistical and partly philosophical. In a paper that uses Monte Carlo simulations to study the small-sample performance of an estimator, I was asked by a referee to investigate how the estimator performs when the error terms are skewed. When trying to implement this suggestion I realized that the sample skewness reported by Stata can differ considerably from the skewness of the underlying population (although both the sample mean and the sample variance of the variable remain close to their population counterparts). My question is therefore whether it is the sample skewness or the population skewness that should be kept constant when examining the small-sample performance of a statistical estimator.

In case my question is unclear, the following simple example may help illustrate the gist of my problem. Let's assume that we want to study how the OLS estimator performs in small samples when the error terms are skewed. To do this we decide to generate 10 error terms from a chi-square distribution with 1 degree of freedom. The population skewness should then be 2^(3/2), i.e., about 2.8. But if I generate 1000 samples from such a distribution in Stata, the average skewness across these 1000 samples turns out to be about 1.3 (see the example code below).

I understand that the reason for the discrepancy is that measures of skewness tend to be biased in small samples when the variables are non-normal (indeed, the sample skewness approaches its theoretical level as we increase the number of observations in the example below). My question, however, concerns whether it is the sample skewness or the population skewness that I should keep constant in my replications when I vary the other parameters of the model. If it is the population skewness, the implementation is straightforward, since the skewness in the population is known. But if it is the sample skewness that should be kept constant, I would appreciate any hints about appropriate methods to accomplish this.

    ** Example code to illustrate the bias of r(skewness)
    program define skewchi, rclass
        version 9.2
        drop _all
        set obs 10
        * the square of a standard normal draw is chi-square with 1 df
        gen double x = invnorm(uniform())
        gen double x2 = x^2
        sum x2, detail
        return scalar mean = r(mean)
        return scalar var = r(Var)
        return scalar skew = r(skewness)
    end

    simulate mean=r(mean) var=r(var) skew=r(skew), ///
        reps(1000) seed(1) dots: skewchi
    sum

Best wishes,
Karl-Oskar Lindgren
Department of Government
Uppsala University

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
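
Editorial note on the example above: a short derivation may help show why the sample figure cannot match 2.8 at n = 10. It assumes that r(skewness) is the moment-based coefficient with divisor n, which is how -summarize, detail- is understood to compute it; the symbols g_1, m_2 and m_3 are notation introduced here, not taken from the post. Under that assumption the coefficient is bounded by a function of n alone, and for n = 10 the bound lies below the population value.

```latex
% Moment-based sample skewness (assumed definition of r(skewness))
\[
  g_1 = \frac{m_3}{m_2^{3/2}},
  \qquad
  m_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r .
\]
% Upper bound for any sample of size n, attained when n-1 values
% coincide and one value differs
\[
  |g_1| \le \frac{n-2}{\sqrt{n-1}} .
\]
% With n = 10, compared with the chi-square(1) population skewness
\[
  \frac{10-2}{\sqrt{10-1}} = \frac{8}{3} \approx 2.67
  \;<\; 2^{3/2} \approx 2.83 .
\]
```

So with 10 observations no single sample can show a skewness of 2.8, and the average across replications must fall below 8/3. This is consistent with the value of about 1.3 reported above, though it does not by itself explain the size of the gap.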

**Follow-Ups**:

- **Re: st: Simulate a skewed variable in stata, sample vs. population skewness** *From:* Austin Nichols <austinnichols@gmail.com>
- **st: RE: Simulate a skewed variable in stata, sample vs. population skewness** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
- **st: AW: Simulate a skewed variable in stata, sample vs. population skewness** *From:* "Martin Weiss" <martin.weiss1@gmx.de>

