From: "Bottan, Nicolas Luis" <nbottan@iadb.org>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: Strange results with cluster option
Date: Mon, 27 Sep 2010 10:22:58 -0400

Hi everyone,

I’m obtaining strange results using the cluster option when performing OLS (basically, the standard error increases when increasing cluster size; there is large heterogeneity in cluster size). I am attaching a simple Monte Carlo simulation in Stata to check whether the cluster option is working fine.

I construct a simple example where an outcome Y is the sum of a school random variable and a student random variable. Both have mean 0 and standard deviation 1. I test the null hypothesis that the mean of Y is zero for each simulation. Because the null hypothesis is true, it should be rejected only 5% of the time. Using the cluster option in Stata, it is rejected around 35% of the time. Alternatively, collapsing the data at the school level and then running Y on a constant (giving the same weight to all schools), the null is rejected 4% of the time.

Any thoughts? Thanks!

Here is the code:

* THIS DO FILE GENERATES A MONTE CARLO SIMULATION TO CHECK WHETHER THE CLUSTER OPTION OF THE REG COMMAND IN STATA IS
* WORKING WELL. ALSO IT CHECKS TWO ALTERNATIVE OPTIONS TO ESTIMATE STANDARD ERRORS WHEN OBSERVATIONS ARE CLUSTERED
* TO THAT END, IT ASSUMES THAT:
* Yij=Vj+Uij
* where Y is some outcome variable defined at the student level, Vj is a school effect and Uij is a student effect
* V and U are independent and they are distributed normal with mean 0 and standard deviation 1.
* In the data, there are 100 schools. In 99 schools there is only one observation of a student. In one school there are
* observations of 101 students
* We test the null hypothesis that the mean of Y is zero. By construction this null is true. Then, we run 500 simulations
* and we record in how many cases we reject the null under three different estimation strategies. In the first one we
* use the cluster option in the regression command. In the second one we collapse the data at the school level (averaging Y)
* and then run a regression of Y on a constant weighting observations by the number of students in the school. In the third
* one we do the same procedure as in the second one but we give the same weight to all 100 schools
* As we run 500 simulations, the alternative estimations, if they are working well, should reject the
* null approximately 25 times at the 5% level

set seed 111111
local ctarech1=0
local ctarech2=0
local ctarech3=0

foreach it of numlist 1/500 {
    qui {
        clear
        set obs 200
        * Schools 1-99 get one student each; school 100 gets the remaining 101
        gen j=_n
        replace j=100 if j>100
        bysort j: gen i=_n
        gen v=rnormal()
        gen u=rnormal()
        * Copy the first student's school draw to every student in the school
        * (relies on that draw being larger than -10)
        replace v=-10 if i>1
        egen aux=max(v),by(j)
        gen v2=aux
        replace v=v2
        drop aux v2
        gen y=v+u

        * Strategy 1: student-level regression with clustered standard errors
        reg y,cluster(j)
        local a=abs(_b[_cons]/_se[_cons])
        if `a'>1.96 {
            local ctarech1=`ctarech1'+1
        }

        * Strategy 2: school means, weighted by the number of students
        gen count=1
        collapse y (sum) count,by(j)
        reg y [pw=count]
        local a=abs(_b[_cons]/_se[_cons])
        if `a'>1.96 {
            local ctarech2=`ctarech2'+1
        }

        * Strategy 3: school means, equal weight for all schools
        reg y
        local a=abs(_b[_cons]/_se[_cons])
        if `a'>1.96 {
            local ctarech3=`ctarech3'+1
        }
    }
}
display "it=`it'"
display "ctarech1=`ctarech1'"
display "ctarech2=`ctarech2'"
display "ctarech3=`ctarech3'"

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
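
For reference, the design described in the do-file comments can be written out explicitly. This is only a sketch under the stated assumptions (100 schools, 99 with one student and one with 101, independent effects with unit variance). The symbols $n_j$, $N$, $G$ and $e_{ij}$ are notation introduced here, not taken from the post. The last line is the cluster-robust variance for a constant-only regression with the usual finite-sample factor $\frac{N-1}{N-k}\cdot\frac{G}{G-1}$, which reduces to $\frac{G}{G-1}$ when $k=1$.

```latex
\begin{align*}
Y_{ij} &= V_j + U_{ij}, \qquad V_j \sim N(0,1), \quad U_{ij} \sim N(0,1), \quad V_j \perp U_{ij} \\
\bar{Y} &= \frac{1}{N} \sum_{j=1}^{G} \sum_{i=1}^{n_j} Y_{ij}, \qquad G = 100, \quad N = \sum_{j=1}^{G} n_j = 99 \cdot 1 + 101 = 200 \\
\operatorname{Var}(\bar{Y}) &= \frac{1}{N^2} \sum_{j=1}^{G} \left( n_j^2 + n_j \right)
  = \frac{99 \cdot 2 + 101^2 + 101}{200^2} = \frac{10500}{40000} = 0.2625 \\
\widehat{\operatorname{Var}}_{\text{cluster}}(\bar{Y}) &= \frac{G}{G-1} \cdot \frac{1}{N^2} \sum_{j=1}^{G} \Big( \sum_{i=1}^{n_j} e_{ij} \Big)^2,
  \qquad e_{ij} = Y_{ij} - \bar{Y}
\end{align*}
```

Under these assumptions the single large school contributes $101^2 + 101 = 10302$ of the $10500$ in the numerator of the true variance, so the sampling variability of the pooled mean comes almost entirely from one cluster.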
