The Stata listserver

Re: st: SE with cluster option


From   "Mark Schaffer" <M.E.Schaffer@hw.ac.uk>
To   "Alexander Nervedi" <alexnerdy@hotmail.com>
Subject   Re: st: SE with cluster option
Date   Tue, 18 Oct 2005 20:08:56 +0100 (BST)

Al,

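Other patterns in the data can also cause this problem. For example, you
might have a variable that is the cluster equivalent of a singleton dummy:
it takes one value (x) for all obs in one cluster and another value (y)
for all the rest. With a constant in the model, the normal equations then
force the residuals in that one cluster to sum to zero, so the clustered
var-cov matrix is singular (one linear combination of the coefficients
gets an estimated variance of zero).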
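A minimal sketch of why such a variable breaks the clustered var-cov matrix. It is written in Python with only the standard library, using exact `fractions` arithmetic so that rank checks are not blurred by rounding. It illustrates the algebra, not Stata's implementation. The helper names (`rank`, `solve`, `cluster_scores`) and the simulated data are hypothetical. The sketch fits OLS and forms each cluster's score u_g (the sum over the cluster of x_i times its residual e_i). It then takes the rank of the "meat" sum_g u_g u_g'. Because `odd` is 1 in cluster 0 and 0 elsewhere, its normal equation forces cluster 0's residuals to sum to zero. The score for `odd` is therefore zero in every cluster, and the meat cannot have full rank.

```python
from fractions import Fraction
import random


def rank(rows):
    """Rank of a matrix given as a list of rows (exact Gaussian elimination)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # column has no usable pivot
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [v - f * w for v, w in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def solve(a, b):
    """Solve the square, nonsingular system a x = b exactly (Gauss-Jordan)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(bv)] for row, bv in zip(a, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [v / p for v in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [v - f * w for v, w in zip(m[i], m[c])]
    return [m[i][n] for i in range(n)]


def cluster_scores(X, y, groups):
    """Fit OLS exactly and return {cluster: u_g}, u_g = sum over g of x_i * e_i."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    scores = {}
    for row, e, g in zip(X, resid, groups):
        u = scores.setdefault(g, [Fraction(0)] * k)
        for j in range(k):
            u[j] += row[j] * e
    return scores


# Hypothetical data: 6 clusters of 5 observations each.
rng = random.Random(2005)
groups = [g for g in range(6) for _ in range(5)]
x1 = [rng.randint(-50, 50) for _ in groups]   # ordinary regressor
odd = [1 if g == 0 else 0 for g in groups]    # 1 in cluster 0 only, 0 elsewhere
y = [rng.randint(-50, 50) for _ in groups]
X = [[1, a, d] for a, d in zip(x1, odd)]      # columns: constant, x1, odd

scores = cluster_scores(X, y, groups)
# rank(sum_g u_g u_g') equals the rank of the stacked G x k matrix of u_g
meat_rank = rank(list(scores.values()))
odd_column_zero = all(u[2] == 0 for u in scores.values())
print("coefficients:", len(X[0]), "| rank of meat:", meat_rank)
print("score for odd is zero in every cluster:", odd_column_zero)
```

The meat ends up with rank below the number of coefficients, so a joint test of all of them cannot use every dimension. This is consistent with Model 2 below, where six slope coefficients are tested but the reported statistic has only 5 numerator degrees of freedom, F(5, 167).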

It's ad hoc, but try running your regression clustered on household
(survey code), dropping one variable at a time, and see whether and when
the problem goes away. This will let you track down the culprit.
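The one-variable-at-a-time search can be scripted. This hedged Python sketch (standard library only) uses the same exact-arithmetic idea. It fits the full model, then refits without each regressor in turn, always keeping the constant. For each fit it reports the number of coefficients next to the rank of the clustered meat. The helper names, the regressors `x1`, `x2`, `odd`, and the simulated data are all hypothetical. In Stata the equivalent is simply rerunning `reg` with `cluster(survey)` on each reduced variable list. Here `odd` follows the pattern above (5 in one cluster, 2 everywhere else). The rank should therefore match the number of coefficients only in the fit that drops `odd`.

```python
from fractions import Fraction
import random


def rank(rows):
    """Rank of a matrix given as a list of rows (exact Gaussian elimination)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # column has no usable pivot
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [v - f * w for v, w in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def solve(a, b):
    """Solve the square, nonsingular system a x = b exactly (Gauss-Jordan)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(bv)] for row, bv in zip(a, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [v / p for v in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [v - f * w for v, w in zip(m[i], m[c])]
    return [m[i][n] for i in range(n)]


def cluster_scores(X, y, groups):
    """Fit OLS exactly and return {cluster: u_g}, u_g = sum over g of x_i * e_i."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    scores = {}
    for row, e, g in zip(X, resid, groups):
        u = scores.setdefault(g, [Fraction(0)] * k)
        for j in range(k):
            u[j] += row[j] * e
    return scores


def drop_one_at_a_time(columns, y, groups):
    """Refit with all regressors, then without each one in turn (constant kept).

    Returns a list of (dropped name or None, number of coefficients, meat rank).
    """
    names = list(columns)
    report = []
    for dropped in [None] + names:
        kept = [n for n in names if n != dropped]
        X = [[1] + [columns[n][i] for n in kept] for i in range(len(y))]
        scores = cluster_scores(X, y, groups)
        report.append((dropped, len(kept) + 1, rank(list(scores.values()))))
    return report


# Hypothetical data: 8 clusters of 5 observations each.
rng = random.Random(1018)
groups = [g for g in range(8) for _ in range(5)]
columns = {
    "x1": [rng.randint(-50, 50) for _ in groups],
    "x2": [rng.randint(-50, 50) for _ in groups],
    "odd": [5 if g == 3 else 2 for g in groups],  # 5 in cluster 3, 2 elsewhere
}
y = [rng.randint(-50, 50) for _ in groups]

report = drop_one_at_a_time(columns, y, groups)
for dropped, k, r in report:
    flag = "" if r == k else "  <- meat is rank-deficient"
    print(f"dropped {dropped or '(none)':>6}: {k} coefficients, meat rank {r}{flag}")
```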

Cheers,
Mark

NB: Does the above recommendation remind anyone else of the following
ancient computing joke?

Q: How do you know that's an IBM repairman on the side of the road with a
flat tire?

A: He changes each tire, one after the other, until he finds out which one
is flat.


> Hi Mark,
>
> Yes! I clicked on it, and it talks about situations in which F(.,.)
> goes missing. All the discussion is about cases where the number of
> parameters is equal to or greater than the number of observations. For
> example: "You might see chi2(6) or F(6, 5). If you were to count the
> number of coefficients that would be constrained to 0 in a model test in
> this case, you would find that number to be greater than 6. You could
> find out what that number is by reestimating the model parameters
> without the robust and cluster() options."
>
> I don't think this is my problem: I have enough observations (about 40
> observations per cluster per season, so about 120 per cluster since I
> have three seasons). Also, I can estimate the model with robust, but not
> with cluster().
>
> So I am not sure what is going on.
>
> Thanks for your email Mark!
>
> -Anerdy
>
>
>
>>From: "Mark Schaffer" <M.E.Schaffer@hw.ac.uk>
>>Reply-To: statalist@hsphsun2.harvard.edu
>>To: statalist@hsphsun2.harvard.edu
>>CC: "mes " <m.e.schaffer@hw.ac.uk>
>>Subject: Re: st: SE with cluster option
>>Date: Tue, 18 Oct 2005 19:17:49 +0100 (BST)
>>
>>Al,
>>
>> > Hi Everyone,
>> >
>> > I was wondering what may explain the following F(.,.) values when I use
>> > the cluster option. I have about 40 households per cluster and four
>> > clusters (a total of 168 unique households). I'd like to run the model at
>> > the cluster level to estimate a difference-in-differences model.
>> >
>> > Initially I thought the issue was that, since there are only 4 clusters,
>> > I would not be able to estimate it, because it uses 4 cluster means to
>> > estimate the standard errors.
>>
>>You are right: in effect, you have 4 observations ("super-observations"
>>is perhaps more accurate) to calculate your var-cov matrix, which means
>>you won't get very far this way.
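A companion sketch for the 4-village case, in Python with the standard library only. The helper names and simulated data are hypothetical, and the code shows the algebra rather than Stata's code. The clustered meat is a sum of G outer products u_g u_g'. Because the OLS normal equations give X'e = 0, the G cluster scores u_g add up to zero. The meat therefore has rank at most G - 1, however many observations each cluster holds. Four clusters of 40 observations thus support at most 3 independent directions against 7 coefficients.

```python
from fractions import Fraction
import random


def rank(rows):
    """Rank of a matrix given as a list of rows (exact Gaussian elimination)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # column has no usable pivot
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [v - f * w for v, w in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def solve(a, b):
    """Solve the square, nonsingular system a x = b exactly (Gauss-Jordan)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(bv)] for row, bv in zip(a, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [v / p for v in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [v - f * w for v, w in zip(m[i], m[c])]
    return [m[i][n] for i in range(n)]


def cluster_scores(X, y, groups):
    """Fit OLS exactly and return {cluster: u_g}, u_g = sum over g of x_i * e_i."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    scores = {}
    for row, e, g in zip(X, resid, groups):
        u = scores.setdefault(g, [Fraction(0)] * k)
        for j in range(k):
            u[j] += row[j] * e
    return scores


# Hypothetical data: 4 "villages" of 40 observations, 6 regressors + constant.
rng = random.Random(18)
G, per_cluster, n_regressors = 4, 40, 6
groups = [g for g in range(G) for _ in range(per_cluster)]
X = [[1] + [rng.randint(-50, 50) for _ in range(n_regressors)] for _ in groups]
y = [rng.randint(-50, 50) for _ in groups]
k = len(X[0])  # 7 coefficients including the constant

scores = cluster_scores(X, y, groups)
total = [sum(u[j] for u in scores.values()) for j in range(k)]  # equals X'e
meat_rank = rank(list(scores.values()))  # rank of sum_g u_g u_g'
print(f"{len(X)} obs, {k} coefficients, {G} clusters")
print("cluster scores sum to zero:", all(t == 0 for t in total))
print(f"rank of meat: {meat_rank} (at most G - 1 = {G - 1})")
```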
>>
>> > However, the problem still remains if I cluster at the survey code (or
>> > household) level.
>>
>>Is there a clickable hyperlink on the missing F-stat in this case, and if
>>so, what does it say?
>>
>>--Mark
>>
>>
>> > -MODEL 1 -
>> >
>> > reg y1 DiD vdc post season cdum2 cdum4, cluster(clust)
>> >
>> > Regression with robust standard errors                 Number of obs =     672
>> >                                                        F(  1,     3) =       .
>> >                                                        Prob > F      =       .
>> >                                                        R-squared     =  0.1220
>> > Number of clusters (village) = 4                       Root MSE      =  .29762
>> >
>> > ------------------------------------------------------------------------------
>> >              |               Robust
>> >     cropfail |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> > -------------+----------------------------------------------------------------
>> >          DiD |   .1867678   .0381533     4.90   0.016     .0653468    .3081888
>> >        cdum1 |   .0407624   .0190767     2.14   0.122    -.0199481    .1014729
>> >         post |   .0377531   .0255782     1.48   0.236    -.0436482    .1191544
>> >       season |  -.0803571   .0418741    -1.92   0.151    -.2136192    .0529049
>> >        cdum2 |   .0830587   5.54e-16        .   0.000     .0830587    .0830587
>> >        cdum4 |    .085874   1.02e-15        .   0.000      .085874     .085874
>> >        _cons |   .1601304   .0901628     1.78   0.174    -.1268078    .4470686
>> > ------------------------------------------------------------------------------
>> >
>> >
>> > -MODEL 2 -
>> >
>> > reg y1 DiD vdc post season vdum2 vdum4, cluster(survey)
>> > Regression with robust standard errors                 Number of obs =     672
>> >                                                        F(  5,   167) =       .
>> >                                                        Prob > F      =       .
>> >                                                        R-squared     =  0.1220
>> > Number of clusters (survey) = 168                      Root MSE      =  .29762
>> >
>> > ------------------------------------------------------------------------------
>> >              |               Robust
>> >     cropfail |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> > -------------+----------------------------------------------------------------
>> >          DiD |   .1867678   .0788515     2.37   0.019     .0310936     .342442
>> >        cdum1 |   .0407624    .012909     3.16   0.002     .0152765    .0662484
>> >         post |   .0377531   .0240521     1.57   0.118    -.0097322    .0852384
>> >       season |  -.0803571   .0200387    -4.01   0.000     -.119919   -.0407952
>> >        cdum2 |   .0830587   .0201067     4.13   0.000     .0433627    .1227547
>> >        cdum4 |    .085874   .0476556     1.80   0.073     -.008211     .179959
>> >        _cons |   .1601304   .0483279     3.31   0.001     .0647181    .2555428
>> > ------------------------------------------------------------------------------
>> >
>> >
>> > -MODEL 3 -
>> > . reg y1 DiD vdc post season vdum2 vdum4, robust
>> >
>> > Regression with robust standard errors                 Number of obs =     672
>> >                                                        F(  6,   665) =   10.49
>> >                                                        Prob > F      =  0.0000
>> >                                                        R-squared     =  0.1220
>> >                                                        Root MSE      =  .29762
>> >
>> > ------------------------------------------------------------------------------
>> >              |               Robust
>> >     cropfail |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> > -------------+----------------------------------------------------------------
>> >          DiD |   .1867678   .0658962     2.83   0.005     .0573781    .3161575
>> >        cdum1 |   .0407624   .0144458     2.82   0.005     .0123976    .0691272
>> >         post |   .0377531   .0276749     1.36   0.173    -.0165876    .0920938
>> >       season |  -.0803571   .0229621    -3.50   0.000    -.1254441   -.0352702
>> >        cdum2 |   .0830587   .0206597     4.02   0.000     .0424926    .1236247
>> >        cdum4 |    .085874   .0436286     1.97   0.049     .0002076    .1715403
>> >        _cons |   .1601304   .0566039     2.83   0.005     .0489866    .2712742
>> > ------------------------------------------------------------------------------
>> >
>> >
>> > Model 1 estimates the SEs at the cluster level, while Model 2 does so at
>> > the ID level. Model 3 uses the robust option, and everything works out
>> > fine. The help suggests that I may be estimating more parameters than I
>> > can possibly estimate with the data. I am not sure I see that, since I
>> > have a sample of over 670 observations and I am estimating between 5 and
>> > 8 variables at most.
>> >
>> > I was hoping someone has some intuition as to what may be tripping me
>> > up.
>> >
>> > thanks.
>> > al
>> >
>> > *
>> > *   For searches and help try:
>> > *   http://www.stata.com/support/faqs/res/findit.html
>> > *   http://www.stata.com/support/statalist/faq
>> > *   http://www.ats.ucla.edu/stat/stata/
>> >
>>
>>
>>Prof. Mark Schaffer
>>Director, CERT
>>Department of Economics
>>School of Management & Languages
>>Heriot-Watt University, Edinburgh EH14 4AS
>>tel +44-131-451-3494 / fax +44-131-451-3294
>>email: m.e.schaffer@hw.ac.uk
>>web: http://www.sml.hw.ac.uk/ecomes
>>
>
>
>









© Copyright 1996–2014 StataCorp LP