Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Lars12398@web.de
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Small sample with clustered data
Date: Wed, 30 Nov 2011 12:47:44 +0100 (CET)

Thanks to you both for your help.

@Mark I am not that happy with the 24-country subsample either, because it consists of developed and developing countries for which the dependent variable is not calculated identically. When I use a dummy variable, doesn't that "control" for possible differences between the two subsamples? I assume the alternative is a Chow test? Splitting the sample would be really problematic because of the seven independent variables and the limited degrees of freedom.

Best
Lars

-----Original Message-----
From: "Schaffer, Mark E" <M.E.Schaffer@hw.ac.uk>
Sent: 28.11.2011 22:40:58
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Small sample with clustered data

>Lars,
>
>A few thoughts...
>
>You say that the values of the dep var for the EU members are
>correlated. But that's not necessarily a problem. What matters for the
>VCE is the correlation of the error term u_i with u_j, or more
>precisely, the correlation of x_i*u_i with x_j*u_j, where x_i is a
>regressor.
>
>Say, for example, that the true DGP has an EU fixed effect, i.e., an EU
>dummy belongs in the estimating equation. If you estimate without an EU
>dummy, the dependence in the errors can mess up the classical or robust
>VCE. The cluster-robust VCE would deal with this by, in effect,
>creating an aggregated super-observation for the EU in the calculation
>of the VCE. But a much simpler way of dealing with the problem is just
>to include an EU dummy. It's just like estimating a panel data model
>when you expect the observations for a panel unit (country, household,
>whatever) to have errors that are correlated via a fixed effect. You
>could use OLS and cluster-robust SEs, but using the LSDV estimator is
>better, and might on its own be a perfectly satisfactory solution.
>
>A related thought: you have 24 non-EU countries and 26 EU countries.
>You seem happy with the 24 non-EU sample, and presumably if you were to
>estimate using just these 24, the only thing that would bother you would
>be the small-ish sample size. How do you feel about estimating using a
>sample of just the 26 EU countries? If you feel OK about that as well,
>then perhaps your main concern should be about whether imposing the
>common coefficients assumption for the combined sample of 50 is
>warranted.
>
>As for the number of clusters issue, you have two problems. First, 25
>clusters isn't very many. The cluster-robust VCE gets its asymptotic
>properties via the number of clusters going off to infinity, and 25
>isn't very far on the way. Second, Austin Nichols has done some work
>(I think cited in the 2007 presentation you mention) that shows that the
>cluster-robust VCE doesn't work well with very unbalanced panels.
>Knowing only what you've told us about your problem, I'd be reluctant
>to recommend the cluster-robust VCE as the answer. Dealing with the
>problem parametrically (e.g., with an EU dummy) seems like a better way
>to go.
>
>HTH,
>Mark
>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
>> Lars12398@web.de
>> Sent: 28 November 2011 11:24
>> To: statalist@hsphsun2.harvard.edu
>> Subject: st: Small sample with clustered data
>>
>> Dear Statalist,
>>
>> My sample consists of 50 countries, 26 of them being EU
>> Member States.
>> The problem is that the values of the dependent variable for
>> the EU members are not independent of each other. Thus, I
>> created a dummy variable "eucluster" that indicates whether a
>> country is in the EU (1=yes; 0=no) and used the
>> cluster(eucluster) option after the OLS regressions in Stata
>> 10. However, in "Clustered Errors in Stata"
>> (Nichols/Schaffer 2007 - http://repec.org/usug2007/crse.pdf)
>> it is mentioned that if M, the number of clusters, is small,
>> matters could even get worse by using the cluster option (slide 20).
>> M=50 seems to be the minimum number of clusters required.
>>
>> I have 24 clusters consisting of 1 country each and 1 cluster
>> comprising 26 EU members (6 independent variables).
>> I do not know how to deal "correctly" with these clustered
>> data in Stata. Hence, I would greatly appreciate it if someone
>> could give me advice or suggest a solution on how to deal
>> with the clustered data in such a small sample.
>>
>> Thanks for your consideration.
>>
>> Lars
>
>--
>Heriot-Watt University is a Scottish charity
>registered under charity number SC000278.
>
>Heriot-Watt University is the Sunday Times
>Scottish University of the Year 2011-2012

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
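
For readers following Mark's point about x_i*u_i and the "aggregated super-observation", the cluster-robust VCE can be written out as in the sketch below. This is the standard textbook sandwich form rather than anything stated in the thread. The symbols X (all regressors stacked), X_g and \hat{u}_g (regressors and OLS residuals for cluster g), and x_i (the regressor vector for observation i) are notation assumed here; M is the number of clusters, as in Lars's message. Finite-sample correction factors are omitted.

```latex
\[
\widehat{V}_{\text{cluster}}
  = (X'X)^{-1}
    \left( \sum_{g=1}^{M} X_g' \hat{u}_g \hat{u}_g' X_g \right)
    (X'X)^{-1},
\qquad
X_g' \hat{u}_g = \sum_{i \in g} x_i \hat{u}_i .
\]
```

Each cluster enters only through its summed score X_g' \hat{u}_g, which is the super-observation Mark refers to. In Lars's setup the middle term is a sum over just M = 25 such scores, one of which aggregates all 26 EU countries, which is why both the small number of clusters and the imbalance matter.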
