Re: st: RE: Small sample with clustered data

From	[email protected]
To	[email protected]
Subject	Re: st: RE: Small sample with clustered data
Date	Wed, 30 Nov 2011 12:47:44 +0100 (CET)

Thanks to you both for your help.

@Mark

I am not that happy with the 24-country subsample either, because it mixes
developed and developing countries, and the dependent variable is not
calculated identically for the two groups. If I include a dummy variable for
the group, doesn't that "control" for possible differences between the two
subsamples? I assume the alternative is a Chow test? Splitting the sample
would be really problematic, because with seven independent variables the
degrees of freedom are very limited.
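
To make the dummy-versus-Chow question concrete, here is a minimal sketch. A group dummy on its own lets only the intercept differ between the two groups, while a Chow-type test in the pooled regression also interacts the dummy with every regressor and tests all the group terms jointly. The data are simulated, and every name (y, x1-x6, developed, dx1-dx6) is a hypothetical stand-in, not a variable from the actual dataset.

```stata
* Simulated stand-in for the 24-country subsample (all names are assumptions)
clear
set seed 12345
set obs 24
generate byte developed = _n <= 12
forvalues j = 1/6 {
    generate x`j' = rnormal()
}
generate y = 1 + 0.5*x1 + 0.3*developed + rnormal()

* (a) Dummy only: allows a different intercept, forces common slopes
regress y x1-x6 developed

* (b) Chow-type test: interact the dummy with each regressor by hand
*     (no factor-variable notation, so this also runs in Stata 10)
forvalues j = 1/6 {
    generate dx`j' = developed * x`j'
}
regress y x1-x6 developed dx1-dx6

* Joint test that the intercept shift and all slope shifts are zero
testparm developed dx1-dx6
```

With 24 observations, the interacted model in (b) estimates 14 coefficients and leaves only 10 residual degrees of freedom, so it runs into the same limitation as splitting the sample.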


Best

Lars

-----Original Message-----
From: "Schaffer, Mark E" <[email protected]>
Sent: 28.11.2011 22:40:58
To: [email protected]
Subject: st: RE: Small sample with clustered data

>Lars,
>
>A few thoughts...
>
>You say that the values of the dep var for the EU members are
>correlated. But that's not necessarily a problem. What matters for the
>VCE is the correlation of the error term u_i with u_j, or more
>precisely, the correlation of x_i*u_i with x_j*u_j, where x_i is a
>regressor.
>
>Say, for example, that the true DGP has an EU fixed effect, i.e., an EU
>dummy belongs in the estimating equation. If you estimate without an EU
>dummy, the dependence in the errors can mess up the classical or robust
>VCE. The cluster-robust VCE would deal with this by, in effect,
>creating an aggregated super-observation for the EU in the calculation
>of the VCE. But a much simpler way of dealing with the problem is just
>to include an EU dummy. It's just like estimating a panel data model
>when you expect the observations for a panel unit (country, household,
>whatever) to have errors that are correlated via a fixed effect. You
>could use OLS and cluster-robust SEs, but using the LSDV estimator is
>better, and might on its own be a perfectly satisfactory solution.
>
>A related thought: you have 24 non-EU countries and 26 EU countries.
>You seem happy with the 24 non-EU sample, and presumably if you were to
>estimate using just these 24, the only thing that would bother you would
>be the small-ish sample size. How do you feel about estimating using a
>sample of just the 26 EU countries? If you feel OK about that as well,
>then perhaps your main concern should be about whether imposing the
>common coefficients assumption for the combined sample of 50 is
>warranted.
>
>As for the number of clusters issue, you have two problems. First, 25
>clusters isn't very many. The cluster-robust VCE gets its asymptotic
>properties via the number of clusters going off to infinity, and 25
>isn't very far on the way. Second, Austin Nichols has done some work
>(I think cited in the 2007 presentation you mention) that shows that the
>cluster-robust VCE doesn't work well with very unbalanced panels.
>Knowing only what you've told us about your problem, I'd be reluctant
>to recommend the cluster-robust VCE as the answer. Dealing with the
>problem parametrically (e.g., with an EU dummy) seems like a better way
>to go.
>
>HTH,
>Mark
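
The two approaches Mark contrasts above can be sketched in Stata as follows. This is only an illustration on simulated data: the names y, x1-x6, eu and clid are hypothetical, and clid is an assumed cluster identifier that gives all 26 EU members one shared value and each of the 24 other countries its own value, so that there are 25 clusters as described in the thread.

```stata
* Simulated stand-in for the 50-country sample (all names are assumptions)
clear
set seed 2011
set obs 50
generate byte eu = _n <= 26

* One cluster for all EU members, a singleton cluster for every other country
generate int clid = cond(eu, 0, _n)

forvalues j = 1/6 {
    generate x`j' = rnormal()
}
generate y = 1 + 0.5*x1 + 0.8*eu + rnormal()

* Option 1: no EU dummy, cluster-robust VCE over the 25 clusters
regress y x1-x6, vce(cluster clid)

* Option 2: handle the EU effect parametrically with a dummy,
*           using the ordinary heteroskedasticity-robust VCE
regress y x1-x6 eu, vce(robust)
```

Comparing the standard errors from the two commands on the real data would show how much the choice matters in practice; the simulated numbers here carry no substantive meaning.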
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of
>> [email protected]
>> Sent: 28 November 2011 11:24
>> To: [email protected]
>> Subject: st: Small sample with clustered data
>>
>> Dear Statalist,
>>
>> My sample consists of 50 countries with 26 of them being EU
>> Member States.
>> The problem is that the values of the dependent variable for
>> the EU members are not independent of each other. Thus, I
>> created a dummy variable "eucluster" that indicates if a
>> country is in the EU (1=yes; 0=no) and used the
>> cluster(eucluster) option after the OLS Regressions in Stata
>> 10. However, in "Clustered Errors in Stata"
>> (Nichols/Schaffer 2007 - http://repec.org/usug2007/crse.pdf)
>> it is mentioned that if M, the number of clusters, is small,
>> matters could even get worse by using the cluster option (slide 20).
>> M=50 seems to be the minimum number of clusters required.
>>
>> I have 24 clusters consisting of 1 country and 1 cluster
>> comprising 26 EU members (6 independent variables).
>> I do not know how to deal "correctly" with these clustered
>> data in Stata. Hence, I would highly appreciate it if someone
>> could give me advice or suggest a solution on how to deal
>> with the clustered data in such a small sample.
>>
>> Thanks for your consideration.
>>
>> Lars
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>>
>
>

