Statalist



RE: st: When number of regressors greater than the number of clusters in OLS regression


From   "Schaffer, Mark E" <M.E.Schaffer@hw.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: When number of regressors greater than the number of clusters in OLS regression
Date   Mon, 1 Sep 2008 23:00:12 +0100

Divya,

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu 
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of 
> Divya Balasubramaniam
> Sent: 01 September 2008 22:26
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: When number of regressors greater than the 
> number of clusters in OLS regression
> 
> I am still quite unclear about exactly why I do not need to 
> cluster by State at all. Can you kindly explain it one more 
> time?

Whether or not you need cluster-robust standard errors depends on
whether you think your data have the problem that cluster-robust SEs
address, namely (1) the error terms in your equation are correlated
within states because of unobserved heterogeneity (so the iid
assumption fails), but (2) the error terms are not correlated across
states.

A good example would be if you are looking at something that is
affected by state-level regulation, i.e., the laws regulating it vary
from state to state, but you don't have variables that control for
this.
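
To make conditions (1) and (2) concrete, here is a minimal simulated
sketch in Stata. It is not Divya's data: the variable names (state,
u_state, x_state, x, y), the seed, and the data-generating process are
all assumptions made up for illustration. Only the sizes (436 districts,
17 states) come from the thread. Districts share an unobserved
state-level shock, so the errors are correlated within states but
independent across states.

```stata
* Hypothetical simulated data: 436 districts spread over 17 states
clear
set seed 2008
set obs 436
gen state = 1 + mod(_n - 1, 17)

* Unobserved state-level shock: drawn once per state, then copied
* to every district in that state
bysort state: gen double u_state = rnormal() if _n == 1
by state: replace u_state = u_state[1]

* The regressor also has a state-level component (an assumption)
bysort state: gen double x_state = rnormal() if _n == 1
by state: replace x_state = x_state[1]
gen double x = x_state + rnormal()

* Outcome: the state shock ends up in the error term
gen double y = 1 + x + u_state + rnormal()

* Same coefficient on x in all three; only the SEs differ
regress y x
regress y x, vce(robust)
regress y x, vce(cluster state)
```

In a setup like this the clustered SE on x will usually come out
larger than the other two, because the within-state correlation is
being ignored by the conventional and -robust- SEs. Note that
vce(cluster state) is the newer spelling of cluster(state).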

> Is it because my dataset is not a sample but 
> accounts for 100% of the population? Or is there something 
> else I need to consider?
> 
> So instead of areg Y on X, absorb(state) robust 
> cluster(state), I will now run areg Y on X, absorb(state) 
> robust. Is that correct?

Two comments here. First, you are probably better off using -xtreg, fe-
instead of -areg- (it is the more modern Stata command). Second, and
more important, Stock and Watson (Econometrica 2008) have shown that
the fixed effects estimator with -robust- generates SEs that are
inconsistent.
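
For reference, here is a sketch of what the -xtreg- counterpart of the
-areg- call might look like. The data are simulated and hypothetical:
the names y, x1-x25 and state, the seed, and the data-generating
process are assumptions, and only the dimensions (436 districts, 17
states, 25 regressors) come from the thread. The sketch illustrates
syntax only and takes no stand on which VCE is appropriate.

```stata
* Hypothetical simulated data with the dimensions from the thread
clear
set seed 2008
set obs 436
gen state = 1 + mod(_n - 1, 17)
forvalues j = 1/25 {
    gen double x`j' = rnormal()
}
gen double y = 1 + x1 + rnormal()

* State fixed effects via areg, as in the original post
areg y x1-x25, absorb(state) vce(robust)

* The xtreg version: declare the group variable, then fit the FE model
xtset state
xtreg y x1-x25, fe vce(robust)
```

The slope coefficients from the two commands should agree; the
reported SEs need not. As far as I understand the current -xtreg-
documentation, vce(robust) with -xtreg, fe- is treated as
vce(cluster panelvar), so check -help xtreg- for the version you are
running before comparing the two sets of SEs.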

> Also, can someone explain how to do inference on individual 
> coefficient estimates when we encounter this kind of problem 
> in an OLS regression (with fewer clusters than regressors)?

See my previous email and the references therein - I think they're
pretty clear, especially the one by Vince Wiggins.
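
As a rough illustration of the general issue (a sketch, not a summary
of the references mentioned above): with 17 clusters the cluster-robust
covariance matrix has rank of at most 17, so a joint test of all 25
coefficients cannot be computed, while tests involving only a few
coefficients can. The data below are simulated and hypothetical; the
names y, x1-x25 and state, the seed, and the data-generating process
are assumptions.

```stata
* Hypothetical simulated data with the dimensions from the thread
clear
set seed 2008
set obs 436
gen state = 1 + mod(_n - 1, 17)
forvalues j = 1/25 {
    gen double x`j' = rnormal()
}
gen double y = 1 + x1 + rnormal()

* 25 regressors, 17 clusters: the setup from the original question
areg y x1-x25, absorb(state) vce(cluster state)

* Number of clusters and the overall F statistic stored by areg
display "clusters: " e(N_clust) "   overall F: " e(F)

* Tests with fewer restrictions than clusters
test x1
test x1 x2 x3
```

Whether SEs based on only 17 clusters are reliable enough for
inference is a separate question from whether the tests can be
computed.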

Cheers,
Mark

Prof. Mark Schaffer
Director, CERT
Department of Economics
School of Management & Languages
Heriot-Watt University
Edinburgh EH14 4AS
tel +44-131-451-3494 / fax +44-131-451-3296
http://ideas.repec.org/e/psc51.html


> Thanks,
> Divya. 
> 
> ---- Original message ----
> >Date: Mon, 1 Sep 2008 16:59:59 -0400
> >From: Steven Samuels <sjhsamuels@earthlink.net>
> >Subject: Re: st: When number of regressors greater than the number of 
> >clusters in OLS regression
> >To: statalist@hsphsun2.harvard.edu
> >
> >Divya-
> >
> >So, you have n = 436. Just remove State as a cluster variable and 
> >continue with your modeling. You won't be troubled by the limit on 
> >regressors again; just keep the number to <=44 (10% of observations).
> >
> >Good luck!
> >
> >-Steven Samuels
> >
> >On Sep 1, 2008, at 4:22 PM, Divya Balasubramaniam wrote:
> >
> >> Hello Dr. Steven,
> >>
> >> My dependent variable is: share of the total number of households 
> >> in a district having access to tap water. (I have the district 
> >> totals.)
> >>
> >> Divya.
> >> =======================================
> >> Divya Balasubramaniam
> >> Economics PhD Student
> >> Terry College of Business
> >> University of Georgia
> >> Athens -30602.
> >>
> >> From: Steven Samuels <sjhsamuels@earthlink.net>
> >> Date: September 1, 2008 4:13:40 PM EDT
> >> To: statalist@hsphsun2.harvard.edu
> >> Subject: Re: st: When number of regressors greater than the number of 
> >> clusters in OLS regression
> >> Reply-To: statalist@hsphsun2.harvard.edu
> >>
> >>
> >> Divya,
> >> I reread your question and realize that you probably do not have 
> >> sample data at all. The Census of India was not a sample at all, but, 
> >> ideally, was a 100% enumeration. (Just as in other countries, this 
> >> will not be perfectly true.) So, I am not sure that you should be 
> >> clustering on State, or even on district, for that matter.
> >> Please reply with details about your observations. For example, do 
> >> you have information on individual households or just district totals?
> >>
> >> Regards,
> >>
> >> Steven
> >>
> >>
> >> On Sep 1, 2008, at 1:05 PM, Steven Samuels wrote:
> >>
> >>> More basic questions, Divya:  What is your target population:  the
> >>> 17 states (of India, perhaps?) or the entire country?  Were the 17 
> >>> states selected from all states by a sampling process?  Or were they 
> >>> chosen in some other way--for example, because they had data 
> >>> available?  Are all districts from the selected states in your 
> >>> sample?
> >>>
> >>>
> >>> -Steven
> >>> On Sep 1, 2008, at 12:35 PM, Divya Balasubramaniam wrote:
> >>>
> >>>> Dear Dr.Schaffer,
> >>>>
> >>>> I am using clustering in my analysis and I am having some trouble 
> >>>> understanding some of the important issues. I have read several 
> >>>> papers you have written on clustering issues and hence I am 
> >>>> emailing you to seek help.
> >>>>
> >>>> I am doing a district-level analysis for the census year 2001. I 
> >>>> have 436 districts in total coming from 17 States. I run an OLS 
> >>>> regression of the share of households having tap water access on 
> >>>> several control variables (I have about 25 regressors). I use the 
> >>>> Stata command areg Y on X, absorb(State) cluster(state). I have 
> >>>> state fixed effects and cluster by State.
> >>>>
> >>>> My question is: I have more regressors (25) than clusters (17). 
> >>>> I also find in the Stata output that the F-stat is missing. I 
> >>>> would like to seek your advice on whether I can make inference 
> >>>> by looking at the individual coefficient estimates and the 
> >>>> reported robust standard errors. I did see your comment on this 
> >>>> issue on the Stata listserv. However, I could not find answers 
> >>>> as to how to fix this problem of having more regressors than 
> >>>> clusters.
> >>>>
> >>>> I will be extremely thankful if you can kindly help me in this 
> >>>> regard.
> >>>> Sincerely,
> >>>> Divya.
> >>>> =======================================
> >>>> Divya Balasubramaniam
> >>>> Economics PhD Student
> >>>> Terry College of Business
> >>>> University of Georgia
> >>>> Athens -30602.
> >>>> *
> >>>> *   For searches and help try:
> >>>> *   http://www.stata.com/help.cgi?search
> >>>> *   http://www.stata.com/support/statalist/faq
> >>>> *   http://www.ats.ucla.edu/stat/stata/
> >>>
> >>
> >>
> >>
> >
> =======================================
> Divya Balasubramaniam
> Economics PhD Student
> Terry College of Business
> University of Georgia
> Athens -30602.
> 






