# RE: st: When number of regressors greater than the number of clusters in OLS regression

 From "Schaffer, Mark E"
Date Mon, 1 Sep 2008 22:27:27 +0100

```
Steven, Divya,

> Divya-
>
> So, you have n = 436. Just remove State as a cluster variable
> and continue with your modeling. You won't be troubled by the
> limit on regressors again; just keep the number to <=44 (10%
> of observations).

This is good advice if in fact there is no within-state correlation in
the disturbance.

Divya, is there any reason to suspect within-state correlation may
exist?

--Mark

>
> Good luck!
>
> -Steven Samuels
>
> On Sep 1, 2008, at 4:22 PM, Divya Balasubramaniam wrote:
>
> > Hello Dr. Steven,
> >
> > My dependent variable is: share of the total number of households in a
> > district having access to tap water. (I have the district totals.)
> >
> > Divya.
> > =======================================
> > Divya Balasubramaniam
> > Economics PhD Student
> > Terry College of Business
> > University of Georgia
> > Athens -30602.
> >
> > From: Steven Samuels <sjhsamuels@earthlink.net>
> > Date: September 1, 2008 4:13:40 PM EDT
> > To: statalist@hsphsun2.harvard.edu
> > Subject: Re: st: When number of regressors greater than the number of
> > clusters in OLS regression
> >
> >
> > Divya,
> > I reread your question and realize that you probably do not have
> > sample data at all. The Census of India was not a sample at all, but,
> > ideally, was a 100% enumeration. (Just as in other countries, this
> > will not be perfectly true.) So I am not sure that you should be
> > clustering on State, or even on district, for that matter. For
> > example, do you have information on individual households or just
> > district totals?
> >
> > Regards,
> >
> > Steven
> >
> >
> > On Sep 1, 2008, at 1:05 PM, Steven Samuels wrote:
> >
> >> More basic questions, Divya: What is your target population: the
> >> 17 states (of India, perhaps?) or the entire country? Were the 17
> >> states selected from all states by a sampling process? Or were they
> >> chosen in some other way, for example because they had data
> >> available? Are all districts from the selected states in your
> >> sample?
> >>
> >>
> >> -Steven
> >> On Sep 1, 2008, at 12:35 PM, Divya Balasubramaniam wrote:
> >>
> >>> Dear Dr. Schaffer,
> >>>
> >>> I am using clustering in my analysis and I am having some trouble
> >>> understanding some of the important issues. I have read several
> >>> papers you have written on clustering issues, and hence I am emailing
> >>> you to seek help.
> >>>
> >>> I am doing a district-level analysis for the census year 2001. I
> >>> have 436 districts in total from 17 states. I run an OLS
> >>> regression of the share of households having tap water access on
> >>> several control variables (about 25 regressors). I use the Stata
> >>> command areg Y X, absorb(state) cluster(state), so I have state
> >>> fixed effects and standard errors clustered by state.
> >>>
> >>> My question is: I have more regressors (25) than clusters (17).
> >>> I also find that the F-statistic is missing in the Stata output. I
> >>> would like your advice on whether I can make inferences by looking at
> >>> the individual coefficient estimates and the reported robust
> >>> standard errors. I did see your comment on this issue on the Stata
> >>> listserv. However, I could not find how to fix this problem of
> >>> having more regressors than clusters.
> >>>
> >>> I will be extremely thankful if you can kindly help me in this
> >>> regard.
> >>> Sincerely,
> >>> Divya.
> >>> =======================================
> >>> Divya Balasubramaniam
> >>> Economics PhD Student
> >>> Terry College of Business
> >>> University of Georgia
> >>> Athens -30602.
> >>

```
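The missing F-statistic follows from the structure of the cluster-robust covariance estimator. The sketch below is a hypothetical numerical illustration in Python/NumPy, not Stata and not taken from the thread. It uses simulated data with the thread's dimensions (436 districts, 17 states, 25 regressors). The estimator's middle term is a sum of one outer product per cluster. Because OLS residuals are orthogonal to the regressors, the 17 cluster score vectors sum to zero, so the estimated covariance matrix has rank at most 17 - 1 = 16. A joint test of all 25 coefficients needs a full-rank 25x25 matrix, which is consistent with Stata reporting the overall F-statistic as missing. Each diagonal element, and hence each individual standard error, can still be computed.

The helper names `demean_by_group` and `cluster_vce` are illustrative assumptions. The sketch also omits Stata's finite-sample scaling, which multiplies V by a scalar and so does not change its rank. It says nothing about whether 17 clusters are enough for those individual standard errors to be reliable.

```python
import numpy as np


def demean_by_group(A, groups):
    """Subtract group means: the within transformation that absorbing
    state fixed effects performs (hypothetical helper)."""
    A = np.asarray(A, dtype=float).copy()
    for g in np.unique(groups):
        A[groups == g] -= A[groups == g].mean(axis=0)
    return A


def cluster_vce(X, u, groups):
    """Cluster-robust sandwich (X'X)^-1 [sum_g s_g s_g'] (X'X)^-1,
    with s_g = X_g' u_g; no finite-sample scaling is applied."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        s_g = X[groups == g].T @ u[groups == g]  # score vector of cluster g
        meat += np.outer(s_g, s_g)
    return bread @ meat @ bread


# Sizes taken from the thread; the data themselves are simulated.
N, G, K = 436, 17, 25
rng = np.random.default_rng(2008)
groups = np.arange(N) % G                  # each state gets 25 or 26 districts
X = rng.normal(size=(N, K))
y = X @ rng.normal(size=K) + rng.normal(size=G)[groups] + rng.normal(size=N)

# Absorb state effects, then run OLS on the demeaned data.
Xd, yd = demean_by_group(X, groups), demean_by_group(y, groups)
b, *_ = np.linalg.lstsq(Xd, yd, rcond=None)
u = yd - Xd @ b

V = cluster_vce(Xd, u, groups)
sv = np.linalg.svd(V, compute_uv=False)    # singular values, descending
rank = int((sv > 1e-10 * sv[0]).sum())     # numerical rank of V
se = np.sqrt(np.diag(V))                   # individual standard errors

print("max |X'u| =", np.abs(Xd.T @ u).max())  # cluster scores sum to ~0
print("rank of V =", rank, "of", K)           # at most G - 1 = 16
print("all individual SEs positive:", bool((se > 0).all()))
```

Rerunning with fewer regressors than G - 1 (for example, K = 10) should give a full-rank V, in which case a joint Wald test of all coefficients can be formed.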