Re: st: When number of regressors greater than the number of clusters in OLS regression

 From Divya Balasubramaniam
To [email protected]
Subject Re: st: When number of regressors greater than the number of clusters in OLS regression
Date Mon, 1 Sep 2008 17:26:28 -0400 (EDT)

```
I am still unclear about why I do not need to cluster by State at all. Could you kindly explain it once more? Is it because my dataset is not a sample but covers 100% of the population? Or is there something else I need to consider?

So instead of areg Y X, absorb(state) robust cluster(state), I will now run areg Y X, absorb(state) robust.
Is that correct?

Also, can someone explain how to make inferences about individual coefficient estimates when this problem arises in an OLS regression (fewer clusters than regressors)?

Thanks,
Divya.

>Divya-
>
>So, you have n = 436. Just remove State as a cluster variable and
>continue with your modeling. You won't be troubled by the limit on
>regressors again; just keep the number to <=44 (10% of observations).
>
>Good luck!
>
>-Steven Samuels
>
>
>> Hello Dr. Steven,
>>
>> My dependent variable is the share of all households in a
>> district that have access to tap water. (I have the district totals.)
>>
>> Divya.
>> =======================================
>> Divya Balasubramaniam
>> Economics PhD Student
>> University of Georgia
>> Athens -30602.
>>
>> Divya,
>> I reread your question and realize that you probably do not have
>> sample data at all. The Census of India was not a sample at all,
>> but, ideally, was a 100% enumeration. (Just as in other countries,
>> this will not be perfectly true.) So, I am not sure that you should
>> be clustering on State, or even on district, for that matter. Do
>> you have information on individual households or just district totals?
>>
>> Steven
>>
>>> More basic questions, Divya:  What is your target population:  the
>>> 17 states (of India, perhaps?) or the entire country?  Were the 17
>>> states selected from all states by a sampling process?  Or were
>>> they chosen in some other way--for example, because they had data
>>> available?  Are all districts from the selected states in your
>>> sample?
>>> On Sep 1, 2008, at 12:35 PM, Divya Balasubramaniam wrote:
>>>
>>>> Dear Dr. Schaffer,
>>>>
>>>> I am using clustering in my analysis and I am having some trouble
>>>> understanding some of the important issues. I have read several
>>>> papers you have written on clustering issues and hence I am
>>>> emailing you to seek help.
>>>>
>>>> I am doing a district level analysis for the census year 2001. I
>>>> have 436 districts in total coming from 17 States. I run an OLS
>>>> regression of Share of households having tap water access on
>>>> several control variables (I have about 25 regressors). I use
>>>> the Stata command areg Y X, absorb(state) cluster(state). I
>>>> include state fixed effects and cluster by state.
>>>>
>>>> My question is: I have more regressors (25) than clusters (17),
>>>> and the F-statistic is missing from the Stata output. I would
>>>> like your advice on whether I can make inferences from the
>>>> individual coefficient estimates and the reported robust
>>>> standard errors. I did see your comment on this issue on the
>>>> Stata listserv. However, I could not find an answer on how to
>>>> fix the problem of having more regressors than clusters.
>>>>
>>>> I will be extremely thankful if you can kindly help me in this
>>>> regard.
>>>> Sincerely,
>>>> Divya.
>>>> =======================================
>>>> Divya Balasubramaniam
>>>> Economics PhD Student
>>>> University of Georgia
>>>> Athens -30602.
