
st: Efficient way to run regressions with many dummy variables?

From   Pauline Grosjean <>
Subject   st: Efficient way to run regressions with many dummy variables?
Date   Mon, 27 Apr 2009 12:32:43 -0700 (PDT)

Dear Stata-list,

I am using a data set of 963,966 observations with 26 variables (after
dropping all variables not needed for my estimation). The observations are
dyadic: I have about (1400 squared)/2 pairs of observations (divided by 2
because the relationship is non-directional), so in the regressions I need
to control for 1400*2 dummy variables. I run a regression of the form:
xi: reg y x1 x2 x3 i.observation1 i.observation2
where my dataset consists of dyadic relationships between each
observation1 and each observation2.
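
A scaled-down sketch of this setup might look like the following. It
assumes 50 units rather than about 1400, and the regressors, outcome, and
coefficient values are simulated purely for illustration; only the variable
names and the final regression command follow the description above.

```stata
* Scaled-down sketch: 50 units instead of about 1400; all data simulated
clear
set seed 12345
set obs 50
generate observation1 = _n

* Make 50 copies of each unit and number them to form all ordered pairs
expand 50
bysort observation1: generate observation2 = _n

* Keep each non-directional pair once: 50*49/2 = 1225 dyads
keep if observation1 < observation2
count

* Made-up regressors and outcome (hypothetical values, illustration only)
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate y  = 1 + 0.5*x1 - 0.3*x2 + 0.2*x3 + rnormal()

* Same specification as above: one set of dummies per side of the dyad
xi: regress y x1 x2 x3 i.observation1 i.observation2
```

At this size the command creates fewer than 100 indicator variables; at
about 1400 units the same command creates on the order of 1400*2 of them,
which is the scale at which the problem appears.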

The problem I run into is that each regression takes an incredibly long
time (and the server crashes regularly).

In an alternative regression, I use Fafchamps and Gubert's ngreg:
xi: ngreg y x1 x2 x3, id(observation1 observation2)
This also takes an incredibly long time.

My question is: is there a more efficient way to run regressions in Stata
with such an enormous number of dummy variables?

PS: I do not care about the coefficient on the dummies per se.

Thank you very much in advance for your response.


Pauline Grosjean
Ciriacy Wantrup Fellow, Department of Agricultural and Resource Economics
University of California Berkeley
Mobile: 510 384 0141
