


st: Endogeneity and Panel Data : treatreg, ivregress or .. ? Any suggestion would be really appreciated !

From   John Litfiba <>
Subject   st: Endogeneity and Panel Data : treatreg, ivregress or .. ? Any suggestion would be really appreciated !
Date   Sun, 13 Nov 2011 15:41:46 +0100

Dear Stata List,
Dear Mark Schaffer (I guess ;-) )

I have an econometric question related to endogenous variables and
panel data, and I believe that it may be of interest to anyone who
uses longitudinal data.

Here's the context :

I have a panel dataset of individuals who, at any time t, can
endogenously choose the value of a variable E (for endogenous). E is
unordered and can take only a few values (in my case, 6 possible
values).

I am particularly interested in the effect of one of these choices on
a fully continuous outcome variable Y.

That is, for any individual i at any time t, I would like to estimate

Y(i,t) = a + b*Z(i,t) + c*X(i,t) + e(i,t)

where, for example, Z is a binary variable that equals 1 if
individual i chooses E="the value of interest" at time t, and zero
otherwise. Variables in X are assumed to be exogenous.
I believe I have a good instrument for Z, along with other demographic
control variables, and therefore I guess I have basically the following
choices in order to take into account the panel nature of my dataset :

1) using ivreg2 with the option cluster(id) and correcting for the
endogenous part with (Z= instrument + age + location of birth).
However, Z is a dummy variable... I know this should not be a problem.
2) using treatreg with the option vce(bootstrap, cluster(id)
reps(400)) and modeling the choice of E=2 (that is Z=1) with treat(Z=
instrument + age + location of birth)
3) I tried to use xtivreg2 with fixed effects, but location of birth
is time invariant (and I believe very important in order to understand
Z), so its coefficient cannot be estimated.
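
As a rough sketch only, the three options above might be typed along the
following lines. All variable names (y, x1, x2, z, instrument, age,
birthloc, id, year) are placeholders and not taken from the actual dataset.
The sketch assumes that the commands meant in 1) and 3) are the user-written
ivreg2 and xtivreg2 (available from SSC), and that a time variable such as
year is available for xtset.

```stata
* Placeholder names throughout: y (outcome), x1 x2 (exogenous controls),
* z (dummy equal to 1 when E takes the value of interest), instrument,
* age, birthloc, id (individual identifier), year (time variable).

* Option 1: pooled 2SLS with standard errors clustered on the individual.
* ivreg2 is user-written: ssc install ivreg2
ivreg2 y x1 x2 (z = instrument age birthloc), cluster(id) first

* Option 2: treatment-effects model; the treat() equation models z,
* and the bootstrap resamples whole individuals rather than single rows.
treatreg y x1 x2, treat(z = instrument age birthloc) ///
    vce(bootstrap, cluster(id) reps(400))

* Option 3: fixed-effects IV. birthloc is left out here because a
* time-invariant variable is swept out by the within transformation.
* xtivreg2 is user-written: ssc install xtivreg2
xtset id year
xtivreg2 y x1 x2 (z = instrument age), fe cluster(id)
```

In options 1 and 3 the exogenous controls x1 and x2 enter the first stage
automatically, so only the excluded instruments are listed inside the
parentheses.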

Is my approach correct ? Do you perhaps have other ways to tackle
this multiple-choice endogeneity problem ?

Moreover, in the context of panel data, do I always need to use
clustering on id in order to have correct standard errors ?
My dataset is large, with far more observations than clusters :
about 200 000 individuals and 10 million observations for
the whole dataset.
The period over which the instrument is available reduces the dataset
considerably, to about 1 million observations and 20 000 individuals.
An important remark : the panel is NOT balanced, so individuals can
enter and leave the dataset during the 10-year period it covers.
Some thus have very few observations, and some have hundreds
of rows.

Many thanks in advance for your suggestions,

