

From: John Litfiba <cariboupad@gmx.fr>
To: statalist@hsphsun2.harvard.edu
Subject: st: Endogeneity and Panel Data : treatreg, ivregress or .. ? Any suggestion would be really appreciated !
Date: Sun, 13 Nov 2011 15:41:46 +0100

Dear Stata List, Dear Mark Schaffer (I guess ;-) )

I have an econometric question related to endogenous variables and panel data, and I believe it may be of interest to anyone who uses longitudinal data.

Here's the context: I have a panel dataset of individuals who, at any time t, can endogenously choose the value of a variable E (for endogenous). E is unordered and takes only a few values (in my case, 6 possible choices). I am particularly interested in the effect of one of these choices on a fully continuous outcome variable Y.

That is, at any time and for any individual, I would like to estimate

Y_it = a + b*X_it + c*Z_it + e_it

where, for example, Z is a binary variable equal to 1 if individual i chooses E = "the value of interest" at time t, and zero otherwise. The variables in X are assumed to be exogenous.

I believe I have a good instrument for Z, along with other demographic control variables, so I guess I basically have the following choices to take the panel nature of my dataset into account:

1) Using ivregress2 with the option cluster(id) and correcting for the endogenous part with (Z = instrument + age + location of birth). However, Z is a dummy variable... I know this should not be a problem, but...

2) Using treatreg with the option vce(bootstrap, cluster(id) reps(400)) and modeling the choice of E=2 (that is, Z=1) with treat(Z = instrument + age + location of birth).

3) I tried to use xtivreg2 with fixed effects, but location of birth is time-invariant (and, I believe, very important for understanding Z), so its effect cannot be estimated.

Is my approach correct? Do you have other ways to tackle this multiple-choice endogeneity problem?

Moreover, in the context of panel data, do I always need to cluster on id in order to get correct standard errors?

My dataset is large, but I have much more time variation than clusters: about 200 000 individuals and 10 million observations for the whole dataset. The period where the instrument is available reduces the dataset considerably, to 1 million observations and about 20 000 individuals.

An important remark: the panel is NOT balanced, so individuals can enter and leave the dataset during the 10-year period it covers. Some therefore have very few observations, and some have hundreds of rows.

Many thanks in advance for your suggestions,

Best,

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
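To illustrate the point in option 1 that a dummy endogenous regressor is not in itself a problem for linear IV, here is a minimal simulation sketch in plain Python. Everything in it is an assumption made for illustration, not taken from the data described above: a single binary instrument w, an unobserved confounder u, a true effect of 2.0, and a purely cross-sectional sample. It ignores the panel structure and says nothing about clustered standard errors; it only compares the OLS and just-identified IV point estimates.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def simulate(n=20000, true_effect=2.0, seed=12345):
    """Return (w, z, y) rows where the binary regressor z is endogenous."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0      # hypothetical binary instrument
        u = rng.gauss(0.0, 1.0)                 # unobserved confounder
        z = 1 if 1.5 * w - 0.75 + u > 0 else 0  # choice depends on w and on u
        y = 1.0 + true_effect * z + u + rng.gauss(0.0, 1.0)
        rows.append((w, z, y))
    return rows


def ols_slope(rows):
    """OLS slope of y on a dummy z: difference in mean y between z=1 and z=0."""
    y1 = [y for _, z, y in rows if z == 1]
    y0 = [y for _, z, y in rows if z == 0]
    return mean(y1) - mean(y0)


def iv_slope(rows):
    """Just-identified IV estimator: cov(w, y) / cov(w, z)."""
    w_bar = mean([w for w, _, _ in rows])
    z_bar = mean([z for _, z, _ in rows])
    y_bar = mean([y for _, _, y in rows])
    cov_wy = sum((w - w_bar) * (y - y_bar) for w, _, y in rows)
    cov_wz = sum((w - w_bar) * (z - z_bar) for w, z, _ in rows)
    return cov_wy / cov_wz


rows = simulate()
ols_estimate = ols_slope(rows)
iv_estimate = iv_slope(rows)
print(f"OLS: {ols_estimate:.3f}  IV: {iv_estimate:.3f}  true effect: 2.000")
```

Under these assumptions the OLS estimate is pushed away from the true effect because u drives both z and y, while the IV estimate should land close to it even though z is binary. Whether that carries over to the real data depends on the validity and strength of the actual instrument.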

**Follow-Ups**: **RE: st: Endogeneity and Panel Data : treatreg, ivregress or .. ? Any suggestion would be really appreciated !** *From:* Cameron McIntosh <cnm100@hotmail.com>
