Statalist



Re: st: Using 2 stage Heckmen Sample Selection with Lags in STATA


From   "Seema Bhatia" <ler02sb@reading.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Using 2 stage Heckmen Sample Selection with Lags in STATA
Date   Thu, 2 Aug 2007 16:17:32 +0100

Dear Austin

To answer your questions on non-random data - I did ANOVA for the data and
that revealed that this pattern is non-random. I know that endogeneity is a
problem; however, for me missing values are a greater problem and therefore
endogeneity has taken a back seat - there is enough complication within my
dataset that I can only deal with a few problems at a time. Therefore,
although I agree with you on the use of IV, it is the Heckman sample
selection that is the priority. To be honest, how many purely exogenous
instruments are out there anyway? There is a lot of debate in the trade
literature about these models and the current methodology seems to be best
suited for the analysis.

It has been suggested to me that I use lags in order to take care of the
endogeneity issue, but right now I will leave that for a while too while I
work out other issues.

As far as the Hausman test goes, I used the same model that I intend to use
in my analysis (bilateral trade as a function of GDPs, populations,
distance, contiguity, membership of agreements, infrastructure indices etc.)
and fitted it with fixed effects, then random effects. Therefore:

xtreg dependentvar independentvar1 independentvar2 independentvar3 ... , fe
then
xtreg dependentvar independentvar1 independentvar2 independentvar3 ... , re

Having stored both of these, I went on to do the Hausman test and got an
insignificant p-value, therefore justifying the use of a random effects
model.
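
A minimal sketch of the full sequence described above, including the storing
and testing steps. All variable names (pairid, year, lntrade, lngdp_i,
lngdp_j, lndist, contig, rta) are hypothetical placeholders standing in for
the actual gravity-model variables, which are not given in this message. An
insignificant p-value from -hausman- means the null hypothesis that the
difference in coefficients is not systematic is not rejected, which is the
usual argument for keeping the random effects estimates.

```stata
* Sketch only: all variable names below are hypothetical placeholders
* Declare the panel: country pair as the unit, year as the time variable
xtset pairid year

* Fixed effects; time-invariant regressors (lndist, contig) are dropped
xtreg lntrade lngdp_i lngdp_j lndist contig rta, fe
estimates store fe

* Random effects with the same specification
xtreg lntrade lngdp_i lngdp_j lndist contig rta, re
estimates store re

* Hausman test: the always-consistent estimator (fe) is listed first
hausman fe re
```

Because the fixed effects fit drops time-invariant regressors, -hausman-
compares only the coefficients that the two sets of estimates have in common.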

I am not familiar with -ssc inst mim- or a lot of the statistical jargon,
to be honest, so I wouldn't have a clue!

thanks

seema


> Seema--
> I am not familiar with the "standard gravity model" though I suspect
> it models trade as a function of the reciprocal of the square of
> distance along an ellipsoid between the centroids of two countries,
> among other things.  This seems inappropriate for various reasons
> (even if using production- or population-weighted centroids, the
> relevant distance is not as the crow flies--e.g. contiguity is likely
> more important, as are topographical features, and historical
> relationships/religion/language/etc. even more important) though if
> the model is "standard" I suppose I am unlikely to talk you out of it.
>
> For the missing data, have you considered multiple imputation (-ssc
> inst mim- etc.)?  Is the pattern of missing data nonrandom according
> to some functional relationship you know something about, or do you
> characterize as nonrandom because some countries have more missings
> than others? Note that -mim- supports -xtpoisson- among other
> commands.
>
> In what sense does a Hausman test require you to use a random effects
> model? What were your commands and output for the Hausman test? Note
> also you may want a Hausman test robust to serial correlation:
> http://www.stata.com/statalist/archive/2004-08/msg00548.html
> though if you are focusing on sub-Saharan Africa you may not have
> enough countries to claim a cluster-robust estimator is justified
> (asymptotic in number of clusters; 50 clusters is enough for most
> purposes).
>
> It seems to me that the endogeneity issue is bigger than the missing
> data or specification issues.  Countries that trade more will produce
> more output, etc. This is an even bigger problem in a dynamic
> setting--how do empirical studies on trade deal with this? Perhaps you
> need -xtivreg2- (ssc inst xtivreg2) and some valid instruments?
>
> On 8/2/07, Seema Bhatia <ler02sb@reading.ac.uk> wrote:
>> Hi Austin
>>
>> Thanks for your input.
>>
>> In my case, bilateral trade between a country pair (Xij) is measured as
>> import value (CIF import values in US$ deflated by the appropriate price
>> deflator), i.e. exports from country j are imports into country i, as is
>> done in a standard gravity model. This bilateral trade in sub-Saharan
>> Africa is being modelled as a function of GDPs, populations, distance,
>> landlockedness, contiguity, some calculated infrastructure indices and
>> regional trade groups etc. in order to study the impact of trade
>> agreements within the region.
>>
>> In terms of the missing data, I am lucky to have data for a lot of the
>> country pairs, since these are all in sub-Saharan Africa where data is a
>> huge issue, particularly for those that are war-torn and in the early
>> years of my analysis (1985-1994). I have scouted all possible data
>> sources and compiled the import values where missing.
>>
>> There are two issues with the missing data. There are both zeroes and no
>> data available for several country pairs over the years. Since my major
>> data source has made that distinction specifically, I have decided that
>> zero will be an answer (meaning there was zero trade), while I still need
>> to account for the missing/not reported values (meaning we don't know if
>> there was trade or not). The pattern for these is non-random, which is
>> why I am keen on using a sample selection bias correction method to
>> account for it.
>>
>> A Hausman test has also revealed that I am required to fit a random
>> effects model for my analysis. So I was hoping that I could do this
>> within the Panel-Heckman setting. I am exploring gllamm at the moment but
>> am still quite unclear on how to go about it, as econometrics is not
>> really my forte.
>>
>> Please let me know if I have given you enough details here; as a typical
>> PhD wannabe, I usually tend to think everyone knows what I am talking
>> about.
>>
>> Many thanks
>>
>> Seema
>>
>>> Seema--
>>> I think I already answered the question about the -heckman- approach
>>> likely being hard in your panel data, or perhaps impossible without a
>>> significant investment in learning -gllamm- on your part, or perhaps
>>> writing new routines.  But I still don't see necessary detail in your
>>> description of the data--how are you measuring trade?  Volume of
>>> exports+imports?  Net exports?  Indicator for any trade at all? Why is
>>> there missing data as opposed to just zeros for your LHS var?  If
>>> you've got trade measured as a strictly positive variable and missings
>>> where no trade exists, then you can replace trade=0 if mi(trade) and
>>> run xtpoisson with country pair fixed effects, right?
>>>

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP