
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Sequential Probit


From   <[email protected]>
To   <[email protected]>
Subject   st: Sequential Probit
Date   Fri, 4 Mar 2011 10:38:24 -0000

------------------------------

Date: Thu, 3 Mar 2011 09:21:31 +0000 (GMT)
From: Maarten buis <[email protected]>
Subject: Re: st: Sequential Probit

--- On Thu, 3/3/11, Elin Vimefall wrote:
> I would like to use a sequential probit with three steps to
> analyze schooling of children:
> 
> Step 1: Does the child have any formal education?
> Step 2: Has the child finished primary education?
> Step 3: Is the child currently in secondary education (or
> above)?
> 
> Does anyone know how to do this in Stata?

Say you have an education variable, ed, where ed==1 when the child
has no formal education, ed==2 when the child has finished primary
and stopped, and ed==3 when the child is currently in secondary
education. Then you create two new variables:

gen byte ed12 = ed >= 2 if !missing(ed)
gen byte ed23 = ed == 3 if !missing(ed) & ed12 == 1

So ed12 is 1 when the child passed the first transition and 0
when it failed, and ed23 is 1 when the child passed the
second transition, 0 when it failed, and missing when it was
no longer "at risk", that is, when the child failed the first
transition.
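The coding can be checked on simulated data. A minimal sketch, assuming a hypothetical data-generating process: two standard-normal regressors x1 and x2, made-up coefficients, a made-up seed, and an independent normal error at each transition. Only the two gen lines for ed12 and ed23 come from the post itself.

```stata
clear
set seed 2011
set obs 2000

* hypothetical explanatory variables
gen x1 = rnormal()
gen x2 = rnormal()

* hypothetical latent propensities; each transition has its own error
gen byte pass1 = (0.5 + 0.8*x1 - 0.4*x2 + rnormal()) > 0
gen byte pass2 = (-0.2 + 0.6*x1 + 0.3*x2 + rnormal()) > 0

* highest level reached: 1 = none, 2 = primary and stopped, 3 = secondary
gen byte ed = 1
replace ed = 2 if pass1 == 1
replace ed = 3 if pass1 == 1 & pass2 == 1

* transition indicators, exactly as defined in the post
gen byte ed12 = ed >= 2 if !missing(ed)
gen byte ed23 = ed == 3 if !missing(ed) & ed12 == 1

* children who failed transition 1 should show up with ed23 missing
tabulate ed12 ed23, missing
```

Running the two probit commands on these data should roughly recover the made-up coefficients, because the simulated errors are independent across transitions, which is what estimating the transitions separately assumes.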

Say you wanted to include the variables x1 and x2 as your
explanatory variables; then you would estimate a sequential
probit as follows:

probit ed12 x1 x2
probit ed23 x1 x2
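Why two separate probits are enough: if the errors of the two transitions are assumed independent (the assumption behind the two commands above), each outcome probability is a product of transition probabilities, and the log likelihood splits into one piece per transition. In this sketch, $\Phi$ is the standard normal CDF, $\beta_1$ and $\beta_2$ are transition-specific coefficient vectors, $d_{1i}$ is ed12 and $d_{2i}$ is ed23 for child $i$ (notation introduced here, not in the post):

```latex
\begin{aligned}
\Pr(\mathit{ed}_i = 1) &= 1 - \Phi(x_i'\beta_1) \\
\Pr(\mathit{ed}_i = 2) &= \Phi(x_i'\beta_1)\,\bigl[1 - \Phi(x_i'\beta_2)\bigr] \\
\Pr(\mathit{ed}_i = 3) &= \Phi(x_i'\beta_1)\,\Phi(x_i'\beta_2) \\[4pt]
\ln L(\beta_1, \beta_2) &= \sum_{i} \Bigl\{ d_{1i} \ln \Phi(x_i'\beta_1) + (1 - d_{1i}) \ln\bigl[1 - \Phi(x_i'\beta_1)\bigr] \Bigr\} \\
&\quad + \sum_{i:\, d_{1i} = 1} \Bigl\{ d_{2i} \ln \Phi(x_i'\beta_2) + (1 - d_{2i}) \ln\bigl[1 - \Phi(x_i'\beta_2)\bigr] \Bigr\}
\end{aligned}
```

The first sum involves only $\beta_1$ and is the log likelihood of -probit ed12 x1 x2-; the second involves only $\beta_2$ and, because ed23 is missing whenever ed12 is 0, is the log likelihood of -probit ed23 x1 x2-. Maximizing each part separately therefore maximizes the joint likelihood. Replacing $\Phi$ with the logistic CDF gives the sequential logit.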

[Warning: shameless self-promotion coming up]

Personally I prefer the sequential logit, for three reasons: I
find the results easier to interpret; there is a nice
decomposition that relates the effects of x1 and x2 on the
highest achieved outcome to the effects of these variables
during each transition (I should think it is nice, as I
developed it...); and I developed some tools to investigate
the potential influence of unobserved variables in this model.

The decomposition is discussed here:
<http://www.maartenbuis.nl/dissertation/chap_6.pdf>

The tools for investigating the potential influence of 
unobserved heterogeneity are discussed here:
<http://www.maartenbuis.nl/publications/uh.html>

Both are implemented in the -seqlogit- package, which can 
be downloaded by typing in Stata -ssc install seqlogit-
 
> Is there any difference between a sequential probit model
> and a multivariate probit with sample selection?

Yes, the latter model also estimates a structure for the error
terms, namely their correlation across transitions. The problem
is that there is obviously very little information in your data
about that structure, so model assumptions tend to matter too
much in such models for my taste, but tastes notoriously differ.
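For the two-transition coding above, Stata's official -heckprobit- fits the bivariate version of a probit with sample selection: the ed23 equation uses only children with ed12 == 1, and the errors of the two equations may correlate, with the correlation reported as rho (estimated as athrho on the inverse hyperbolic tangent scale). A sketch assuming ed12, ed23, x1 and x2 are in memory. Because the same regressors appear in both equations, there is no exclusion restriction, so rho is identified only through the joint-normality assumption, which is exactly the kind of structure the reply above is wary of.

```stata
* probit with sample selection: ed23 is observed only when ed12 == 1
* (assumes ed12, ed23, x1 and x2 exist; no exclusion restriction used)
heckprobit ed23 x1 x2, select(ed12 = x1 x2)
```

If rho is zero, the model reduces to the two separate probits. With weak identification, convergence can be slow or fail, which is itself informative.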

Hope this helps,
Maarten
============================================

Maarten gives wise advice about how to fit a sequential probit model,
e.g. in terms of how to set up the data in order to get the estimation
samples right. The same set-up could be used for the sequential logit.
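With the same set-up, a sequential logit is simply two logits on the same estimation samples. A sketch assuming ed12, ed23, x1 and x2 are already in memory: the `or` option reports odds ratios, and -margins- restates each transition's effects as average marginal effects on the probability scale, the metric recommended in point (i) below.

```stata
* sequential logit: one logit per transition
* (assumes ed12, ed23, x1 and x2 already exist)
logit ed12 x1 x2, or
margins, dydx(x1 x2)   // AMEs on Pr(ed12 == 1)

* children not at risk have ed23 missing and drop out automatically
logit ed23 x1 x2, or
margins, dydx(x1 x2)   // AMEs on Pr(ed23 == 1 | ed12 == 1)
```

Swapping -logit- for -probit- leaves the -margins- lines unchanged, which makes it easy to compare the two models on the probability scale.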

Just to confirm Maarten's remark that tastes differ:
(i) I am less confident than he is that logit estimates are easier to
interpret than probit estimates. Much of this depends on whether you are
sure that you (and your target audience) understand what odds ratios
are. In my opinion, they are more poorly understood than most
quantitative sociologists hope or assume. It's not that probit parameter
estimates are easier to understand; rather I suggest working in the
probability metric. That is, look at the implications of the estimates
using marginal effects, average marginal effects or predicted
probabilities more generally (in Stata, think -margins-).
(ii) How to treat unobserved heterogeneity is of course difficult -- it
is unobserved!  A multivariate probit model with sample selection (cf.
the Cappellari and Jenkins article in the Stata Journal (2006), 6(2), a
free download) is one way to proceed. The cost is the assumption of
joint normality (trivariate normal in the poster's case).
(iii) This way of modelling the heterogeneity is conventional, but of
course the specification is a maintained assumption (as Maarten
stresses).  On the other hand, the implicit heterogeneity model that he
assumes in his own sequential logit package is unclear to me from his
paper. The model that he implicitly tests against is also a maintained
assumption. (I think it's a single factor model -- i.e. with the latent
errors perfectly correlated and with a Normal marginal distribution. No
doubt Maarten can correct me.)
(iv) Whatever, it is now relatively straightforward to explore in Stata
what happens using either approach. That sort of robustness checking is
useful.
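One such check, sketched for the second transition: fit the independent-errors probit and the probit with sample selection, then compare coefficients side by side. This assumes ed12, ed23, x1 and x2 are in memory, and the stored-estimate names indep and selmodel are arbitrary. Both models normalize the error variance to one, so the ed23 coefficients are on a comparable scale.

```stata
* independent errors: ed23 is missing for children not at risk
quietly probit ed23 x1 x2
estimates store indep

* same transition, errors allowed to correlate with transition 1
quietly heckprobit ed23 x1 x2, select(ed12 = x1 x2)
estimates store selmodel

* coefficients and standard errors side by side
estimates table indep selmodel, b(%9.3f) se
```

Large differences between the two columns would suggest that conclusions depend on the assumed error structure.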

Stephen
-------------------------------------
Professor Stephen P. Jenkins  <[email protected]>
Department of Social Policy and STICERD
London School of Economics and Political Science
Houghton Street
London WC2A 2AE, U.K.
Tel. +44 (0)20 7955 6527
Survival Analysis using Stata:
http://www.iser.essex.ac.uk/survival-analysis
Downloadable papers and software: http://ideas.repec.org/e/pje7.html



