From: Maarten Buis
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: st: Re: st: Re: st: RE: Truncated sample or Heckman selection
Date: Fri, 5 Oct 2012 09:27:10 +0200

> Thank you. It will be quite complicated for me to understand this e-mail.
> Yes, in my data there is a mass at zero and I include all of those observations. So are you saying that it is a censoring problem and that tobit regression or a fractional logit model is applicable?
> The other issue is about the Xs. The Xs that I am interested in have not been observed for non-innovator firms, but the other Xs that I use as control variables have been observed for all firms in the sample.

Because the explanatory/independent/right-hand-side/x variables are
unobserved (missing) for those observations that do not innovate, you
do not and cannot include those observations in your model. Even if
you tell Stata to include those observations, Stata will automatically
ignore them (what else could it do?).
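
A minimal sketch of that behaviour, using the auto dataset that ships
with Stata rather than your data (so the variable names here are only
stand-ins): rep78 is missing for some cars, and -regress- silently
leaves those cars out of the estimation sample.

```stata
* auto.dta ships with Stata; rep78 is missing for 5 of the 74 cars
sysuse auto, clear
count if missing(rep78)

* no if-condition is given, yet cars with missing rep78 are dropped
regress price mpg rep78

* e(N) is the number of observations actually used,
* e(sample) marks them
display "observations in data: " _N
display "observations used:    " e(N)
tabulate rep78 if e(sample), missing
```

Comparing _N with e(N) after any estimation command is a quick way to
see how many observations were lost to missing values.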

If you had valid and meaningful values on your explanatory variables
for those observations without innovation, you could look at -zoib- for
such problems (see -ssc desc zoib-). This won't satisfy everybody, as
it assumes that the equation for the 0s and the equation for the
proportions between 0 and 1 are completely separate. It also assumes
that the exact 0s all come from the process governing the 0s and none
from the process leading to proportions. Trying to relax those
assumptions would involve huge identification problems, so I fear that
in most cases it is just not practical. In your case it won't work
anyhow, as your key explanatory variables are missing.
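
For the situation where the explanatory variables are observed for
everybody, a rough sketch of a -zoib- call could look like the
following. The data are simulated and the names share, zero and x are
made up for illustration. I am assuming the syntax described in the
package's help file (a zeroinflate() option for the equation of the
0s), so check -help zoib- after installing.

```stata
* zoib is user-written; install it once from SSC
ssc install zoib, replace

* simulated data: a proportion with a mass at exactly 0
clear
set seed 12345
set obs 500
generate x = rnormal()
* one process decides whether the outcome is exactly 0 ...
generate byte zero = runiform() < invlogit(-1 + 0.5*x)
* ... and a separate process generates the proportions in (0,1)
generate share = cond(zero, 0, rbeta(2, 3))

* x enters the equation for the proportions and,
* through zeroinflate(), the separate equation for the 0s
zoib share x, zeroinflate(x)
```

The simulation mirrors the assumption in the model: whether an
observation is a 0 and what proportion it would otherwise have are
generated by two separate processes.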

At some point it is just good to remember the following quote from
John Tukey (1986, pp. 74-75): "The combination of some data and an
aching desire for an answer does not ensure that a reasonable answer
can be extracted from a given body of data."

Hope this helps,
Maarten

John Tukey (1986), "Sunset salvo". The American Statistician 40(1):72-76.

