Re: st: Count models and fractional variables

From   Austin Nichols
Subject   Re: st: Count models and fractional variables
Date   Mon, 19 Mar 2012 10:28:55 -0400

J.M.C. Santos Silva:

While others differ, the Stata convention is that truncation of y means
the values are not observed at all if y<a (as in -truncreg-), whereas
censoring means that y is recorded as a if y<a (as in -tobit-). In the
first case, we do not have X either, so there are no censored
observations with which to estimate a tobit.
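
To make the distinction concrete, here is a minimal sketch in Python
(standard library only). The linear model, the limit a = 0, and all
function names are made up for illustration; nothing here comes from
the patent data under discussion.

```python
import random

def latent_sample(n, seed=0):
    """Draw (x, y) pairs from a hypothetical model y = 1 + 2x + noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 1 + 2 * x + rng.gauss(0, 1)
        pairs.append((x, y))
    return pairs

def truncate(pairs, a):
    """Truncation at a: rows with y < a are dropped entirely, so x is lost too."""
    return [(x, y) for x, y in pairs if y >= a]

def censor(pairs, a):
    """Censoring at a: every row is kept, but y < a is recorded as a."""
    return [(x, max(y, a)) for x, y in pairs]

a = 0.0
data = latent_sample(1000)
truncated = truncate(data, a)
censored = censor(data, a)

print(len(data), len(truncated), len(censored))
print(sum(1 for _, y in censored if y == a), "rows recorded at the limit")
```

The censored sample keeps every x and piles the low outcomes up at a;
the truncated sample simply has fewer rows.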

I agree with your advice not to use the tobit or a zero inflated
model, but only on the grounds that they require (typically) untenable
distributional assumptions for consistent estimates, whereas a -glm-
needs only the functional form for the conditional mean.
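
As a rough illustration of why only the conditional mean matters, the
sketch below fits E[y|x] = exp(b0 + b1*x) by solving the Poisson score
equations, even though y is continuous and clearly not Poisson. This
assumes a log link and Poisson family, which is one common way to set
up such a -glm-; the data, the coefficients 0.5 and 0.3, and the
function name poisson_pml are invented for the example.

```python
import math
import random

def poisson_pml(xs, ys, iterations=25):
    """Newton-Raphson on the Poisson score for E[y|x] = exp(b0 + b1*x)."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu            # score for the intercept
            g1 += (y - mu) * x      # score for the slope
            h00 += mu               # entries of the 2x2 information matrix
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * step = g for the 2x2 matrix H.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(2000)]
# Continuous, non-integer outcome: correct mean, multiplicative uniform noise.
ys = [math.exp(0.5 + 0.3 * x) * rng.uniform(0.5, 1.5) for x in xs]

b0_hat, b1_hat = poisson_pml(xs, ys)
print(round(b0_hat, 2), round(b1_hat, 2))
```

The estimates should land close to the assumed 0.5 and 0.3 because the
mean is correctly specified; valid standard errors would still need a
robust variance estimator, which this sketch does not compute.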

However, it is possible that there is a form of censoring in such
data.  It is true that we would expect the conditional mean of a count
outcome such as patents (or an arrival rate of patents) to be positive
given any covariates X, even if arbitrarily small, but it need not be;
it can be exactly zero if, for example, some group of inventors is
ineligible to file for patents.

If categorical ineligibility derives from a binary (latent)
characteristic, perhaps as in a logit or probit model, then a zero
inflation process is at work; if the ineligibility derives from a
cutoff score on some continuous (latent) characteristic, then perhaps
a censoring model is in order (but probably still not a -tobit-).
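
A small worked calculation shows what an ineligible group does to the
share of zeros. The sketch assumes a fraction pi of inventors can never
patent and the rest follow a Poisson with rate lam; both numbers and
the function names are hypothetical.

```python
import math

def zero_share_mixture(pi, lam):
    """P(y = 0) when a share pi is ineligible and the rest are Poisson(lam)."""
    return pi + (1 - pi) * math.exp(-lam)

def zero_share_poisson(mean):
    """P(y = 0) under a plain Poisson with the given mean."""
    return math.exp(-mean)

pi, lam = 0.3, 2.0
overall_mean = (1 - pi) * lam   # mean of the mixture

mixture_zeros = zero_share_mixture(pi, lam)
poisson_zeros = zero_share_poisson(overall_mean)
print(round(mixture_zeros, 3), round(poisson_zeros, 3))
```

With the same overall mean, the mixture produces more zeros than a
plain Poisson would, which is the pattern a zero-inflated model is
meant to capture.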

Just to reiterate: I agree -glm- or an equivalent count model is
preferable in general for the application described, or perhaps a
discrete-time hazard model, but for some applications a zero-inflated
model may be preferable; without knowing much more about the data and
the data generating process (i.e. the institutional setting) it is hard
to know for sure.

Fabiana and Stefano H. Baruffaldi:

Are there many inventors who never get a patent?

Do you believe there are inventors with a zero conditional mean in
patents per year?

If so, is there a known reason for that property?

Do you measure the reasons for the zero/nonzero conditional mean?

If you can estimate a logit that perfectly predicts zero/nonzero total
patents, perhaps you should just use -glm- on the group with nonzero
total patents.

On Sat, Mar 17, 2012 at 12:02 PM, Santos Silva, J.M.C. wrote:
> Dear Fabiana,
> Sorry for not seeing your post earlier. Let me see if I can clarify this.
> First, your friend should not use the Tobit as it is meant for
> truncated data and there is no truncation in this dataset.
> Second, the ZI models can be estimated even if the dependent
> variable is continuous. So, there is no need to round the data
> and of course you get different results if you do.
> The fact that you can estimate zero inflated models with
> continuous data does not mean that it is a good idea to do it!
> In particular, the results of zero inflated models are not invariant
> to the scale of the dependent variable, and that explains why
> different results are obtained if it is multiplied by 10000.
> The reason for this is that by rescaling the variable you change
> the amount of overdispersion (the mean is multiplied by k
> and the variance by k^2). Therefore, studying the number of
> patents per year or by quarter will give different results!
> The advice is now obvious: go back to modeling the counts
> using an appropriate count data model (is zero inflation really
> needed?). In general, my advice is that one should model the
> variable we care about and not some transformation of it; as this
> example illustrates, messing with the dependent variable may
> have very undesirable consequences.
> All the best,
> Joao
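
The rescaling point in the quoted message (mean multiplied by k,
variance by k^2) can be checked with a few lines of arithmetic. The
counts below are made up for illustration; k = 10000 is the factor
mentioned in the thread.

```python
import statistics

counts = [0, 0, 1, 0, 3, 2, 0, 5, 1, 0]  # invented patent counts
k = 10000

scaled = [k * c for c in counts]

mean, var = statistics.mean(counts), statistics.pvariance(counts)
mean_k, var_k = statistics.mean(scaled), statistics.pvariance(scaled)

print(mean_k / mean)               # ratio of means
print(var_k / var)                 # ratio of variances
print(var / mean, var_k / mean_k)  # variance-to-mean ratio, before and after
```

The variance-to-mean ratio itself is multiplied by k, so the apparent
overdispersion depends on the units chosen for the dependent variable.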