Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Count models and fractional variables
Date: Mon, 19 Mar 2012 10:28:55 -0400

J.M.C. Santos Silva <jmcss@essex.ac.uk>:

While others differ, Stata convention is that truncation of y means the values are not observed if y<a (as in -truncreg-), but censoring means that y is measured as a if y<a (as in -tobit-). In the first case, we do not have X either, so we cannot estimate a tobit using censored obs.

I agree with your advice not to use the tobit or a zero-inflated model, but only on the grounds that they (typically) require untenable distributional assumptions for consistent estimates, whereas a -glm- needs only the functional form for the conditional mean.

However, it is possible that there is a form of censoring in such data. It is true that we would expect the conditional mean of a count outcome such as patents (or an arrival rate of patents) to be nonzero given any covariates X, even if infinitesimally small, but it need not be; it can be exactly zero if, for example, some group of inventors is ineligible to file for patents. If categorical ineligibility derives from a binary (latent) characteristic, perhaps as in a logit or probit model, then a zero-inflation process is at work; if the ineligibility derives from a cutoff score on some continuous (latent) characteristic, then perhaps a censoring model is in order (but probably still not a -tobit-).

Just to reiterate: I agree that -glm- or an equivalent count model is preferable in general for the application described, or perhaps a discrete-time hazard model, but for some applications a zero-inflated model may be preferable; without knowing much more about the data and the data generating process (i.e. the institutional setting), it is hard to know for sure.

Fabiana and Stefano H. Baruffaldi <stefano.baruffaldi@epfl.ch>:

Are there many inventors who never get a patent? Do you believe there are inventors with a zero conditional mean in patents per year? If so, is there a known reason for that property? Do you measure the reasons for the zero/nonzero conditional mean? If you can estimate a logit that perfectly predicts zero/nonzero total patents, perhaps you should just use -glm- on the group with nonzero total patents.

On Sat, Mar 17, 2012 at 12:02 PM, Santos Silva, J.M.C. <jmcss@essex.ac.uk> wrote:
> Dear Fabiana,
>
> Sorry for not seeing your post earlier. Let me see if I can clarify this.
>
> First, your friend should not use the Tobit as it is meant for
> truncated data and there is no truncation in this dataset.
>
> Second, the ZI models can be estimated even if the dependent
> variable is continuous. So, there is no need to round the data
> and of course you get different results if you do.
>
> The fact that you can estimate zero inflated models with
> continuous data does not mean that it is a good idea to do it!
> In particular, the results of zero inflated models are not invariant
> to the scale of the dependent variable, and that explains why
> different results are obtained if it is multiplied by 10000.
>
> The reason for this is that by rescaling the variable you change
> the amount of overdispersion (the mean is multiplied by k
> and the variance by k^2). Therefore, studying the number of
> patents per year or by quarter will give different results!
>
> The advice is now obvious: go back to modeling the counts
> using an appropriate count data model (is zero inflation really
> needed?). In general, my advice is that one should model the
> variable we care about and not some transformation of it; as this
> example illustrates, messing with the dependent variable may
> have very undesirable consequences.
>
> All the best,
>
> Joao
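The rescaling point in the quoted message (mean multiplied by k, variance by k^2) can be checked numerically. What follows is only a minimal sketch in Python, not Stata and not anything either poster ran: the simulated counts, the gamma-Poisson mixture used to generate them, the seed, and the names `poisson_draw`, `dispersion`, and `K` are all assumptions made for illustration. It shows that the variance-to-mean ratio, one simple measure of overdispersion, is multiplied by k when the outcome is multiplied by k.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def dispersion(values):
    """Variance-to-mean ratio; equals 1 in expectation for a pure Poisson."""
    return statistics.pvariance(values) / statistics.mean(values)


rng = random.Random(12345)  # arbitrary seed, chosen only for reproducibility

# Hypothetical overdispersed counts: Poisson with a gamma-distributed rate.
counts = [poisson_draw(rng, rng.gammavariate(2.0, 1.0)) for _ in range(5000)]

K = 10000  # the scale factor mentioned in the quoted message
scaled = [K * y for y in counts]

print("mean:      ", statistics.mean(counts), "->", statistics.mean(scaled))
print("variance:  ", statistics.pvariance(counts), "->", statistics.pvariance(scaled))
print("var / mean:", dispersion(counts), "->", dispersion(scaled))
```

Because the ratio itself scales with k, any model whose fit depends on the amount of overdispersion sees a different problem after rescaling, which is consistent with the advice to model the original counts.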

**References**: **RE: st: Count models and fractional variables** *From:* "Santos Silva, J.M.C." <jmcss@essex.ac.uk>
