Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Truncated at zero count data with underdispersion

From: Laurie Molina
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Truncated at zero count data with underdispersion
Date: Mon, 11 Oct 2010 15:44:35 -0500

Thank you very much, I will work on your suggestion.
I would just like to ask for some comments on the following:
What do you think about a GLM with a gamma family and a log link?
With the log link I ensure that the conditional expectation is
positive, and I know I lose the possibility of predicting point
probabilities, but with the gamma family I can allow for underdispersion
and still get consistent estimates, can't I?

Thank you again!
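
A minimal sketch of the model asked about above, using the auto data only
as a stand-in: rep78, mpg and trunk are placeholders for the real count and
covariates, and vce(robust) is an assumption added here because the gamma
variance function is at best a working assumption for a count outcome.

```stata
sysuse auto, clear
* Gamma family with log link: E[y|x] = exp(xb), so fitted means are positive
glm rep78 mpg trunk, family(gamma) link(log) vce(robust)
* Fitted conditional mean for each observation
predict mu_hat, mu
summarize rep78 mu_hat
```

This gives a fitted conditional mean only; it does not give probabilities
for individual counts, which is the trade-off mentioned above.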

On Mon, Oct 11, 2010 at 1:36 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
> "Does anyone know any Stata command that I could use to model
> zero-truncated count data with underdispersion?"
>
> There are two possibilities:
> or
> 2) The Poisson distribution doesn't fit your data-my best guess.
>
> If the Poisson model doesn't fit, use -mlogit- or -ologit-,  with
> categories being the numbers of cell phones.  You might have to
> combine sparse categories.  Since your goal is prediction in an
> external data set,  split the study data set into two parts; develop
> the model on one part, and assess the predictive accuracy of the model
> on the second.  (There are probably also -jackknife- or -bootstrap-
> possibilities for getting cross-validated "honest" assessments of
> accuracy.)
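>
> A minimal sketch of the split-sample idea, assuming the auto data and
> -ologit- as stand-ins; the variable names u and train, the seed, and the
> 50/50 stratified split are illustrative choices, not part of the original
> suggestion.
>
> ```stata
> sysuse auto, clear
> * Pool the sparse categories 1 and 2
> recode rep78 1/2 = 2
> * Arbitrary seed, for reproducibility only
> set seed 20101011
> gen double u = runiform()
> * Stratified split so every category appears in the training half
> bysort rep78 (u): gen byte train = _n <= ceil(_N/2)
> * Fit on the training half only
> ologit rep78 mpg trunk if train == 1
> * Predicted probabilities for all observations, including the hold-out half
> predict p2 p3 p4 p5
> egen pmax = rowmax(p2 p3 p4 p5)
> gen p_class = 2 if !missing(pmax)
> forvalues i = 3/5 {
>     replace p_class = `i' if p`i' == pmax & !missing(pmax)
> }
> * Observed versus predicted category in the hold-out half
> tab rep78 p_class if train == 0
> ```
>
> With a data set this small the hold-out table is noisy; the sketch only
> shows the mechanics.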
>
> Here's an example of assessing predictive accuracy from -mlogit-. The
> predicted category is the one with the highest probability, and the
> predictive criterion is the difference between observed and predicted
> category, summarized by its MSE and root MSE.
>
> ***********CODE BEGINS*************
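> sysuse auto, clear
> * Combine the sparse categories 1 and 2
> recode rep78 1/2 = 2
> mlogit rep78 mpg trunk
> * Predicted probability of each category
> forvalues i = 2/5 {
>     predict p`i', outcome(`i')
> }
> egen pmax = rowmax(p2 p3 p4 p5)
>
> * Predicted category is the one with the highest predicted probability
> gen p_class = 2
> forvalues i = 3/5 {
>     replace p_class = `i' if pmax == p`i'
> }
> label var p_class "Predicted Category"
>
> gen diff = rep78 - p_class
> tab diff
> sum diff
> * -summarize- returns the variance as r(Var); MSE is approximately Var + mean^2
> scalar mse = r(Var) + r(mean)^2
> di mse
> di sqrt(mse)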
>
> ***********CODE ENDS**************
>
>
>
>
> On Sun, Oct 10, 2010 at 6:39 PM, Laurie Molina <molinalaurie@gmail.com> wrote:
>> Hi all,
>>
>> I have a question; I hope somebody can help me.
>>
>> I am modelling count data truncated at zero: the number of cell phones
>> in households that have cell phones. The observed data range from 1 to 9,
>> with mean 1.89 and variance 1.14.
>> I have done some underdispersion tests after running a Poisson
>> regression with the truncated data, and I reject the one-sided
>> hypothesis of equidispersion with a p-value of zero (the predicted
>> values have mean 1.89 and variance .51).
>>
>> Regarding the latent variable, I also have available the number of
>> cell phones of all the households, i.e. I have the data for the latent
>> variable, which ranges from 0 to 9. Here the mean is 1.15 and the
>> variance is 1.54. I have also done an underdispersion test after
>> running a Poisson regression with all the data, and I get
>> underdispersion (the predicted values have mean 1.15 but variance
>> .999).
>>
>> I am interested in the truncated regression because I want to predict
>> the number of cell phones of the households (HH) that have cell phones.
>> That is, in addition to the data that I am using for this regression, I
>> have another list of households for which I know whether or not they
>> have a cell phone. But for the HH in that list that do have a cell
>> phone, I do not know how many they have, and that is what I am
>> interested in.
>>
>> To my understanding, if I use a Poisson regression, given that my data
>> are truncated, I will get inconsistent estimates because the conditional
>> expectation will not be correctly specified as an exponential function
>> of xbeta.
>> So I have to use the command:
>> ****
>> ztp depvar indepvars
>> ****
>> But if the latent variable is not Poisson, I will get inconsistent estimates.
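>>
>> For reference, a sketch of the zero-truncated Poisson moments behind the
>> argument above, assuming the untruncated count is Poisson with mean
>> \lambda = \exp(x\beta); the symbols \lambda and k are notation
>> introduced here.
>>
>> ```latex
>> \begin{align*}
>> \Pr(y = k \mid y > 0, x) &= \frac{e^{-\lambda}\lambda^{k}}{k!\,\bigl(1 - e^{-\lambda}\bigr)}, \qquad k = 1, 2, \dots \\
>> E[y \mid y > 0, x] &= \frac{\lambda}{1 - e^{-\lambda}} \\
>> \operatorname{Var}(y \mid y > 0, x) &= E[y \mid y > 0, x]\,\bigl(1 + \lambda - E[y \mid y > 0, x]\bigr)
>> \end{align*}
>> ```
>>
>> Because the truncated mean exceeds \lambda, it is not \exp(x\beta), which
>> is the misspecification described above. The same inequality makes the
>> truncated variance smaller than the truncated mean. As a rough
>> illustration that ignores covariates, a truncated mean of 1.89
>> corresponds to \lambda of about 1.44 and a variance of about 1.05.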
>>
>> I know Stata also has available the zero-truncated negative binomial
>> regression, but since I get underdispersion in the latent variable I
>> think the data are not negative binomial distributed, so I will still
>> get inconsistent estimates.
>>
>> Does anyone know any Stata command that I could use to model
>> zero-truncated count data with underdispersion?
>>
>> Thank you all very much in advance.
>>
>> Regards,
>>
>> Laurie.
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>
