



Re: st: How to model a positive continuous dependent variable with many zeros?


From   Steven Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How to model a positive continuous dependent variable with many zeros?
Date   Thu, 2 Jun 2011 12:47:39 -0400

Ah, you are asking about the combination. The expected duration Y for a person with covariates X is:

 E(Y|X) = P(Y>0|X)*E(Y|Y>0,X) 

where P(Y>0|X) comes from the logistic (or other) binary model and E(Y|Y>0,X) comes from the survival model. However, you have multiple episodes per person, so a single two-part model will not suffice. As you are really interested in the proportion of total time spent in seclusion, consider analyzing that proportion directly. See Kit Baum's Stata Journal tip at http://www.scribd.com/doc/55505304/61/Stata-tip-63-Modeling-proportions.
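
To make the combination formula concrete, here is a minimal sketch in Python on simulated data. Everything in it is an assumption made for illustration: the binary covariate x, the gamma-distributed durations, and the helper two_part_mean are hypothetical and do not come from the seclusion data. With a single binary covariate, a saturated binary model reproduces the group proportions, so each part is estimated here by a simple group summary rather than by a fitted logistic or survival model.

```python
import numpy as np

# Simulated data: all values below are illustrative assumptions.
rng = np.random.default_rng(0)
n = 2000
x = rng.integers(0, 2, size=n)                 # hypothetical binary covariate
p_true = np.where(x == 1, 0.6, 0.3)            # assumed P(Y > 0 | X)
any_positive = rng.random(n) < p_true
scale = np.where(x == 1, 5.0, 3.0)             # assumed scale of positive durations
duration = np.where(any_positive, rng.gamma(shape=2.0, scale=scale), 0.0)


def two_part_mean(y, mask):
    """Return (P(Y>0), E(Y | Y>0), product) for the observations in mask."""
    ys = y[mask]
    p_pos = np.mean(ys > 0)                    # part 1: share of non-zero outcomes
    mean_pos = ys[ys > 0].mean()               # part 2: mean of the non-zero outcomes
    return p_pos, mean_pos, p_pos * mean_pos


results = {g: two_part_mean(duration, x == g) for g in (0, 1)}

for g, (p_pos, mean_pos, expected) in results.items():
    overall = duration[x == g].mean()
    print(f"x={g}: P(Y>0)={p_pos:.3f}  E(Y|Y>0)={mean_pos:.3f}  "
          f"product={expected:.3f}  overall mean={overall:.3f}")
```

In this saturated case the product of the two parts equals the overall group mean exactly. With continuous covariates or real fitted models, the two parts would instead be predictions from the binary model and from the model for the positive durations, multiplied per person.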


Steve
sjsamuels@gmail.com



On Wed, Jun 1, 2011 at 2:38 AM, Steve Samuels wrote:

These are known as "two-part" or "hurdle" models, and a Google search will find hundreds of references.

On Wed, Jun 1, 2011 at 2:38 AM, Adriaan Hoogendoorn <aw.hoogendoorn@gmail.com> wrote:


Adriaan wrote:


Thank you, Hithesh (and Maarten in a previous post), for your help.
Your help is highly appreciated.

The situation Maarten described appears to be exactly the case:
Clinic staff members try reducing total seclusion durations (at the
clinic level) by ending seclusions as soon as possible at the risk
of introducing more seclusion episodes. Total seclusion duration
(rated against the total time spent in the clinic) seems the
appropriate quantity to evaluate seclusion policies. We find that
total seclusion durations differ substantially across clinics. The
explanation clinics give for having higher total seclusion durations
than other clinics is that they claim to have “harder” patients, as
Maarten suggested.

Explaining these differences from patient characteristics (and some
clinic characteristics) is exactly what this study is about.

Your suggestion of combining the modeled zeros (from a logistic
regression, or from the Poisson as Maarten suggested) with a model for
non-zero duration (from GLM or Survival Analysis) seems very attractive.
However, I have no experience with how to do this. Do you mean: after
modeling the zeros, model the non-zeros by deleting the zeros from
the data set and using the same predictors?

This would provide me with two sets of parameters. Do you think I can
use these two sets of model parameters to estimate the total seclusion
duration for a given ward with a given set of patients?

I’ve never seen such a combined model in the scientific literature, which
may well be my oversight. Do you have any references on how such a
combination has been applied and discussed?

