# Re: st: What multiple regression model for extreme distributions

From: David Greenberg
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: What multiple regression model for extreme distributions
Date: Tue, 02 Feb 2010 20:09:19 -0500

```
Poisson and negative binomial regressions, along with their zero-inflated versions, are models for counts, not for levels of a continuous variable. That makes me think their use for this problem is dubious. Something on the order of a Tobit might be more appropriate.

David Greenberg, Sociology Department, New York University

----- Original Message -----
From: muhammed abdul khalid <muhammed.abdulkhalid@gmail.com>
Date: Tuesday, February 2, 2010 3:08 pm
Subject: Re: st: What multiple regression model for extreme distributions
To: statalist@hsphsun2.harvard.edu

> Hi,
> Thank you for the replies.
>
> The data are cross-sectional, and saving is simply measured from
> respondents' answers on how much saving they have (in dollars), with the
> minimum being zero. There is no negative saving. Yes, saving is my
> dependent variable.
>
> I tried logit, zip, zinb, and nbreg, but their standard errors vary greatly.
> I am still unsure which model should be used. My objective is to predict
> the contribution of education, gender, location, and ethnicity to
> the saving of the household.
>
> Thank you again for your kind responses.
>
> Muhammed
> SciencesPo Paris.
>
> 2010/2/2 Austin Nichols <austinnichols@gmail.com>:
> > You have had a number of good suggestions already, but as Nick Cox
> > points out, the distribution of the dependent variable is not all that
> > relevant to what model you choose; it is the distribution of the
> > dependent variable conditional on explanatory variables that is
> > important.  Before you estimate a two-part "hurdle" or zero-inflated
> > model, I urge you to consider that the right set of explanatory
> > variables might well capture the reason for a large number of zero
> > outcomes (e.g. using -poisson- instead of -zip- etc.).  When it comes
> > to household saving (I think that is your dependent variable, not
> > independent), you also want to consider debt.  It may be the case that
> > households you are coding as zeros actually have negative saving
> > during the period under study.  Do you have panel data, or
> > cross-sectional data?  How is saving measured?
> >
> > On Tue, Feb 2, 2010 at 10:09 AM, <muhammed.abdulkhalid@gmail.com> wrote:
> >> I have household income survey data (38,000 observations), and my
> >> problem is doing a multiple regression on saving (independent var) to
> >> ethnicity/strata/employment etc. (dependent var).
> >>
> >> The problem is this: 70% of my observations for the value of saving
> >> are zero. I recoded them to 1 and logged them, but the distribution is
> >> still extremely skewed (mean 0.78, std dev 2.4, min 0, max 14). The
> >> histogram still looks like the letter L, extremely skewed to the
> >> right with a long tail. Obviously, OLS is out, and I tried Poisson
> >> (glm nbinomial), but the distribution is still not normally distributed.
> >> The data are in order, i.e. no missing values etc. It is clean. For
> >> some reason, lobit would not run.
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> >
>
>
>
> --
> Muhammed

```
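
As a rough illustration of the Tobit suggestion in the reply above, here is a minimal sketch of how a Tobit model left-censored at zero treats the two kinds of observations: zeros contribute the probability that a latent normal variable falls at or below zero, and positive values contribute an ordinary normal density. It is written in plain Python only to be self-contained; the function names (`tobit_loglik`, `tobit_expected_value`) and the arguments `xb` (linear predictor) and `sigma` (latent error standard deviation) are hypothetical and not taken from the thread. In Stata itself the analogous fit would be something along the lines of `tobit saving educ female, ll(0)`, where the variable names are assumed.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(y, xb, sigma):
    """Log-likelihood of a Tobit model left-censored at zero.

    y     : observed outcomes (assumed >= 0, e.g. saving in dollars)
    xb    : linear predictor for each observation (hypothetical values)
    sigma : standard deviation of the latent normal error
    """
    total = 0.0
    for yi, mi in zip(y, xb):
        if yi <= 0.0:
            # Censored observation: P(latent <= 0) = Phi(-xb / sigma)
            total += math.log(norm_cdf(-mi / sigma))
        else:
            # Uncensored observation: normal density of y around xb
            total += math.log(norm_pdf((yi - mi) / sigma) / sigma)
    return total


def tobit_expected_value(xb, sigma):
    """E[y | x] for the censored outcome: Phi(z) * xb + sigma * phi(z), z = xb / sigma."""
    z = xb / sigma
    return norm_cdf(z) * xb + sigma * norm_pdf(z)
```

A likelihood of this form would be maximized over the coefficients and `sigma`. Whether treating the zeros as censored values of a latent normal variable is reasonable for saving data with 70% zeros is a modeling assumption to check, not something this sketch establishes.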