Statalist



Re: st: What multiple regression model for extreme distributions


From   David Greenberg <dg4@nyu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: What multiple regression model for extreme distributions
Date   Tue, 02 Feb 2010 20:09:19 -0500

Poisson and negative binomial regressions, along with their zero-inflated versions, are models for counts, not for levels of a continuous variable. That makes me think their use for this problem is dubious. Something on the order of a Tobit might be more appropriate.

David Greenberg, Sociology Department, New York University
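
For readers following the thread, here is a minimal sketch of how a Tobit likelihood censored at zero treats the zeros differently from the positive values. It is not part of the original message. It is written in Python rather than Stata, uses a single regressor and made-up numbers, and the names `tobit_loglik`, `saving`, and `educ` are hypothetical. In Stata the corresponding estimator is -tobit- with the ll(0) option.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF, written with erfc for better accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_loglik(y, x, beta0, beta1, sigma):
    """Log-likelihood of a Tobit model left-censored at zero, one regressor.

    Latent model: y* = beta0 + beta1 * x + e, with e ~ N(0, sigma^2).
    Observed:     y  = max(0, y*).
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = beta0 + beta1 * xi
        if yi <= 0.0:
            # Censored observation: contributes P(y* <= 0) = Phi(-mu / sigma).
            total += math.log(norm_cdf(-mu / sigma))
        else:
            # Uncensored observation: contributes the normal density of y.
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total


# Made-up illustration data (NOT from the survey discussed in this thread).
saving = [0.0, 0.0, 0.0, 120.0, 0.0, 300.0, 0.0, 50.0]
educ = [6, 8, 9, 12, 10, 16, 7, 11]

# Evaluate the log-likelihood at one arbitrary parameter guess.
print(tobit_loglik(saving, educ, beta0=-400.0, beta1=40.0, sigma=100.0))
```

An estimator would maximize this function over beta0, beta1, and sigma. The point of the sketch is only that the zeros enter through a probability while the positive amounts enter through a density, which is what distinguishes this approach from a count model.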

----- Original Message -----
From: muhammed abdul khalid <muhammed.abdulkhalid@gmail.com>
Date: Tuesday, February 2, 2010 3:08 pm
Subject: Re: st: What multiple regression model for extreme distributions
To: statalist@hsphsun2.harvard.edu


> Hi,
> Thank you for the replies.
> 
> The data are cross-sectional, and saving is simply measured based on
> respondents' answers on how much saving they have (in dollars), with the
> minimum being zero. There is no negative saving. Yes, saving is my
> dependent variable.
> 
> I tried logit, zip, zinb, and nbreg, but their standard errors vary greatly.
> I am still unsure which model should be used. My objective is to predict
> the contribution of education, gender, location, and ethnicity to
> the saving of the household.
> 
> Thank you again for kind response.
> 
> Muhammed
> SciencesPo Paris.
> 
> 2010/2/2 Austin Nichols <austinnichols@gmail.com>:
> > You have had a number of good suggestions already, but as Nick Cox
> > points out, the distribution of the dependent variable is not all that
> > relevant to what model you choose; it is the distribution of the
> > dependent variable conditional on explanatory variables that is
> > important.  Before you estimate a two-part "hurdle" or zero-inflated
> > model, I urge you to consider that the right set of explanatory
> > variables might well capture the reason for a large number of zero
> > outcomes (e.g. using -poisson- instead of -zip- etc.).  When it comes
> > to household saving (I think that is your dependent variable, not
> > independent), you also want to consider debt.  It may be the case that
> > households you are coding as zeros actually have negative saving
> > during the period under study.  Do you have panel data, or
> > cross-sectional data?  How is saving measured?
> >
> > On Tue, Feb 2, 2010 at 10:09 AM, <muhammed.abdulkhalid@gmail.com> wrote:
> >> I have household income survey data (38,000 observations), and my
> >> problem is doing a multiple regression on saving (independent var) to
> >> ethnicity/strata/employment
> >> etc. (dependent var).
> >>
> >> The problem is this: 70% of my observations for the value of saving are
> >> zero. I recoded them to 1 and took logs, but the distribution is still
> >> extremely skewed (mean 0.78, std dev 2.4, min 0, max 14). The
> >> histogram still looks like the letter L, extremely skewed to the
> >> right with a long tail. Obviously, OLS is out, and I tried Poisson
> >> (glm nbinomial), but the distribution is still not distributed normally.
> >> The data are in order, i.e. no missing values etc. It is clean. For
> >> some reason, lobit would not run.
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> >
> 
> 
> 
> -- 
> Muhammed
> 



