Statalist



Re: RE: st: What multiple regression model for extreme distributions


From   Robert A Yaffee <bob.yaffee@nyu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: RE: st: What multiple regression model for extreme distributions
Date   Tue, 02 Feb 2010 13:14:38 -0500

Muhammed,
You might want to try a Weibull, Frechet, or Gumbel regression.
If you could formulate a regression with the extreme value
distribution and adjust the location, scale, and shape parameters
to fit your model, that might be a good option. Alternatively, you
could run a Bayesian model with one of those as a conjugate prior
if you don't want to use a noninformative prior.
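
As a rough illustration of what "a regression with the extreme value
distribution" could look like, here is a minimal Python sketch of the
negative log-likelihood for a Gumbel (maximum) regression. The function
name gumbel_nll, the single covariate, and the constant scale are
assumptions made for illustration only; this is not a Stata command and
not something taken from the thread. Note that the Gumbel has location
and scale only; a shape parameter would come in with the generalized
extreme value family.

```python
import math


def gumbel_nll(b0, b1, log_sigma, xs, ys):
    """Negative log-likelihood of a Gumbel (maximum) regression.

    Location: mu_i = b0 + b1 * x_i (hypothetical linear predictor).
    Scale: sigma = exp(log_sigma), which keeps sigma positive.
    """
    sigma = math.exp(log_sigma)
    total = 0.0
    for x, y in zip(xs, ys):
        z = (y - (b0 + b1 * x)) / sigma
        # Gumbel log-density: -log(sigma) - z - exp(-z)
        total -= -log_sigma - z - math.exp(-z)
    return total
```

Passing this function to a general-purpose numerical optimizer and
minimizing over (b0, b1, log_sigma) would give maximum likelihood
estimates under these assumptions.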
  Cheers,
     Bob


Robert A. Yaffee, Ph.D.
Research Professor
Silver School of Social Work
New York University

Biosketch: http://homepages.nyu.edu/~ray1/Biosketch2009.pdf

CV:  http://homepages.nyu.edu/~ray1/vita.pdf

----- Original Message -----
From: Maarten buis <maartenbuis@yahoo.co.uk>
Date: Tuesday, February 2, 2010 10:31 am
Subject: RE: st: What multiple regression model for extreme distributions
To: stata list <statalist@hsphsun2.harvard.edu>


> > I have household income survey data (38,000 observations), and my
> > problem is doing a multiple regression of saving (dependent var) on
> > ethnicity/strata/employment etc. (independent vars).
> > 
> > The problem is this: 70% of my observations for the value of saving
> > are zero. I recoded the zeros to 1 and took logs, but the
> > distribution is still extremely skewed (mean 0.78, std dev 2.4,
> > min 0, max 14). The histogram still looks like the letter L,
> > extremely skewed to the right with a long tail. Obviously, OLS is
> > out, and I tried Poisson (glm nbinomial) but the distribution is
> > still not normally distributed. The data are in order, i.e. no
> > missing values etc. It is clean. For some reason, lobit would not
> > run.
> 
> One option is to fit a -zip- to the original saving variable and
> use the -robust- option. That way you are modeling the mean saving as
> a two-step process: first a person decides whether or not to save,
> and after that the people who do save decide on the amount. The
> influence of the explanatory variables on the explained variable
> occurs through the log link function, so you avoid the problem that
> you are no longer modeling the mean savings when you first take the
> log of savings and then model that transformed variable. Notice that
> by using the -robust- option you are only making the assumption that
> your model for the mean is correct, and that you are not making
> additional distributional assumptions.
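
To make the quoted point about logging first more concrete, here is a
small Python sketch. The savings values are made up for illustration
(mostly zeros recoded to 1, plus a few large amounts) and are not from
the survey discussed in the thread. It shows that back-transforming the
mean of the logs gives the geometric mean, not the mean of savings,
which is the quantity a log link models.

```python
import math

# Hypothetical savings amounts: zeros recoded to 1, plus a long right tail.
savings = [1, 1, 1, 1, 1, 1, 1, 200, 5000, 120000]

mean_savings = sum(savings) / len(savings)
mean_log = sum(math.log(s) for s in savings) / len(savings)

# exp(mean of logs) is the geometric mean, which is never larger than
# the arithmetic mean (Jensen's inequality).
geometric_mean = math.exp(mean_log)

print(f"arithmetic mean: {mean_savings:.1f}")
print(f"geometric mean:  {geometric_mean:.1f}")
```

The gap between the two numbers grows with the skew of the data, which
is why a model fitted to log(saving) does not describe mean saving.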
> 
> Hope this helps,
> Maarten
> 
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
> 
> http://www.maartenbuis.nl
> --------------------------
> 
> 
>       
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


