
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: What multiple regression model for extreme distributions
Date: Tue, 2 Feb 2010 15:23:16 -0000

This kind of problem is often raised on this list. It is not easy, but some commonly made remarks include

1. No transformation will undo a spike in the data. A spike maps to a spike, whatever you do with it.

2. The assumptions made in multiple regression do not include the response being normally distributed, as any decent text makes clear. The assumption is at most that errors are so distributed, and even then it's about the least important assumption made.

3. log(y + 1) is an ad hoc transformation that many dislike on various grounds.

4. -glm- with log link does not depend on the response being positive and circumvents #3. Neither -glm- nor its relatives purport to be a transformation procedure that leaves anything normally distributed that was not so previously.

5. Much depends on how you think of the zeros, whether as a qualitatively different group, or as in essence an extreme subset with extremely low savings. Some people like two-part models in terms of who does or does not save and then how much savers save. This is a substantive or scientific matter requiring the researcher to think, rather than to apply pre-existing formulae or programs.

Emphatically not all that could be said....

Nick
n.j.cox@durham.ac.uk

muhammed abdul khalid

I have household income survey data (38,000 observations), and my problem is doing a multiple regression on saving (independent var) to ethnicity/strata/employment etc. (dependent var). The problem is this: 70% of my observations for the value of saving are zero. I had recoded them to 1 and logged them, but the distribution is still extremely skewed (mean 0.78, std dev 2.4, min 0, max 14). The histogram still looks like the letter L, extremely skewed to the right with a long tail. Obviously, OLS is out, and I tried Poisson (glm nbinomial) but the distribution is still not distributed normally. The data are in order, i.e. no missing values etc. It is clean. For some reason, lobit would not run.
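Remark 1 in the reply (a spike maps to a spike) can be checked numerically. What follows is only a minimal sketch in Python, not anything from the thread: the simulated data, the 70% zero share taken from the question, the lognormal parameters, and the helper names zero_share and simulate_savings are all assumptions made for illustration.

```python
import math
import random


def zero_share(values):
    """Return the fraction of values that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def simulate_savings(n=10_000, p_zero=0.7, seed=12345):
    """Hypothetical savings: zero with probability p_zero, else lognormal.

    The lognormal parameters (6.0, 1.5) are arbitrary and chosen only to
    give a long right tail among the savers.
    """
    rng = random.Random(seed)
    return [
        0.0 if rng.random() < p_zero else rng.lognormvariate(6.0, 1.5)
        for _ in range(n)
    ]


savings = simulate_savings()

# log1p(y) computes log(y + 1); every zero is mapped to log(1) = 0.
logged = [math.log1p(y) for y in savings]

print("share of zeros before:", zero_share(savings))
print("share of zeros after: ", zero_share(logged))
```

The two printed shares are identical: the transformation compresses the right tail of the positive values but leaves the pile of zeros exactly where it was, which is why the histogram in the question still looks like the letter L.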
