Statalist



RE: st: What multiple regression model for extreme distributions


From   DE SOUZA Eric <eric.de_souza@coleurope.eu>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: What multiple regression model for extreme distributions
Date   Tue, 2 Feb 2010 21:22:37 +0100

Yours is a case of what is known as censored data, even though the name is inappropriate. Wooldridge calls these "corner solution models" (see either his introductory or his advanced textbook).

Have you tried -tobit-, which is the appropriate estimation method in this case? 
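
A minimal sketch of such a fit follows. The variable names (saving, educ, female, urban, ethnic) are hypothetical and do not come from the original post; the lower limit of zero reflects the poster's statement that saving has a minimum of zero.

```stata
* Hypothetical variable names; substitute those in your own dataset.
* Tobit with a lower limit (corner) at zero saving
tobit saving i.educ i.female i.urban i.ethnic, ll(0)

* The tobit coefficients refer to the latent variable.
* Average partial effects on the observed outcome, E(max(0, y*) | x):
margins, dydx(*) predict(ystar(0,.))
```

The -margins- step is included because, in a corner solution setting, the quantity of interest is usually the effect on observed saving rather than on the latent index.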


Eric de Souza
College of Europe
BE-8000 Brugge (Bruges)
Belgium

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of muhammed abdul khalid
Sent: 02 February 2010 21:08
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: What multiple regression model for extreme distributions

Hi,
Thank you for the replies.

The data are cross-sectional, and saving is measured simply from respondents' answers on how much saving they have (in dollars), with the minimum being zero. There is no negative saving. Yes, saving is my dependent variable.

I tried logit, zip, zinb and nbreg, but their standard errors vary greatly.
I am still unsure which model should be used. My objective is to predict the contribution of education, gender, location and ethnicity to the saving of the household.
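
For reference, two of the alternatives raised in this thread (-poisson- rather than -zip-, and a two-part "hurdle" model) might be sketched as follows. This is only an illustration under stated assumptions: the variable names (saving, educ, female, urban, ethnic, anysave) are hypothetical, and the gamma family with log link in the second part is one common choice, not something prescribed in the thread.

```stata
* Hypothetical variable names; substitute those in your own dataset.

* Option 1: Poisson regression for the conditional mean E(saving | x),
* with robust standard errors. The outcome need not be a count,
* provided the exponential mean function is correctly specified.
poisson saving i.educ i.female i.urban i.ethnic, vce(robust)
margins, dydx(*)

* Option 2: two-part model
* Part 1: probability of having any saving
generate byte anysave = saving > 0 if !missing(saving)
logit anysave i.educ i.female i.urban i.ethnic

* Part 2: amount saved among households with positive saving
* (gamma family with log link is an assumption, not from the thread)
glm saving i.educ i.female i.urban i.ethnic if saving > 0, ///
    family(gamma) link(log) vce(robust)
```

Because these models estimate different quantities, their coefficients and standard errors are not directly comparable; comparing average partial effects from -margins- is usually more informative.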

Thank you again for your kind response.

Muhammed
SciencesPo Paris.






2010/2/2 Austin Nichols <austinnichols@gmail.com>:
> You have had a number of good suggestions already, but as Nick Cox 
> points out, the distribution of the dependent variable is not all that 
> relevant to what model you choose; it is the distribution of the 
> dependent variable conditional on explanatory variables that is 
> important.  Before you estimate a two-part "hurdle" or zero-inflated 
> model, I urge you to consider that the right set of explanatory 
> variables might well capture the reason for a large number of zero 
> outcomes (e.g. using -poisson- instead of -zip- etc.).  When it comes 
> to household saving (I think that is your dependent variable, not 
> independent), you also want to consider debt.  It may be the case that 
> households you are coding as zeros actually have negative saving 
> during the period under study.  Do you have panel data, or 
> cross-sectional data?  How is saving measured?
>
> On Tue, Feb 2, 2010 at 10:09 AM, <muhammed.abdulkhalid@gmail.com> wrote:
>> I have household income survey data (38,000 observations), and my
>> problem is doing a multiple regression on saving (independent var)
>> to ethnicity/strata/employment etc. (dependent var).
>>
>> The problem is this: 70% of my observations for the value of saving
>> are zero. I recoded them to 1 and logged them, but the distribution is
>> still extremely skewed (mean 0.78, std dev 2.4, min 0, max 14).
>> The histogram still looks like the letter L, extremely skewed to the
>> right with a long tail. Obviously, OLS is out, and I tried Poisson
>> (glm nbinomial), but the distribution is still not normally distributed.
>> The data are in order, i.e. no missing values etc. It is clean. For
>> some reason, lobit would not run.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



--
Muhammed

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



