Statalist



Re: st: non-normal dependant variable


From   Maarten Buis <maartenbuis@yahoo.co.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: non-normal dependant variable
Date   Fri, 7 Nov 2008 10:42:24 +0000 (GMT)

It appears you have spikes in your distribution. There exists no
transformation that can take those away.
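
To make the reason concrete: a transformation is a function of the
value, so observations that share a value before the transformation
still share a value afterwards. The sketch below uses made-up data and
two arbitrarily chosen transformations (a signed cube root and arctan),
neither of which comes from the original question, to show that the
size of the largest spike does not change.

```python
from collections import Counter
import math


def largest_spike(values):
    """Return (value, count) for the most frequent value in `values`."""
    return Counter(values).most_common(1)[0]


# Hypothetical data: five exact zeros plus a few spread-out values.
y = [0.0] * 5 + [-0.4, -0.1, 0.2, 0.3, 0.55]

# Two illustrative transformations (assumptions, not from the thread).
cube_root = [math.copysign(abs(v) ** (1 / 3), v) for v in y]
arctan = [math.atan(v) for v in y]

# The tied observations stay tied, so the spike keeps its size.
print(largest_spike(y)[1])
print(largest_spike(cube_root)[1])
print(largest_spike(arctan)[1])
```

All three calls report the same count: the spike may move to a
different location, but it cannot be spread out.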

Sometimes variables are just "spiky". For example, occupational status
scores in Western countries have spikes at the statuses of teachers,
secretaries, and nurses (with the added complexity that these are also
female-dominated occupations...). If that is the case, I would not
worry about it.

However, the fact that the spikes occur at the values 0 and 1 suggests
to me that there is something seriously wrong with your data. The only
way to fix that is to find out where those zeros and ones come from.
For example, this could occur if you combined two datasets, one in
which your dependent variable was measured on a 0/1 scale and another
in which it was measured on a continuous scale, or if the process you
are trying to model is far more complex than -regress- can handle.
Either way, I would figure out where those values of zero and one came
from before doing anything else. This is not something you can do
inside Stata; you will have to go back to where the data came from and
look at the documentation of that data.
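
Going back to the documentation remains the real fix. Still, if the
combined data happen to carry an identifier for where each observation
came from, a quick tabulation can hint at whether the zeros and ones
are concentrated in one source. The sketch below assumes such an
identifier exists; the source labels "A" and "B", the function name,
and the data are all hypothetical.

```python
from collections import defaultdict


def spike_share_by_source(records, spikes=(0.0, 1.0)):
    """Share of observations per source that sit exactly on a spike value.

    `records` is an iterable of (source, y) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for source, y in records:
        totals[source] += 1
        if y in spikes:
            hits[source] += 1
    return {source: hits[source] / totals[source] for source in totals}


# Hypothetical combined data: source A is continuous, source B is 0/1.
records = [
    ("A", 0.12), ("A", -0.31), ("A", 0.48), ("A", -0.05),
    ("B", 0.0), ("B", 1.0), ("B", 1.0), ("B", 0.0),
]

shares = spike_share_by_source(records)
print(shares)
```

A share near 1 in one source and near 0 in the other would be
consistent with the two-scales explanation, but it would not prove it;
only the data documentation can settle that.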

-- Maarten

--- Dalhia <ggs_da@yahoo.com> wrote:
> I have a dependant variable where a bulk of the data
> varies between 0.5748 and -0.5984.  Just plotting the
> data between these values gives me a reasonably normal
> curve.  However, I also have about 100 observations
> where the values on the dependant variable vary
> between abs(25) and abs(0.6).  When these observations
> are included, I get a very skewed distribution with
> very high peaks on the 0, and 1, and then very few
> observations on all other values. As a result, if I
> run an OLS on the complete data, regression
> diagnostics shows a very skewed error distribution,
> and about a 100 outliers.  
> 
> For theoretical reasons, I would rather not convert
> the dependent variable into a 0/1, and use logistic
> regressions.  Is there any other way to deal with this
> data?  Transformations such as log transformations,
> inverse transformations, square root transformations
> don't work due to the zeros and negative values, and
> also because they further pull the extreme values in,
> hence increasing the peaks in the distribution.  
> 
> Is there a way of transforming the data so that it
> stretches in the middle and pulls in values at the
> extremes? Do any of you have suggestions about any
> other ways of dealing with this kind of dependant
> variable?  It would be great to have techniques that
> are relatively easy to implement in Stata.


-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------


      