Maarten Buis <maartenbuis@yahoo.co.uk>

statalist@hsphsun2.harvard.edu

Re: st: non-normal dependant variable

Fri, 7 Nov 2008 10:42:24 +0000 (GMT)

It appears you have spikes in your distribution. No transformation can take those away. Sometimes variables are just "spiky". For example, in an occupational status score there are, in Western countries, spikes at the statuses for teachers, secretaries, and nurses (with the added complexity that these are also female-dominated occupations...). If that is the case I would not worry about it.

However, the fact that the spikes occur at the values 0 and 1 suggests to me that there is something seriously wrong with your data. The only way to fix that is to find out where those zeros and ones come from. For example, this could occur when you combined two datasets, where in one dataset your dependent variable was measured on a 0/1 scale and in the other on a continuous scale, or when the process you are trying to model is far more complex than -regress- can handle.

Either way, I would start by figuring out where those values of zero and one came from before doing anything else. This is not something you can do inside Stata; you will have to go back to where the data came from and look at the documentation of that data.

-- Maarten

--- Dalhia <ggs_da@yahoo.com> wrote:
> I have a dependant variable where a bulk of the data
> varies between 0.5748 and -0.5984. Just plotting the
> data between these values gives me a reasonably normal
> curve. However, I also have about 100 observations
> where the values on the dependant variable vary
> between abs(25) to abs(0.6). When these observations
> are included, I get a very skewed distribution with
> very high peaks on the 0, and 1, and then very few
> observations on all other values. As a result, if I
> run an OLS on the complete data, regression
> diagnostics shows a very skewed error distribution,
> and about a 100 outliers.
>
> For theoretical reasons, I would rather not convert
> the dependent variable into a 0/1, and use logistic
> regressions.
> Is there any other way to deal with this
> data? Transformations such as log transformations,
> inverse transformations, square root transformations
> don't work due to the zeros and negative values, and
> also because they further pull the extreme values in,
> hence increasing the peaks in the distribution.
>
> Is there a way of transforming the data so that it
> stretches in the middle and pulls in values at the
> extremes? Do any of you have suggestions about any
> other ways of dealing with this kind of dependant
> variable? It will be great to have techniques that
> are relatively easy to implement in stata?

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
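
Editorial sketch, not part of the original message: the reply says that the origin of the zeros and ones has to be traced in the data documentation, which Stata cannot do for you. It can, however, show how many observations sit exactly on those values and whether they cluster in one source, which may narrow down where to look. The variable names y (the dependent variable) and source (an identifier for the originating dataset) are hypothetical and are not taken from the thread.

```stata
* Hypothetical variable names: y = dependent variable,
* source = identifier of the dataset each observation came from.

* How many observations sit exactly on the two spike values?
count if y == 0
count if y == 1

* Flag the spike observations (left missing where y is missing)
generate byte spike = inlist(y, 0, 1) if !missing(y)

* Do the spikes come mostly from one source? Row percentages
* make a lopsided split easy to see.
tabulate source spike, row

* Look at the distribution with narrow bins so the spikes stay visible
histogram y, width(0.05)
```

If nearly all flagged observations belong to one value of source, that would be consistent with the combined-datasets explanation given in the reply, but only the documentation of the data can confirm it.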

