The Stata listserver

st: RE: [problem with outliers in regression]


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: [problem with outliers in regression]
Date   Tue, 6 Aug 2002 18:15:31 +0100

Nazaria Solferino
 
> Hello! I'm a new Stata user and I'm not very good at
> using it yet. I hope someone can help with my
> problem. I have a large dataset with some outliers, and
> I'd like to use my variables only within a
> restricted range (without dropping observations). I
> thought I could give a zero value to all variables
> outside a certain range, i.e. I would generate
> newvar = oldvar and then replace newvar = 0 if it is
> outside the range. First, is this a statistically correct procedure?

(Please use informative titles on Statalist messages.) 

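No. Stata will take the new values of 0 just as literally 
as the old outlying values: a 0 is a genuine data value, 
not a marker meaning "ignore this", so the regression 
will try to fit those points like any others. 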

One way to exclude outliers is by an -if- condition: 
e.g. 

regress y x1 x2 x3 if y < 10000 

Naturally, there are other approaches to your problem, 
including 

1. a robust technique. I've found -qreg- (quantile 
regression, by default median regression) very good. 

2. transformation. 

3. -glm- with a nonlinear link (e.g. log). 
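The three alternatives above can be sketched in Stata roughly as follows. This is only an illustration under stated assumptions: it uses Stata's shipped auto dataset, with price standing in for y and mpg and weight for the predictors, and the Gaussian family for -glm- is an assumed choice, not something prescribed here.

```stata
* Sketch only: the auto dataset stands in for the poster's data,
* with price playing the role of y.
sysuse auto, clear

* 1. Robust technique: -qreg- fits a median regression by default,
*    so extreme values of price pull on the fit much less than in OLS.
qreg price mpg weight

* 2. Transformation: model the log of the (positive, skewed) response.
generate lnprice = ln(price)
regress lnprice mpg weight

* 3. -glm- with a log link (Gaussian family assumed for illustration).
glm price mpg weight, family(gaussian) link(log)
```

None of these recodes or drops any observation; each changes how the model treats large values instead.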

> Second, if it's correct, how could I do that
> in Stata without finding each interval with the -centile-
> command for each variable, but instead write a general
> program that I can apply to every variable?

That depends partly on what your project is. But 
my guess is that -qreg- or -glm- might offer 
a more general approach than what you propose 
here. 
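If you do still want a general rule rather than typing cut-offs by hand, one hedged sketch is to loop over variables, take each variable's own percentiles from -summarize, detail-, and build an indicator to use in an -if- condition rather than recoding values to 0. The dataset, the variable list and the 5th/95th percentile cut-offs are all hypothetical choices for illustration, and note that this still throws information away, which is why -qreg- or -glm- may be preferable.

```stata
* Hypothetical sketch: flag observations lying outside the
* 5th-95th percentile range of ANY listed variable, then
* exclude them with -if- (values themselves are left unchanged).
sysuse auto, clear
generate byte keep_obs = 1
foreach v of varlist price mpg weight {
    quietly summarize `v', detail
    * inrange() is 1 only when r(p5) <= `v' <= r(p95)
    quietly replace keep_obs = 0 if !inrange(`v', r(p5), r(p95))
}
regress price mpg weight if keep_obs
```

With small samples, extreme percentiles such as r(p1) and r(p99) may simply equal the minimum and maximum, so such a rule can end up excluding nothing.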
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


