



Re: st: using -drop if- with weights


From   Maarten buis <maartenbuis@yahoo.co.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: using -drop if- with weights
Date   Mon, 6 Sep 2010 09:15:55 +0000 (GMT)

---  Luis Armando Galvis writes: 
> I have a question I am stuck with. I need to drop
> observations that are beyond 3 standard errors from the mean
> of one of the variables. The problem is that using -drop if-
> will eliminate observations without taking into account the
> weights and will eliminate more observations than needed. I
> cannot expand the dataset to 8 million records because of
> memory issues. My question is if there is a way to do this
> procedure in a more manageable way. 

The command -drop- neither knows about weights nor allows them.
It doesn't know the mean or standard deviation either, so the
problem is not with -drop- but with what you typed before. 
Since you did not tell us what you typed before, it is hard for
us to comment. Also you did not tell us why you think that your
command drops too many observations. This can be crucial 
information, as the rules of thumb about how many observations 
should be dropped with such a rule are often based on the normal
distribution, but if your variable is severely skewed or has a 
spike then all bets are off when it comes to predicting how many 
observations will be dropped with such a rule.
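
One way to see what such a rule would do before deleting anything is to
flag the observations and tabulate the flag with the weights. This is
only a sketch: the names var and w are the placeholders used further
down in this post, the variable outside is made up for illustration,
and w is assumed to be an integer frequency weight.

```stata
* var = variable of interest, w = frequency weight (placeholder names)
quietly summarize var [fw=w], detail
local lo   = r(mean) - 3*r(sd)
local hi   = r(mean) + 3*r(sd)
local skew = r(skewness)
display "weighted skewness = `skew'"

* flag observations the rule would remove; missing values stay unflagged
generate byte outside = (var < `lo' | var > `hi') if !missing(var)

* weighted share of flagged observations
* (roughly 0.27% would be expected if var were normally distributed)
tabulate outside [fw=w]
```

If the weighted share is far from what the normal distribution would
suggest, that points to skewness or a spike rather than to a problem
with -drop-.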

On a more fundamental note: such automatic deletion of observations
is almost always very very very wrong. Almost always it is the
exceptions that contain the most information, so we do not want 
to throw them away. Think about it from a policy point of view: it 
is usually the exceptions that we want to attain or prevent. We
want the population to live long and healthy and be rich, and want 
to prevent early deaths, illness, and poverty. It is the extremes
that contain information on these events, not the "normal" 
observations.

However, technically this is how you can do it:

sum var [fw=w]
drop if var < r(mean) - 3*r(sd) | var > r(mean) + 3*r(sd)

(assuming that your variable is called var and your weight
is called w)
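
A slightly more defensive variant of the same idea is sketched below.
It is not part of the original answer. Stata treats a missing value as
larger than any number, so the condition var > r(mean) + 3*r(sd) is
also true when var is missing; the sketch guards against that and
stores the bounds in local macros first. The names var and w are again
placeholders, and w is assumed to hold integer frequency weights.

```stata
* weighted mean and standard deviation of var
quietly summarize var [fw=w]

* store the bounds before any other command can overwrite r()
local lo = r(mean) - 3*r(sd)
local hi = r(mean) + 3*r(sd)

* drop only non-missing values outside the bounds
drop if (var < `lo' | var > `hi') & !missing(var)
```

Run the lines together (for example from a do-file), since local
macros do not carry over between separately executed chunks of a
do-file.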

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


      


