Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: _N in by-groups
From: Maarten Buis <[email protected]>
To: [email protected]
Subject: Re: st: _N in by-groups
Date: Fri, 19 Aug 2011 17:14:21 +0200
On Fri, Aug 19, 2011 at 4:40 PM, <[email protected]> wrote:
> I too am confused regarding when _N is or isn't influenced
> by the -by:- prefix.
>
> I would like to remove a single outlier from each group within
> the following data set...
>
> input group var1
>
> 1 4
> 1 5
> 1 81
> 2 2
> 2 3
> 2 3
> 2 72
>
> end
>
> I would then like to calculate the mean for each group (with the outliers
> gone).
>
> I assumed that the following code would do the trick…
>
>
> by group (var1), sort: egen average = mean(var1) if var1 != var1[_N]
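Your if condition does not do what you want. -egen- does not evaluate an
if condition separately within each by-group; it evaluates it once over
the entire dataset. So inside that condition _N is 7, the number of
observations in the whole dataset, and var1[_N] is the last value of var1
in the sorted data, i.e. 72. As a result only the 72 in group 2 is
excluded, while the 81 in group 1 is kept, which is obviously not what
you want. What you want is to exclude the last observation within each
group, i.e. the condition _n != _N evaluated within groups. This is still
not very robust, because Stata sorts missing values after all nonmissing
values: if var1 contains missing values, the last observation in a group
is a missing value rather than the largest value. It is better to add an
indicator for missing values to the by-list: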
gen mis = missing(var1)
by group mis (var1), sort : egen average = mean(var1) if _n != _N & mis == 0
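However, -egen- evaluates an if condition once over the entire dataset
rather than separately within each by-group, so _n and _N in this
condition do not refer to positions within the groups. (This is how
_gmean, the program that -egen- calls to compute the means, handles such
selection criteria.) So you'll need to do a bit more work: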
*---------------- begin example ---------------
clear
input group var1
1 4
1 5
1 81
2 2
2 3
2 3
2 72
end
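tempvar touse mis
quietly {
* flag observations with a missing var1 (these sort last)
gen byte `mis' = missing(var1)
* within each group, sort the nonmissing values of var1 and mark
* all observations except the last one, i.e. the largest value
bys group `mis' (var1): gen byte `touse' = 1 if _n != _N & `mis' == 0
* running mean of var1 over the marked observations in each group
sort `touse' group
by `touse' group: gen double average = sum(var1)/sum((var1)<.) if `touse'==1
* the running mean at the last observation is the group mean;
* copy it to every observation in the same by-group
by `touse' group: replace average = average[_N]
}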
*----------------------- end example -------------------------
(For more on the examples I send to Statalist, see:
http://www.maartenbuis.nl/example_faq )
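A shorter way around the -egen- problem relies on the fact that under -by: generate-, _n and _N are evaluated within each by-group, and an explicit subscript such as var1[_N] refers to the last observation of the current group. The sketch below, offered as an illustration rather than a definitive recipe, first shows that behaviour, then builds the inclusion indicator with -by: generate- and hands -egen- an expression that is missing for the excluded observations, relying on mean() ignoring missing values. The variable names n_in_group, last_value, mis and touse are illustrative assumptions. Unlike the example above, this assigns the trimmed mean to every observation in the group, including the dropped one.

```stata
*---------------- begin example ---------------
clear
input group var1
1 4
1 5
1 81
2 2
2 3
2 3
2 72
end

* under -by: generate-, _N is the size of the current by-group
* and var1[_N] is the last (here: largest) value within that group
bysort group (var1): gen n_in_group = _N
by group: gen last_value = var1[_N]

* put missing values of var1 in their own by-group, so that the
* largest nonmissing value is the last observation of its group
gen byte mis = missing(var1)
* touse is 1 for all but the last observation, missing if var1 is missing
bysort group mis (var1): gen byte touse = (_n < _N) if mis == 0

* mean() ignores missing values, so only the marked observations count
bysort group: egen double average = mean(cond(touse == 1, var1, .))
list, sepby(group)
*----------------------- end example -------------------------
```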
Notice, however, that from a scientific viewpoint such automatic
procedures for dropping the most informative observations in your data
are obviously completely and utterly wrong; see e.g.:
<http://www.stata.com/statalist/archive/2011-08/msg00398.html>
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------