Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: _N in by-groups

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: _N in by-groups
Date	Fri, 19 Aug 2011 16:58:34 +0100

I agree with Maarten that this is generally a bad idea.

Somewhere in the documentation there is a warning about trying to use _n with -egen-. For example, -egen- often works temporarily with a different sort order, so subscripts such as [_n] or [_N] may not point at the observations you expect.


One way to get what you want is a two-step

egen max = max(var1), by(group)

egen mean = mean(var1/(var1 < max)), by(group)

The -if- way is often not what you want here, as discussed recently on the list.
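
A minimal sketch of why the two-step works, using the example data from the question. The variable names max and mean are the ones in the two commands above; the -list- call at the end is only added for inspection. The expression (var1 < max) evaluates to 1 or 0, so var1/(var1 < max) is var1 itself for every value below the group maximum and missing (division by zero) for the maximum, and mean() ignores missings. Note that if the maximum is tied within a group, all tied values drop out.

```stata
clear
input group var1
1  4
1  5
1 81
2  2
2  3
2  3
2 72
end

* Step 1: the largest value of var1 within each group
egen max = max(var1), by(group)

* Step 2: var1/(var1 < max) is missing wherever var1 equals the group
* maximum, so that observation does not contribute to the mean
egen mean = mean(var1 / (var1 < max)), by(group)

list, sepby(group)
```

Unlike an -if- restriction, this leaves the group mean filled in for every observation, including the one that was left out of the calculation.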


Nick

On 19 Aug 2011, at 15:40, [email protected] wrote:

I too am confused regarding when _N is or isn't influenced
by the -by:- prefix.

I would like to remove a single outlier from each group within
the following data set...

input group var1

1      4
1      5
1     81
2      2
2      3
2      3
2     72

end

I would then like to calculate the mean for each group (with the outliers
gone).

I assumed that the following code would do the trick…


by group (var1), sort: egen average = mean(var1) if var1 != var1[_N]

When the mean was calculated, it did so following the -by:- prefix
(i.e. _N = 3 for group 1). But within the -if- qualifier, _N was
calculated from the whole data set (i.e. _N = 7).

I got around this problem by generating/sorting a byte tag; however, I still
don’t understand WHY and HOW Stata does this.
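
For illustration, the byte-tag workaround mentioned above might look like the following sketch. The names n_in_group and last are hypothetical, chosen here only for the example. It relies on the fact that under -by:- with -generate-, _n and _N are counted within each group, so the tag is built first and -egen- then only has to test the tag rather than a subscript.

```stata
clear
input group var1
1  4
1  5
1 81
2  2
2  3
2  3
2 72
end

* Under -by:- with -generate-, _N is the number of observations in the group
by group (var1), sort: generate n_in_group = _N

* Tag the last (largest) observation in each group
by group (var1): generate byte last = (_n == _N)

* Outside -by:-, _N is the number of observations in the whole data set (7 here)
display _N

* Use the tag, not var1[_N], to leave out each group's largest value
egen average = mean(var1) if !last, by(group)

list, sepby(group)
```

With the -if- restriction, average is missing in the tagged observations, which is one reason the -if- route is often not what is wanted.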

Could I have dealt with the above using a single line of code?

Cheers,

Mike (beginner Stata 8)




* So _N, as it were, never sees the -by:- and is not influenced
by it.

** If a Stata command has by-groups, it seems like _N is interpreted
sometimes as the number of observations in the by-group and sometimes
as the number of observations in the data set.

*** If you use the -by :- prefix it is always defined as the number of
observations within each by-group. Stata would be a pretty lousy
program if such a scalar randomly changed meaning...

