


Re: st: RE: a question on averaging in Stata

From   Nick Cox <>
Subject   Re: st: RE: a question on averaging in Stata
Date   Wed, 8 Feb 2012 14:55:05 +0000

The device used in the FAQ to calculate maximum values clearly won't
work for medians. You are trying to calculate the median over

original value if I want this
zero if I don't want this

and those zeros may affect the result. With maxima over ages the zeros
won't (usually) do that.
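
As a small illustration of that point, here is a sketch on made-up data (the variable names -x- and -zeroed- are assumptions for this example only). Replacing the excluded value by zero leaves the maximum alone but shifts the median:

```stata
clear
input x
5
7
9
end

* FAQ-style device: the first observation is replaced by 0, not dropped
gen zeroed = x * (_n != 1)

* median of {0, 7, 9} is 7; the maximum is still 9
summarize zeroed, detail
display "median with zero = " r(p50) ", max = " r(max)

* median of the others proper, {7, 9}, is 8; the maximum is again 9
summarize x if _n != 1, detail
display "median of others = " r(p50) ", max = " r(max)
```

The zero enters the calculation as a real value, so the median moves from 8 to 7, while the maximum is 9 either way.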

This is quick and dirty but illustrates a more general technique

generate median = .
by year, sort: gen pid = _n
summarize pid, meanonly
quietly forvalues i = 1/`r(max)' {
       egen work = median(idio / (pid != `i')), by(year)
       replace median = work if pid == `i'
       drop work
}

What is crucial here is that -median()- takes an expression, which can
be more complicated than a variable name, and that

idio / (pid != `i')

is -idio- when -pid- is not the current identifier and missing
otherwise. So, -egen- will ignore the missings.
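
To see a single pass of that loop in isolation, here is a minimal sketch on invented data. The values, and the choice to order -pid- by -idio- within year so that the result is reproducible, are assumptions for this example and not part of the original problem:

```stata
clear
input year idio
2010 1
2010 2
2010 10
2010 20
2011 4
2011 6
2011 8
2011 12
end

* identifier within year, ordered by idio so that pid is reproducible
bysort year (idio): gen pid = _n

* idio where pid is not 1; dividing by 0 yields missing where pid is 1
gen others = idio / (pid != 1)

* median over the nonmissing values only, i.e. excluding pid 1
egen work = median(idio / (pid != 1)), by(year)

list, sepby(year)
```

Here -others- is missing for the smallest value in each year, and -work- is 10 in 2010 (the median of 2, 10, 20) and 8 in 2011 (the median of 6, 8, 12). The loop above does the same for each identifier in turn and keeps the result only for the observations that were excluded.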

For more discussion see

Nicholas J. Cox. 2011. Speaking Stata: Compared with .... Stata
Journal 11(2): 305-314.

Abstract.  Many problems in data management center on relating values
to values in other observations, either within a dataset as a whole or
within groups such as panels. This column reviews some basic Stata
techniques helpful for such tasks, including the use of subscripts,
summarize, by:, sum(), cond(), and egen. Several techniques exploit
the fact that logical expressions yield 1 when true and 0 when false.
Dividing by zero to yield missings is revealed as a surprisingly
valuable device.

Advice on "Thanks in advance" is included in the FAQ.


On Wed, Feb 8, 2012 at 2:30 PM, <> wrote:

> Thanks a lot for your feedback. The information was very useful. I have one additional question, which concerns estimating a group median excluding observation i. I have looked at the article that you referred to, but I got stuck writing the code for the case of medians.
> Again, I have panel data with items i observed over several years t for variable x. I need to estimate the median of this variable for each year. However, I have to estimate a specific median: for each item i, I have to estimate the median value of x excluding the observation for item i itself, i.e. the median over the other objects (if I could label them -i).
> I found this technically more challenging than the estimation of means. I started with the following code, using as an example one of the code fragments that you shared in your article, but I cannot find a way to isolate item i from the median calculation.
> Could you please help me with that? I would like to thank you in advance.

> generate maxvar = .
> by year, sort: gen pid = _n
> summarize pid
> quietly forvalues i = 1/`r(max)' {
>        generate include = 1 if pid != `i'
>        egen work = median(idio * include), by(year)
>        replace maxvar = work if pid == `i'
>        drop include work
> }

From: Nick Cox <>

> This is a FAQ.
> FAQ     . . Creating variables recording prop. of the other members of a group
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         4/05    How do I create variables summarizing for each
>                 individual properties of the other members of a
>                 group?
> but the question also yields easily to Stata logic. The starting point is the idea that the total for everybody else is just the total minus my value.
> The average of every other item is
> (sum of others) / (count of others)
> which is in the simplest case
> (sum of all - this value) / (count of all - 1)
> -- although careful code would need to take account of the possibility that each value is missing.
> That is then
> egen sum = total(x), by(group)
> egen count = count(x), by(group)
> and then the average is
> gen mean = (sum - cond(missing(x), 0, x)) / (count - !missing(x))
> If any value is missing, then we need to subtract 0 (not missing!) from the total to get the total of others.
> If any value is missing, then we need to subtract 0 (not 1!) from the count to get the count of others.
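
A worked example may help here. This is only a sketch on invented data, with the variable name -mean_others- chosen for illustration; it applies the same total-minus-own-value logic, including one missing value:

```stata
clear
input group x
1 2
1 4
1 .
1 6
2 5
2 7
end

* group totals and counts of nonmissing values
egen sum = total(x), by(group)
egen count = count(x), by(group)

* subtract own value (0 if missing) and own count (0 if missing)
gen mean_others = (sum - cond(missing(x), 0, x)) / (count - !missing(x))

list, sepby(group)
```

In group 1 the total is 12 and the count is 3, so the results are (12 - 2)/2 = 5, (12 - 4)/2 = 4, (12 - 0)/3 = 4 for the missing value, and (12 - 6)/2 = 3. In group 2 the results are 7 and 5.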

> I have panel data with items i observed over several years t for variable x.
> I have to estimate a specific average: for each item i, I have to take the mean value of x excluding the observations for item i itself, i.e. the average over the other objects (if I could label them -i).
> Is this possible in Stata?
