Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: RE: a question on averaging in Stata

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: RE: a question on averaging in Stata
Date	Wed, 8 Feb 2012 14:55:05 +0000

The device used in the FAQ to calculate maximum values clearly isn't
suitable for medians. You are trying to calculate the median over

original value if I want this
zero if I don't want this

and those zeros may affect the result. With maxima over ages the zeros
won't (usually) do that.
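A minimal sketch of the point about zeros, on invented toy data (the values, and the choice to leave out observation 4, are purely illustrative). Multiplying by a 0/1 indicator leaves a zero in the data, whereas dividing by the same indicator turns the excluded value into missing, which -egen- then ignores.

```stata
* Hypothetical toy data: values invented for illustration
clear
input idio
1
2
3
10
end
generate pid = _n

* Multiplying by a 0/1 indicator: observation 4 becomes 0 and stays in the median
egen med_zero = median(idio * (pid != 4))

* Dividing by the indicator: 10/0 is missing, so -egen- ignores observation 4
egen med_miss = median(idio / (pid != 4))

list, noobs
```

With the zero included, the median is taken over 0, 1, 2, 3, giving (1 + 2)/2 = 1.5; with the excluded value set to missing, it is the median of 1, 2, 3, namely 2.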

This is quick and dirty, but it illustrates a more general technique:

generate median = .
by year, sort: gen pid = _n
summarize pid, meanonly
quietly forvalues i = 1/`r(max)' {
       egen work = median(idio / (pid != `i')), by(year)
       replace median = work if pid == `i'
       drop work
}

What is crucial here is that -median()- takes an expression, which can
be more complicated than a variable name, and that

idio / (pid != `i')

is -idio- when -pid- is not the current identifier and missing
otherwise. So, -egen- will ignore the missings.
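To see the whole loop at work, here is a sketch on a small invented panel (the values are hypothetical; -year- and -idio- follow the names used in this thread). For each item, -median- should end up holding the median of the other items in the same year.

```stata
* Hypothetical toy panel: values invented for illustration
clear
input year idio
2010 1
2010 2
2010 3
2010 10
2011 4
2011 6
2011 20
end

generate median = .
by year, sort: generate pid = _n
summarize pid, meanonly
quietly forvalues i = 1/`r(max)' {
    * Median over the other members of the year: the current item becomes missing
    egen work = median(idio / (pid != `i')), by(year)
    replace median = work if pid == `i'
    drop work
}
list year idio median, sepby(year)
```

For 2010, for example, leaving out 1 gives the median of 2, 3, 10, which is 3, and leaving out 10 gives the median of 1, 2, 3, which is 2. In 2011, with only two others left, the median is their average: leaving out 20 gives (4 + 6)/2 = 5.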

For more discussion see

Nicholas J. Cox. 2011. Speaking Stata: Compared with .... Stata
Journal 11(2): 305-314.

Abstract.  Many problems in data management center on relating values
to values in other observations, either within a dataset as a whole or
within groups such as panels. This column reviews some basic Stata
techniques helpful for such tasks, including the use of subscripts,
summarize, by:, sum(), cond(), and egen. Several techniques exploit
the fact that logical expressions yield 1 when true and 0 when false.
Dividing by zero to yield missings is revealed as a surprisingly
valuable device.

Advice on "Thanks in advance" is included in the FAQ.

Nick

On Wed, Feb 8, 2012 at 2:30 PM, [email protected] <[email protected]> wrote:

> Thanks a lot for your feedback. The information was very useful. I have one additional question, which relates to estimating a group median excluding observation i. I have looked at the article that you referred to, but I got stuck writing the code for the case of medians.
>
>
> Again, I have panel data with items i observed over several years t for variable x. I need to estimate the median of this variable for each year. However, I have to estimate a specific median: for each item i, I have to estimate
> the median value of x excluding the observation for item i itself, i.e. the median over the other objects (if I could label them -i).
> I found this technically more challenging than estimating means. I started with the following code, using as a template one of the code examples that you shared in your article, but I cannot find a way to isolate item i from the median calculation.
>
> Could you please help me with that? I would like to thank you in advance.

>  generate maxvar = .
>  by year, sort: gen pid = _n
>  summarize pid
>  quietly forvalues i = 1/`r(max)' {
>        generate include = 1 if pid != `i'
>        egen work = median(idio * include), by(year)
>        replace maxvar = work if pid == `i'
>        drop include work
>  }

From: Nick Cox <[email protected]>

> This is a FAQ.
>
> FAQ: Creating variables recording prop. of the other members of a group
>      N. J. Cox, 4/05
>      How do I create variables summarizing for each individual properties
>      of the other members of a group?
>      http://www.stata.com/support/faqs/data/members.html
>
> but the question also yields easily to Stata logic. The starting point is the idea that the total for everybody else is just the total minus my value.
>
> The average of every other item is
>
> (sum of others) / (count of others)
>
> which is in the simplest case
>
> (sum of all - this value) / (count of all - 1)
>
> Careful code, however, would need to allow for the possibility that any value is missing.
>
> That is then
>
> egen sum = total(x), by(group)
> egen count = count(x), by(group)
>
> and then the average is
>
> gen mean = (sum - cond(missing(x), 0, x)) / (count - !missing(x))
>
> If this value is missing, then we need to subtract 0 (not missing!) from the total to get the total of others.
>
> If this value is missing, then we need to subtract 0 (not 1!) from the count to get the count of others.
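A sketch of the mean-of-others calculation on an invented panel with one missing value, taking -year- as the grouping variable (an assumption for illustration; any group identifier would do). The closing parenthesis after the -cond()- term matters: the subtraction from the total has to happen before the division.

```stata
* Hypothetical toy panel: values invented for illustration, one x missing
clear
input year x
2010 1
2010 2
2010 6
2010 .
2011 4
2011 8
end

* Group total (missings contribute 0) and count of nonmissing values
egen sum = total(x), by(year)
egen count = count(x), by(year)

* Mean of the others: subtract my value (0 if missing) from the total
* and 1 (0 if missing) from the count
generate mean = (sum - cond(missing(x), 0, x)) / (count - !missing(x))
list year x sum count mean, sepby(year)
```

Here the observation with missing x in 2010 gets the mean of all three nonmissing values, (1 + 2 + 6)/3 = 3, while the observation with x = 1 gets (9 - 1)/2 = 4.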


[email protected]

> I have panel data with items i observed over several years t for variable x.
>
> I have to estimate a specific average: for each item i I have to take the mean value of x excluding the observations for item i itself, i.e. the average over the other objects (if I could label them -i).
>
> Is this possible in Stata?
