



Re: st: RE: calculating mean without own observation


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: calculating mean without own observation
Date   Mon, 23 May 2011 17:16:26 +0100

Expanding this a bit:

There is more than one way to do this, which should be fine by everybody.

Phil Clayton outlined a solution looping over observations, which is
direct, but which will be _very_ slow for large datasets.
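
For comparison, here is a minimal sketch of what a loop over observations
might look like. It is an illustration, not the code that was posted. The
auto dataset shipped with Stata stands in for real data, with mpg as the
value and foreign as the category (mpg has no missing values there), and
mean_loop, total, n and mean_egen are made-up names. The final -assert-
checks the loop against the -egen- calculation quoted below.

```stata
sysuse auto, clear

* One -summarize- per observation: simple, but the work grows with _N squared
gen double mean_loop = .
quietly forvalues i = 1/`=_N' {
    * mean of mpg over the same category, leaving out observation `i'
    summarize mpg if foreign == foreign[`i'] & _n != `i', meanonly
    replace mean_loop = r(mean) in `i'
}

* Cross-check against the -egen- route (no missing values, so n - 1 suffices)
egen total = total(mpg), by(foreign)
egen n = count(mpg), by(foreign)
gen double mean_egen = (total - mpg) / (n - 1)
assert reldif(mean_loop, mean_egen) < 1e-6
```

Note that the loop also returns a mean for observations whose own value is
missing, because -summarize- simply ignores missing values.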

The FAQ below emphasises the -egen- route rather (too) heavily.

A direct route which will usually be fast goes something like this.

1. It is a good idea to segregate missings.

gen byte touse = !missing(value)

2. Then we get totals:

bysort touse category : gen total = sum(value) if touse
by touse category : replace total = total[_N]

3. Then we get counts of non-missing values:

by touse category : gen count = _N if touse

4. Now the finish is in sight:

gen mean_others = (total - value) / (count - 1)

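5. If we wanted to assign the mean of others to observations with
missing values, we could do this. For such an observation every
non-missing value in the category belongs to the others, so the mean
of others is just total / count:

bysort category (touse) : replace mean_others = total[_N] / count[_N] if !touse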

An advantage of this approach is that it generalises easily:

6. Want to average an expression, not just a variable? Plug it in the
same place as the variable name.

7. Want to add -if- and/or -in- qualifiers? Build them into the
-touse- definition in step 1:

gen byte touse = !missing(value) & foo == 42 & bar < 1000
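
Steps 1 to 4 can be checked against the small dataset in the original
question (quoted below). This is only a sketch. The -assert- lines encode
the hand calculations given there: (5+10)/2, (5+5)/2 and 2/1.

```stata
clear
input i value category
1  5 1
2  5 1
3 10 1
4  2 2
5  2 2
end

* Steps 1 to 4 exactly as above
gen byte touse = !missing(value)
bysort touse category : gen total = sum(value) if touse
by touse category : replace total = total[_N]
by touse category : gen count = _N if touse
gen mean_others = (total - value) / (count - 1)

sort i
list i value category mean_others, sepby(category)

* Hand calculations from the question
assert mean_others == 7.5 if i == 1
assert mean_others == 5   if i == 3
assert mean_others == 2   if i == 4
```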

However, means are easy. Other statistics can be much more awkward.
The FAQ has more.
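
As one illustration of why other statistics need their own tricks, here
is a sketch for the maximum of the others, which cannot be obtained by
subtracting out the current value. The idea (an assumption of this
sketch, not something taken from the FAQ) is to sort on value within
category: the maximum of the others is the largest value, unless the
current observation holds it, in which case it is the second largest.
max_others is a made-up name; the data are those from the question.

```stata
clear
input i value category
1  5 1
2  5 1
3 10 1
4  2 2
5  2 2
end

gen byte touse = !missing(value)

* Within each category, sorted on value, the last observation is the largest.
* Every other observation sees value[_N]; the last one sees value[_N - 1].
* In a group of one, value[0] is missing, which is the right answer.
bysort touse category (value) : gen max_others = ///
    cond(_n < _N, value[_N], value[_N - 1]) if touse

sort i
list i value category max_others, sepby(category)

assert max_others == 10 if inlist(i, 1, 2)
assert max_others == 5  if i == 3
assert max_others == 2  if inlist(i, 4, 5)
```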

Nick

On Mon, May 23, 2011 at 4:02 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> This is an FAQ.
>
> FAQ     . . Creating variables recording prop. of the other members of a group
>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>        4/05    How do I create variables summarizing for each
>                individual properties of the other members of a
>                group?
>                http://www.stata.com/support/faqs/data/members.html
>
> Once you know to look for references to (e.g.) "members.html" in the Statalist archive, you can find several related discussions.
>
> In this case, something like
>
> egen total = total(value), by(category)
> egen n = count(value), by(category)
>
> gen totalMINUSi = total - cond(missing(value), 0, value)
> gen meanMINUSi = totalMINUSi / (n - !missing(value))
>
> Incidentally, this cannot be done with a simple -if- precisely because values for other observations are involved in the calculation. But it can be approached directly.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Guo Xu
>
> How do I calculate a mean (or any other summary statistic) excluding
> the *current* observation?
>
> For example, I have the following data:
>
> i    value    category
> 1    5        1
> 2    5        1
> 3    10       1
> 4    2        2
> 5    2        2
>
> I would like to calculate the mean for each category (egen value_mean
> = mean(value), by(category)), but exclude the i-th observation:
> For i=1, for example, the mean value by category 1 would be (5+10)/2.
> For i=3, it would be (5+5)/2. For i=4, it would be 2/1.
>
> I guess there must be some simple *if* condition for this
> manipulation, but I failed to find it - would be most grateful for
> help.
>


