Re: st: RE: calculating mean without own observation

From	Nick Cox
Subject	Re: st: RE: calculating mean without own observation
Date	Mon, 23 May 2011 17:16:26 +0100

Expanding this a bit:

There is more than one way to do this, which should be fine by everybody.

Phil Clayton outlined a solution looping over observations, which is
direct, but which will be _very_ slow for large datasets.
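
Phil's code is not repeated here. As a rough sketch only, and not necessarily what he posted, a loop over observations might look like this, using the example data from the original question (-mean_loop- is a name made up for this illustration):

```stata
clear
input value category
 5 1
 5 1
10 1
 2 2
 2 2
end

gen mean_loop = .

forvalues i = 1/`=_N' {
    * mean of value within the same category, leaving out observation i
    quietly summarize value if category == category[`i'] & _n != `i', meanonly
    quietly replace mean_loop = r(mean) in `i'
}

list
```

Each pass calls -summarize- over the whole dataset, so the work grows roughly with the square of the number of observations, which is why this gets slow.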

The FAQ below emphasises the -egen- route rather (too) heavily.

A direct route which will usually be fast goes something like this.

1. It is a good idea to segregate missings.

gen byte touse = !missing(value)

2. Then we get totals:

bysort touse category : gen total = sum(value) if touse
by touse category : replace total = total[_N]

3. Then we get counts of non-missings:

by touse category : gen count = _N if touse

4. Now the finish is in sight:

gen mean_others = (total - value) / (count - 1)

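5. If we wanted to assign the mean of others to observations with
missing values, we could do this. For such an observation every
non-missing value in the category belongs to somebody else, so the
mean of others is just total / count, and the -if !touse- stops the
results from step 4 being overwritten:

bysort category (touse) : replace mean_others = total[_N] / count[_N] if !touse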

An advantage of this approach is that it generalises easily:

6. Want to average an expression, not just a variable? Plug it in the
same place as the variable name.

7. Want to add -if- and/or -in- qualifiers? Build them into the
-touse- definition:

gen byte touse = !missing(value) & foo == 42 & bar < 1000
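
As a sketch of points 6 and 7 together, here is the same recipe for the mean of log10(value) among the other members, restricted to foo == 42. The data are invented for illustration, -foo- is just the placeholder used above, and the condition is arbitrary. Note that -missing()- is applied to the expression rather than to -value-, because log10() of zero or of a negative number is missing.

```stata
clear
input value category foo
  1 1 42
 10 1 42
100 1 42
  0 1 42
 10 2 42
 10 2  7
end

* the expression goes wherever -value- appeared before,
* and the -if- condition goes into -touse-
gen byte touse = !missing(log10(value)) & foo == 42

bysort touse category : gen double total = sum(log10(value)) if touse
by touse category : replace total = total[_N]
by touse category : gen count = _N if touse

* missing when excluded by -touse- or when there are no other members
gen double mean_others = (total - log10(value)) / (count - 1)

sort category value
list, sepby(category)
```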

However, means are easy. Other statistics can be much more awkward.
The FAQ has more.
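
To check the recipe, steps 1 to 5 can be run on the example data from the original question. The sixth observation, with a missing value, is an addition made here purely to exercise -touse-. Step 5 is written with an explicit -if !touse- so that only that observation is changed.

```stata
clear
input i value category
1  5 1
2  5 1
3 10 1
4  2 2
5  2 2
6  . 2
end

* step 1: segregate missings
gen byte touse = !missing(value)

* step 2: group totals of the non-missing values
bysort touse category : gen total = sum(value) if touse
by touse category : replace total = total[_N]

* step 3: group counts of the non-missing values
by touse category : gen count = _N if touse

* step 4: mean of the others
gen mean_others = (total - value) / (count - 1)

* step 5: observations with missing values get the plain group mean
bysort category (touse) : replace mean_others = total[_N] / count[_N] if !touse

sort i
list i value category mean_others, sepby(category)
```

Observations 1 and 2 get 7.5, observation 3 gets 5, and every observation in category 2, including the added one, gets 2, which agrees with the results asked for in the question.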

Nick

On Mon, May 23, 2011 at 4:02 PM, Nick Cox wrote:
> This is an FAQ.
>
> FAQ     . . Creating variables recording prop. of the other members of a group
>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>        4/05    How do I create variables summarizing for each
>                individual properties of the other members of a
>                group?
>                http://www.stata.com/support/faqs/data/members.html
>
> Once you know to look for references to (e.g.) "members.html" in the Statalist archive, you can find several related discussions.
>
> In this case, something like this should work:
>
> egen total = total(value), by(category)
> egen n = count(value), by(category)
>
> gen totalMINUSi = total - cond(missing(value), 0, value)
> gen meanMINUSi = totalMINUSi / (n - !missing(value))
>
> Incidentally, this cannot be done with a simple -if- precisely because values for other observations are involved in the calculation. But it can be approached directly.
>
> Nick
>
> Guo Xu
>
> How do I calculate a mean (or any other summary statistic) excluding
> the *current* observation?
>
> For example, I have the following data:
>
> i     value     category
> 1     5         1
> 2     5         1
> 3     10        1
> 4     2         2
> 5     2         2
>
> I would like to calculate the mean for each category (egen value_mean
> = mean(value), by(category)), but exclude the i-th observation:
> For i=1, for example, the mean value by category 1 would be (5+10)/2.
> For i=3, it would be (5+5)/2. For i=4, it would be 2/1.
>
> I guess there must be some simple *if* condition for this
> manipulation, but I failed to find it - would be most grateful for
> help.
>
