

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: calculating mean without own observation
Date: Mon, 23 May 2011 17:16:26 +0100

Expanding this a bit:

There is more than one way to do this, which should be fine by everybody.

Phil Clayton outlined a solution looping over observations, which is direct, but which will be _very_ slow for large datasets.

The FAQ below emphasises the -egen- route rather (too) heavily.

A direct route which will usually be fast goes something like this.

1. It is a good idea to segregate missings.

    gen byte touse = !missing(value)

2. Then we get totals:

    bysort touse category : gen total = sum(value) if touse
    by touse category : replace total = total[_N]

3. Then we get counts of non-missings:

    by touse category : gen count = _N if touse

4. Now the finish is in sight:

    gen mean_others = (total - value) / (count - 1)

5. If we wanted to assign the mean of others to observations with missing values, the others are then all the non-missing observations in the category, so their mean is total / count. Sorting on -touse- within category puts a non-missing observation last, and the -if- qualifier leaves the results for non-missing observations alone:

    bysort category (touse) : replace mean_others = total[_N] / count[_N] if !touse

An advantage of this approach is that it generalises easily:

6. Want to average an expression, not just a variable? Plug it in the same place as the variable name.

7. Want to add -if- and/or -in- qualifiers? Build them into the -touse- definition:

    gen byte touse = !missing(value) & foo == 42 & bar < 1000

However, means are easy. Other statistics can be much more awkward. The FAQ has more.

Nick

On Mon, May 23, 2011 at 4:02 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:

> This is an FAQ.
>
> FAQ . . Creating variables recording prop. of the other members of a group
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
> 4/05 How do I create variables summarizing for each
> individual properties of the other members of a
> group?
> http://www.stata.com/support/faqs/data/members.html
>
> Once you know to look for references to (e.g.) "members.html" in the Statalist archive, you can find several related discussions.
>
> In this case, something like
>
> egen total = total(value), by(category)
> egen n = count(value), by(category)
>
> gen totalMINUSi = total - cond(missing(value), 0, value)
> gen meanMINUSi = totalMINUSi / (n - !missing(value))
>
> Incidentally, this cannot be done with a simple -if- precisely because values for other observations are involved in the calculation. But it can be approached directly.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Guo Xu
>
> How do I calculate a mean (or any other summary statistic) excluding
> the *current* observation?
>
> For example, I have following data:
>
> i value category
> 1 5 1
> 2 5 1
> 3 10 1
> 4 2 2
> 5 2 2
>
> I would like to calculate the mean for each category (egen value_mean
> = mean(value), by(category)), but exclude the i-th observation:
> For i=1, for example, the mean value by category 1 would be (5+10)/2.
> For i=3, it would be (5+5)/2. For i=4, it would be 2/1.
>
> I guess there must be some simple *if* condition for this
> manipulation, but I failed to find it - would be most grateful for
> help.

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
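As a check on the direct route, here is a minimal sketch that applies steps 1 to 4 of the reply to the five observations in the original question. It assumes a fresh Stata session with no data in memory worth keeping, since it starts with -clear-. The variable names (touse, total, count, mean_others) are those used in the reply. The -assert- lines are an addition for illustration: they compare the results with the values worked out by hand in the question.

```stata
* Sketch only: data typed in from the question, names as in the reply.
clear
input i value category
1  5 1
2  5 1
3 10 1
4  2 2
5  2 2
end

* Step 1: flag observations with non-missing values
gen byte touse = !missing(value)

* Step 2: group totals, via a running sum and then its last value
bysort touse category : gen total = sum(value) if touse
by touse category : replace total = total[_N]

* Step 3: counts of non-missing values in each group
by touse category : gen count = _N if touse

* Step 4: mean of the other observations in the same category
gen mean_others = (total - value) / (count - 1)

sort i
list i value category mean_others, sepby(category)

* Values from the question: (5+10)/2, (5+5)/2 and 2/1
assert mean_others == 7.5 if inlist(i, 1, 2)
assert mean_others == 5 if i == 3
assert mean_others == 2 if inlist(i, 4, 5)
```

A category with only one non-missing value would give count - 1 equal to 0, and Stata returns missing for division by zero, which seems the right answer when there are no others to average.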

**References**:

- **st: calculating mean without own observation** *From:* Guo Xu <digitalepourpre@gmail.com>
- **st: RE: calculating mean without own observation** *From:* Nick Cox <n.j.cox@durham.ac.uk>
