Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: a question on averaging in Stata
Date: Wed, 8 Feb 2012 14:55:05 +0000

The device used in the FAQ to calculate maximum values clearly isn't good for medians. You are trying to calculate the median over

    original value if I want this
    zero if I don't want this

and those zeros may affect the result. With maxima over ages the zeros won't (usually) do that.

This is quick and dirty but illustrates a more general technique:

    generate median = .
    by year, sort: gen pid = _n
    summarize pid, meanonly
    quietly forvalues i = 1/`r(max)' {
        egen work = median(idio / (pid != `i')), by(year)
        replace median = work if pid == `i'
        drop work
    }

What is crucial here is that -median()- takes an expression, which can be more complicated than a variable name, and that

    idio / (pid != `i')

is -idio- when -pid- is not the current identifier and missing otherwise. So, -egen- will ignore the missings.

For more discussion see

Nicholas J. Cox. 2011. Speaking Stata: Compared with .... Stata Journal 11(2): 305-314.

Abstract. Many problems in data management center on relating values to values in other observations, either within a dataset as a whole or within groups such as panels. This column reviews some basic Stata techniques helpful for such tasks, including the use of subscripts, summarize, by:, sum(), cond(), and egen. Several techniques exploit the fact that logical expressions yield 1 when true and 0 when false. Dividing by zero to yield missings is revealed as a surprisingly valuable device.

Advice on "Thanks in advance" is included in the FAQ.

Nick

On Wed, Feb 8, 2012 at 2:30 PM, rado645-bg@yahoo.de <rado645-bg@yahoo.de> wrote:

> Thanks a lot for your feedback. The information was very useful. I have one additional question that relates to estimating a group median excluding observation i. I have looked at the article that you referred to, but I got stuck writing the code for the case of medians.
>
> Again I have panel data with items i observed over several years t for variable x. I need to estimate the median of this variable for each year. However, I have to estimate a specific median: for each item i I have to estimate the median value of x excluding the observation for item i itself, i.e. the median over the other objects (if I could label them -i).
>
> I found this technically more challenging than the estimation of means. I have started with the following code, using as an example one of the codes that you shared with us in your article. But I cannot find a way to isolate item i from the median calculation.
>
> Could you please help me with that? I would like to thank you in advance.
>
> generate maxvar = .
> by year, sort: gen pid = _n
> summarize pid
> . quietly forvalues i = 1/`r(max)' {
> . generate include = 1 if pid != `i'
> . egen work = median(idio * include), by(year)
> . replace maxvar = work if pid == `i'
> . drop include work
> . }
>
> From: Nick Cox <n.j.cox@durham.ac.uk>
>
> This is a FAQ.
>
> FAQ . . Creating variables recording prop. of the other members of a group
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
> 4/05 How do I create variables summarizing for each
> individual properties of the other members of a
> group?
> http://www.stata.com/support/faqs/data/members.html
>
> but the question also yields easily to Stata logic. The starting point is the idea that the total for everybody else is just the total minus my value.
>
> The average of every other item is
>
> (sum of others) / (count of others)
>
> which is in the simplest case
>
> (sum of all - this value) / (count of all - 1)
>
> -- although careful code would need to take account of the possibility that each value is missing.
>
> That is then
>
> egen sum = total(x), by(group)
> egen count = count(x), by(group)
>
> and then the average is
>
> gen mean = (sum - cond(missing(x), 0, x)) / (count - !missing(x))
>
> If any value is missing, then we need to subtract 0 (not missing!) from the total to get the total of others.
>
> If any value is missing, then we need to subtract 0 (not 1!) from the count to get the count of others.
>
> rado645-bg@yahoo.de
>
> I have panel data with items i observed over several years t for variable x.
>
> I have to estimate a specific average: for each item i I have to take the mean value of x excluding the observations for item i itself, i.e. the average over the other objects (if I could label them -i).
>
> Is this possible in Stata?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
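
The two devices discussed in this thread (dividing by a false condition to get missings for the leave-one-out median, and subtracting one's own value from the group total for the leave-one-out mean) can be checked on a small example. What follows is only a sketch: the toy data and the variable names loo_median, loo_mean, total_all and count_all are made up for illustration and do not come from the original posts; only -idio-, -year- and -pid- follow the thread.

```stata
clear
* Toy data, invented for illustration only
input year idio
2010 1
2010 2
2010 10
2010 20
2011 3
2011 5
2011 100
end

* Leave-one-out median: idio / (pid != `i') is idio for the other
* observations and missing (division by zero) for observation i,
* and -egen- ignores the missings
generate loo_median = .
by year, sort: generate pid = _n
summarize pid, meanonly
quietly forvalues i = 1/`r(max)' {
    egen work = median(idio / (pid != `i')), by(year)
    replace loo_median = work if pid == `i'
    drop work
}

* Leave-one-out mean: (total of all - own value) / (count of all - 1),
* subtracting 0 from both when the own value is missing
egen double total_all = total(idio), by(year)
egen count_all = count(idio), by(year)
generate double loo_mean = (total_all - cond(missing(idio), 0, idio)) / (count_all - !missing(idio))

list year idio loo_median loo_mean, sepby(year)
```

For the observation with idio equal to 10 in 2010, the other values are 1, 2 and 20, so the median of the others is 2 and their mean is 23/3. Note that the loop runs once per within-year identifier, so on large panels it makes as many passes through -egen- as the largest group has observations.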

**References**: **Re: st: RE: a question on averaging in Stata** *From:* "rado645-bg@yahoo.de" <rado645-bg@yahoo.de>
