
From: n j cox <n.j.cox@durham.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: How to calculate 75 percentile of other individuals on the same
Date: Tue, 02 Oct 2007 19:47:46 +0100

Note that the general issue is also discussed at

How do I create variables summarizing for each individual properties of the other members of a group?

http://www.stata.com/support/faqs/data/members.html

Apart from sums and means -- when we can use short-cuts based

on some rearrangement of, or implication of,

sum for everyone = sum for others + value for this individual

-- this kind of problem usually requires a loop. In the FAQ

just cited, it is shown that you can do it by looping

over within-group identifiers, rather than the whole

dataset.
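
For sums and means, the identity above needs no loop at all. A minimal sketch, using the sample data from the original question and assuming X has no missing values (the variable names total, n, sum_others and mean_others are made up for illustration):

```stata
clear
input ID str1 Group X
1 a 5
2 a 7
3 a 9
4 a 8
5 b 3
6 b 4
7 b 9
end

* Group total and group size (assumes no missing values in X)
egen double total = total(X), by(Group)
egen n = count(X), by(Group)

* sum for others = sum for everyone - value for this individual
generate double sum_others = total - X

* Mean of the others; missing for a group with only one member
generate double mean_others = sum_others / (n - 1)

list, sepby(Group) noobs
```

No such rearrangement exists for percentiles, which is why that case needs a loop.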

However, the trade-offs are not very clear to me.

-_pctile- is built in, while any call to -egen- involves

an interpretative overhead. On the other hand, -_pctile-

can only emit one 75th percentile at a time, and -egen-

with -by()- can calculate several at a time by side-stepping

-_pctile-. The precise trade-offs would probably depend on the size of the dataset and the number of groups.

No doubt you could also speed it up using Mata or writing

more direct code.
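
The idea of looping over within-group identifiers might be sketched as follows. This is only an illustration in the spirit of the FAQ, not its exact code; it uses the sample data from the original question, and the names within and p75_others are made up here. The loop runs once per within-group position (at most the size of the largest group) rather than once per observation.

```stata
clear
input ID str1 Group X
1 a 5
2 a 7
3 a 9
4 a 8
5 b 3
6 b 4
7 b 9
end

* Number the members of each group 1, 2, ...
bysort Group (ID): generate long within = _n
summarize within, meanonly
local max = r(max)

generate double p75_others = .
tempvar work

quietly forvalues i = 1/`max' {
    * Blank out member i of every group, then take each group's percentile
    bysort Group: egen double `work' = pctile(cond(within != `i', X, .)), p(75)
    * Keep the result only for the member that was left out
    replace p75_others = `work' if within == `i'
    drop `work'
}

list, sepby(Group) noobs
```

Each pass calls -egen- once for all groups together, so the number of passes depends on the largest group size and not on the number of observations.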

Nick

n.j.cox@durham.ac.uk

Quang Nguyen asked

A simplified version of my data looks as follows:

ID   Group   X
1    a       5
2    a       7
3    a       9
4    a       8
5    b       3
6    b       4
7    b       9
..........................

I would like to generate a new variable whose value is the 75 percentile of

other individuals in the same group as the concerned individual. For

example, for the first individual (ID=1), this will be: 75 percentile

of {7, 9, 8}.
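
[Editorial illustration, not part of the original question: with only three values, the answer depends on the percentile definition in use. A small check under Stata's default definition, as used by -_pctile-:]

```stata
clear
input x
7
9
8
end

* 75th percentile of {7, 9, 8} under Stata's default definition
_pctile x, percentiles(75)
display r(r1)
```

With n = 3 the default rule takes the third ordered value, so this displays 9; the -altdef- option of -_pctile- uses an interpolating definition and may give a different number.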

and Joseph Coveney replied

-findit percentile- turns up a lot to pore over. But among the results

is -egen <varname> = pctile(exp), p(#)-, which can take a -by- varlist.

Try something like:

bysort Group: egen p75 = pctile(X), p(75)

To finish: an observation is going to lie beneath, above or on a given

percentile for its group, so there's a smarter (more efficient) algorithm, but a brute-force approach is shown below.

clear *
set more off
set seed `=date("2007-09-29", "YMD")'
set obs 100
generate byte pid = _n
generate byte group = mod(_n, 10)
generate double response = uniform()
*
* Begin here
*
tempvar tmpvar0 tmpvar1
sort group
generate double p75 = .
generate double `tmpvar0' = .
quietly forvalues i = 1/`=_N' {
    replace `tmpvar0' = response if _n != `i'
    by group: egen double `tmpvar1' = pctile(`tmpvar0'), p(75)
    replace p75 = `tmpvar1' in `i'
    drop `tmpvar1'
    replace `tmpvar0' = .
}
drop `tmpvar0'
list in 1/20, noobs sepby(group)
exit

Although my suggestion was centered around -egen-, which is very often a

convenience, you can usually do things more efficiently. For example, in this case, -_pctile if . . ., percentiles(75)- and then -replace p75 = r() in . . . - would avoid the redundancy of -by . . .: egen . . . pctile()-, where all of the other groups' results are calculated and discarded each time. There are other ways to polish the suggestion, too, and the difference would be noticeable with large datasets and many groups.
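
A minimal sketch of that -_pctile- variant, shown on the sample data from the original question instead of the simulated data. The -if- condition is one possible way of filling in the elided qualifier, and r(r1) is where -_pctile- leaves the first requested percentile. Groups with a single member are an edge case not handled here.

```stata
clear
input ID str1 Group X
1 a 5
2 a 7
3 a 9
4 a 8
5 b 3
6 b 4
7 b 9
end

generate double p75 = .

quietly forvalues i = 1/`=_N' {
    * Percentile over the other members of observation i's own group only
    _pctile X if Group == Group[`i'] & _n != `i', percentiles(75)
    replace p75 = r(r1) in `i'
}

list, sepby(Group) noobs
```

This still loops over every observation, but each pass touches only one group and creates no temporary variables.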

