
From: "Quang Nguyen" <quangn@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: How to calculate 75 percentile of other individuals on the same
Date: Tue, 2 Oct 2007 14:29:50 -1000

Dear Nick,

Thanks so much! I highly appreciate your kind support.

Have A Wonderful Day!

Many thanks!

Quang

On 10/2/07, n j cox <n.j.cox@durham.ac.uk> wrote:
> Note that the general issue is also discussed at
>
> How do I create variables summarizing for each individual properties of
> the other members of a group?
> http://www.stata.com/support/faqs/data/members.html
>
> Apart from sums and means -- when we can use short-cuts based
> on some rearrangement of, or implication of,
>
> sum for everyone = sum for others + value for this individual
>
> -- this kind of problem usually requires a loop. In the FAQ
> just cited, it is shown that you can do it by looping
> over within-group identifiers, rather than the whole
> dataset.
>
> However, the trade-offs are not very clear to me.
>
> -_pctile- is built in, while any call to -egen- involves
> an interpretative overhead. On the other hand, -_pctile-
> can only emit one 75th percentile at a time, and -egen-
> with -by()- can calculate several at a time by side-stepping
> -_pctile-. The precise trade-offs would probably depend on the size of
> the dataset and the number of groups.
>
> No doubt you could also speed it up using Mata or writing
> more direct code.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Quang Nguyen asked
>
> A simplified version of my data looks as follows:
>
> ID   Group   X
> 1    a       5
> 2    a       7
> 3    a       9
> 4    a       8
> 5    b       3
> 6    b       4
> 7    b       9
> ..........................
>
> I would like to generate a new variable whose value is the 75th percentile
> of the other individuals in the same group as the individual concerned.
> For example, for the first individual (ID=1), this will be the 75th
> percentile of {7, 9, 8}.
>
> and Joseph Coveney replied
>
> -findit percentile- turns up a lot to pore over. But among the results
> is -egen <varname> = pctile(exp), p(#)-, which can take a -by- varlist.
>
> Try something like:
>
> bysort Group: egen p75 = pctile(X), p(75)
>
> To finish: an observation is going to lie beneath, above or on a given
> percentile for its group, so there's a smarter (more efficient)
> algorithm, but a brute-force approach is shown below.
>
> clear *
> set more off
> set seed `=date("2007-09-29", "YMD")'
> set obs 100
> generate byte pid = _n
> generate byte group = mod(_n, 10)
> generate double response = uniform()
> *
> * Begin here
> *
> tempvar tmpvar0 tmpvar1
> sort group
> generate double p75 = .
> generate double `tmpvar0' = .
> quietly forvalues i = 1/`=_N' {
>     replace `tmpvar0' = response if _n != `i'
>     by group: egen double `tmpvar1' = pctile(`tmpvar0'), p(75)
>     replace p75 = `tmpvar1' in `i'
>     drop `tmpvar1'
>     replace `tmpvar0' = .
> }
> drop `tmpvar0'
> list in 1/20, noobs sepby(group)
> exit
>
> Although my suggestion was centered around -egen-, which is very often a
> convenience, you can usually do things more efficiently. For example,
> in this case, -_pctile if . . ., percentiles(75)- and then -replace p75
> = r() in . . . - would avoid the redundancy of -by . . .: egen . . .
> pctile()-, where all of the other groups' results are calculated and
> discarded each time. There are other ways to polish the suggestion, too,
> and the difference would be noticeable with large datasets and many groups.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

--
"My father gave me the greatest gift anyone could give another person, he
believed in me." - Jim Valvano
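
For readers of the archive, the -_pctile- alternative that Joseph mentions at the end of his reply could be sketched roughly as follows. This is only an illustration under assumptions, not code posted in the thread: the variable names (pid, group, response, p75) follow Joseph's example, the seed is arbitrary, and runiform() stands in for the older uniform(). Groups with a single member have no "others"; that case is not handled here.

```stata
* Sketch (not from the thread): leave-one-out 75th percentile by group,
* calling -_pctile- once per observation instead of -by: egen-.
clear
set obs 100
set seed 12345                      // arbitrary seed, an assumption
generate byte pid = _n
generate byte group = mod(_n, 10)
generate double response = runiform()

sort group pid
generate double p75 = .
quietly forvalues i = 1/`=_N' {
    * 75th percentile of response among the OTHER members of
    * observation i's group (observation i itself is excluded)
    _pctile response if group == group[`i'] & _n != `i', percentiles(75)
    replace p75 = r(r1) in `i'
}
list in 1/20, noobs sepby(group)
```

Each pass computes one percentile for one group only, so nothing is calculated for the other groups and then discarded. It is still one call per observation, so the trade-offs Nick describes continue to apply.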

