# st: St: How do fit a cox PH model for categorical outcome variable with 3 levels in the same model

From: "Sharon Johnatty"  
Subject: st: St: How do fit a cox PH model for categorical outcome variable with 3 levels in the same model  
Date: Wed, 3 Oct 2007 11:22:37 +1000

```Hello, I've just joined your group and I'm working with survival time
data in Stata v.9.

I've stset my data and I'm able to run cox models, no problem.  The
question is how do I obtain HRs for an outcome variable with three
levels, e.g. var_1_2  coded 0, 1, 2 in the same model, and obtain
separate estimates for 0 vs. 1 and 0 vs. 2.  Right now I'm able to do
this using  stcoxkm, by(var_1_2), then type stcox.  But Stata does not
allow me to run an adjusted model in this way. Is there another way to
fit a single model for a categorical variable with more than 2 levels
for both univariate and multivariate cox regression?

Thanks
Sharon

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Quang Nguyen
Sent: Wednesday, 3 October 2007 10:30 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: How to calculate 75 percentile of other individuals on
the same

Dear Nick,

Thanks so much! I highly appreciate your kind support.

Have A Wonderful Day!
Many thanks!

Quang

On 10/2/07, n j cox <n.j.cox@durham.ac.uk> wrote:
> Note that the general issue is also discussed at
>
> How do I create variables summarizing for each individual properties
> of the other members of a group?
> http://www.stata.com/support/faqs/data/members.html
>
> Apart from sums and means -- when we can use short-cuts based on some
> rearrangement of, or implication of,
>
> sum for everyone = sum for others + value for this individual
>
> -- this kind of problem usually requires a loop. In the FAQ just
> cited, it is shown that you can do it by looping over within-group
> identifiers, rather than the whole dataset.
>
> However, the trade-offs are not very clear to me.
>
> -_pctile- is built in, while any call to -egen- involves
> an interpretative overhead. On the other hand, -_pctile-
> can only emit one 75th percentile at a time, and -egen-
> with -by()- can calculate several at a time by side-stepping
> -_pctile-. The precise trade-offs would probably depend on the size of
> the dataset and the number of groups.
>
> No doubt you could also speed it up using Mata or writing more direct
> code.
>
> Nick
> n.j.cox@durham.ac.uk
>
>
> A simplified version of my data looks as follows:
>
> ID      Group     X
> 1       a             5
> 2       a             7
> 3       a             9
> 4       a             8
> 5       b             3
> 6       b             4
> 7       b             9
>  ..........................
>
> I would like to generate a new variable whose value is the 75th
> percentile of the other individuals in the same group as the concerned
> individual. For example, for the first individual (ID=1), this will
> be the 75th percentile of {7, 9, 8}.
>
> and Joseph Coveney replied
>
> -findit percentile- turns up a lot to pore over.  But among the
> results is -egen <varname> = pctile(exp), p(#)-, which can take a -by-
> varlist.
>
> Try something like:
> bysort Group: egen p75 = pctile(X), p(75)
>
> To finish:  an observation is going to lie beneath, above or on a
> given percentile for its group, so there's a smarter (more efficient)
> algorithm, but a brute-force approach is shown below.
>
> clear *
> set more off
> set seed `=date("2007-09-29", "YMD")'
> set obs 100
> generate byte pid = _n
> generate byte group = mod(_n, 10)
> generate double response = uniform()
> *
> * Begin here
> *
> tempvar tmpvar0 tmpvar1
> sort group
> generate double p75 = .
> generate double `tmpvar0' = .
> quietly forvalues i = 1/`=_N' {
>     replace `tmpvar0' = response if _n != `i'
>     by group: egen double `tmpvar1' = pctile(`tmpvar0'), p(75)
>     replace p75 = `tmpvar1' in `i'
>     drop `tmpvar1'
>     replace `tmpvar0' = .
> }
> drop `tmpvar0'
> list in 1/20, noobs sepby(group)
> exit
>
> Although my suggestion was centered around -egen-, which is very often
> a convenience, you can usually do things more efficiently.  For
> example, in this case, -_pctile if . . ., percentiles(75)- and then
> -replace p75 = r(r1) in . . .- would avoid the redundancy of
> -by . . .: egen . . . pctile()-, where all of the other groups' results
> are calculated and discarded each time. There are other ways to polish
> the suggestion, too, and the difference would be noticeable with large
> datasets and many groups.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

--
"My father gave me the greatest gift anyone could give another person,
he believed in me." - Jim Valvano
```
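
One way to approach the question above, offered as a sketch rather than a definitive answer: in Stata 9 the -xi:- prefix expands a categorical variable into indicator variables, so a single -stcox- call reports a separate hazard ratio for each non-reference level, with or without adjustment covariates. The example assumes the cancer dataset shipped with Stata as a stand-in for the poster's data, with drug (coded 1, 2, 3) playing the role of var_1_2 and age as an illustrative adjustment covariate; none of these names come from the original post.

```stata
* Sketch only: the cancer dataset shipped with Stata stands in for the
* poster's data; drug (coded 1, 2, 3) plays the role of var_1_2.
sysuse cancer, clear
stset studytime, failure(died)

* Unadjusted model: xi expands i.drug into the indicators _Idrug_2 and
* _Idrug_3, omitting the lowest level (drug == 1) as the reference
xi: stcox i.drug

* Adjusted model: list the adjustment covariates after the indicator term
xi: stcox i.drug age

* Joint Wald test (2 df) of the overall drug effect in the adjusted model
testparm _Idrug*
```

With var_1_2 coded 0, 1, 2, the same pattern would be -xi: stcox i.var_1_2- followed by any adjustment covariates, which gives hazard ratios for 1 vs. 0 and 2 vs. 0 in one model. From Stata 11 onward, factor-variable notation makes the prefix unnecessary, e.g. -stcox i.drug age-.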