Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Large standard error, Cox PH

 From Steve Samuels <[email protected]> To [email protected] Subject Re: st: Large standard error, Cox PH Date Sun, 29 Jul 2012 07:25:18 -0400

```
On Jul 29, 2012, at 4:08 AM, Lee Savage wrote:
>
> Thanks for your help, Steve. I don't really know my way around R very well, but
> now might be a good time to learn. Is there any way to fit a Cox model using
> lasso in Stata?
>
Not that I know of. There are implementations for linear regression (-lars- from SSC)
and for a related technique, penalized logistic regression
(http://www.homepages.ucl.ac.uk/~ucakgam/stata.html).

Stepwise is a rightly condemned method for selecting variables, but
bootstrapping has been proposed as a way of rehabilitating stepwise.
Maarten Buis showed how to bootstrap stepwise Cox models in
http://www.stata.com/statalist/archive/2011-05/msg01427.html .

The publication I referred to, and others, can be found on Rob Tibshirani's
lasso page: http://www-stat.stanford.edu/~tibs/lasso.html

Steve

----- Original Message -----
From: Steve Samuels <[email protected]>
To: [email protected]
Cc:
Sent: Sunday, 29 July 2012, 3:46
Subject: Re: st: Large standard error, Cox PH

"but I would say that the ratio of the number of
failures to the number of predictors should be no more than 5:1"

That should be "no less than 5:1"

Steve

A scatter plot of "Minority" against your time variable is likely to show
very little overlap of minority/non-minority countries. If so, the effect of the
"Minority" variable is not accurately described by a proportional hazards model.
The ordinary solution would be to designate "Minority" as a stratum variable
in the Cox model.

But you have a far more serious problem: overfitting (Babyak, 2004). Rules of
thumb are not easy to come by, but I would say that the ratio of the number of
failures to the number of predictors should be no more than 5:1.  At 19:10, you
are far over that limit. Thus you must throw the entire model out and start from
scratch. You simply cannot assess the simultaneous effects of all those
predictors.

For solutions, see Chapters 4 and 5 of Harrell (2001). If you have access to the
R statistical package, you can use the lasso (Tibshirani, 1997) for
coefficient shrinkage; it is available in the packages -glmpath-, -glmnet-, and
-penalized-.

References:

Babyak, M. A. 2004. What you see may not be what you get: a brief, nontechnical
introduction to overfitting in regression-type models. Psychosom Med 66,
no. 3: 411-421. http://www.psychosomaticmedicine.org/cgi/content-nw/full/66/3/411/

Harrell, F. E. 2001. Regression modeling strategies: with applications to
linear models, logistic regression, and survival analysis. New York: Springer.

Tibshirani, R. 1997. The lasso method for variable selection in the Cox model.
Stat Med 16, no. 4: 385-395.

Steve

> On Jul 28, 2012, at 8:48 AM, Lee Savage wrote:
>
> The study is an analysis of government termination, estimated using a Cox
> proportional hazards model. The problem variable is 'Minority', a binary
> variable that indicates whether or not a government holds a parliamentary
> majority. The problem is that the standard error of the coefficient is extremely
> high. I have only seen this before when the coefficient was insignificant, but in
> this case the coefficient is significant (as you can see below).
> Multicollinearity isn't a problem. I'm looking for advice on whether this
> is a problem, or whether I can simply report the model and state that the high SE of
> the 'Minority' variable means that it can't really be generalized.
>
> Here is the printout.
>
> Iteration 0:   log pseudolikelihood = -40.288812
> Iteration 1:   log pseudolikelihood = -28.304301
> Iteration 2:   log pseudolikelihood = -26.968036
> Iteration 3:   log pseudolikelihood = -26.902024
> Iteration 4:   log pseudolikelihood = -26.901788
> Refining estimates:
> Iteration 0:   log pseudolikelihood = -26.901788
>
> Cox regression -- Breslow method for ties
>
> No. of subjects      =           19                Number of obs   =       347
> No. of failures      =           19
> Time at risk         =          347
>                                                    Wald chi2(7)    =   1603.76
> Log pseudolikelihood =   -26.901788                Prob > chi2     =    0.0000
>
>                  Haz.    Robust
>                 Ratio        SE        z   P>|z|     [95% Conf. Int.]
> Minority        77.01     56.61     5.91    0.00      18.23    325.28
> Ideology         0.84      0.20    -0.73    0.47       0.52      1.35
> formdays         0.94      0.03    -2.16    0.03       0.90      0.99
> nogovtpart~s     1.51      1.68     0.37    0.71       0.17     13.34
> ciep12           1.36      1.25     0.34    0.74       0.23      8.21
> ConsNoCon        1.28      1.18     0.27    0.79       0.21      7.86
> tvc
> Unemployment     0.99      0.01    -2.31    0.02       0.98      1.00
> GDP              1.00      0.00     2.24    0.03       1.00      1.00
> Inflation        0.98      0.01    -4.38    0.00       0.97      0.99
>
>
>
>
> __________________________
>
>
> From: Steve Samuels <[email protected]>
> To: [email protected]
> Sent: Friday, 27 July 2012, 21:30
> Subject: Re: st: Large standard error, Cox PH
>
>
> To answer your questions, we'd need more detail.  Describe the study and the
> problem variable in particular.
> As the FAQ requests, "Say exactly what you typed and exactly what Stata typed (or
> did) in response".
>
> Steve
>
>
>
>
> On Jul 27, 2012, at 2:20 PM, Lee Savage wrote:
>
> I have estimated a Cox PH model using a small sample (n=19, 347 months at risk).
> For one of my covariates I have found a large hazard ratio (77.01) with a
> correspondingly large standard error (56.61). I have seen this before, but every
> time the covariate was insignificant; in the current model the covariate is
> significant (p=.001). I have tested the covariates for collinearity and
> everything looks fine. I think the probable cause is the small sample size.
>
>
> So my question is: is this a problem for my overall model? My inclination
> is to report the model as it is and just state that the significant effect for
> the covariate in question should be treated with extreme caution, perhaps even
> ignored.
>
> I'd appreciate any advice on this.
>
> Thanks.
>
> *
> *   For searches and help try:
> *  http://www.stata.com/help.cgi?search
> *  http://www.stata.com/support/statalist/faq
> *  http://www.ats.ucla.edu/stat/stata/
>
```
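Lee's puzzle in this thread (a huge SE for 'Minority', yet a very small p-value) comes from reading the SE on the hazard-ratio scale. Stata reports the SE of an exponentiated coefficient by the delta method, SE(HR) = HR × SE(β), while the z statistic and confidence interval are computed for β = ln(HR). The sketch below is in Python rather than Stata (my choice, for illustration only); it recovers the log-scale numbers from the printed values, assuming they are rounded to two decimals, and also checks the failures-per-predictor arithmetic behind Steve's overfitting warning. The printout lists nine coefficients while Steve counts ten, so both counts are shown; the function name `log_scale` is hypothetical.

```python
import math

# Assumption: the printed SE is the delta-method SE of the hazard ratio,
# SE(HR) = HR * SE(beta), and z and the CI are computed on the log scale.
Z_975 = 1.959964  # standard normal 97.5th percentile


def log_scale(hr, se_hr):
    """Return (beta, se_beta, z, ci_low, ci_high) from a printed HR and SE."""
    beta = math.log(hr)      # log hazard ratio
    se_beta = se_hr / hr     # undo the delta-method scaling
    z = beta / se_beta       # Wald z statistic
    ci_low = math.exp(beta - Z_975 * se_beta)
    ci_high = math.exp(beta + Z_975 * se_beta)
    return beta, se_beta, z, ci_low, ci_high


# Values for 'Minority' copied from the printout (rounded to 2 decimals).
beta, se_beta, z, ci_low, ci_high = log_scale(77.01, 56.61)
print(f"beta = {beta:.3f}, SE(beta) = {se_beta:.3f}, z = {z:.2f}")
print(f"95% CI for HR: {ci_low:.2f} to {ci_high:.2f}")

# Failures-per-predictor check against the 5:1 rule of thumb.
failures = 19
for n_predictors in (9, 10):  # 9 rows in the printout; Steve counts 10
    ratio = failures / n_predictors
    print(f"{failures}:{n_predictors} -> {ratio:.2f} failures per predictor; "
          f"meets 5:1? {ratio >= 5}")

max_predictors = failures // 5  # most predictors the 5:1 rule would allow
print(f"At 5:1, {failures} failures support at most {max_predictors} predictors")
```

Run as is, it should give z of about 5.91 and an interval of roughly 18.2 to 325, matching the printout up to rounding: the SE of about 0.74 on the log scale is modest, which is why the test is significant even though the hazard-ratio interval is extremely wide. The same arithmetic shows 19 failures supporting only about two failures per predictor, well short of the 5:1 minimum in Steve's corrected rule.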