Statalist The Stata Listserver



st: STCOX: Explanators That Vary Monotonically With Analysis Time


From   Adam_Thomas@ksgphd.harvard.edu
To   statalist@hsphsun2.harvard.edu
Subject   st: STCOX: Explanators That Vary Monotonically With Analysis Time
Date   Wed, 21 Feb 2007 13:47:19 -0500

Hello,

I've got a question regarding the use of Cox regression with explanators
that vary monotonically with analysis time.  I am using the National
Longitudinal Survey of Youth to look at the effect of having a prison
record on one's hazard of first marriage.  If possible, I would like to
conduct some analyses for a sample that is limited to respondents who are
incarcerated at some point during the panel.  But I'm getting some very
strange results when I do so.  Here are the results for the entire sample
(I cleaned them up a bit to make them a little more readable).  The
independent variable of interest is called "everjail," and is set equal to
zero for person-years in which the respondent has not yet gone to jail and
to one for person-years in which the respondent has already been
incarcerated.  Analysis time is measured here in terms of months of age.
The hazard ratio on "everjail" is less than one and statistically
significant.  This result persists when I use a number of different
combinations of control variables, when I limit my sample to self-reported
juvenile delinquents, and in a number of other settings:

#delimit;
capture drop sch*;
capture drop sca*;
xi: stcox   newern cumern jail everjail alcoholicparent badparent
i.stateres
delinquent1 south urate rural everkids i.year AFQT i.edcatrev relighome
  if race == 2 & varuse != . &  sampid < 15 & sampid != 9,
     robust schoenfeld(sch*) scaledsch(sca*);
     stphtest, detail;


Cox regression -- Breslow method for ties

No. of subjects      =    196630900                Number of obs   =     12493
No. of failures      =     85969633
Time at risk         =   1937511024
                                                   Wald chi2(69)   =         .
Log pseudolikelihood =   -3515.0499                Prob > chi2     =         .

                              (Std. Err. adjusted for 1281 clusters in caseid)
------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      newern |    1.00058   .0000778     7.45   0.000     1.000427    1.000732
      cumern |   1.020076   .0021842     9.28   0.000     1.015804    1.024366
        jail |     .40921   .1474272    -2.48   0.013     .2019676    .8291077
    everjail |   .5976148   .1435622    -2.14   0.032     .3731996    .9569772
alcoholicp~t |   1.365126   .1719276     2.47   0.013     1.066522    1.747331
   badparent |   1.181243    .133375     1.48   0.140     .9467374    1.473836
 delinquent1 |   .9686306   .0948547    -0.33   0.745     .7994714    1.173582
       south |   7.391707   6.974655     2.12   0.034     1.162972    46.98076
       urate |   1.022676    .019999     1.15   0.252     .9842206    1.062634
       rural |   1.105722   .1783717     0.62   0.533     .8059955    1.516908
    everkids |   1.316987   .1340499     2.71   0.007     1.078802     1.60776
        AFQT |   .9999217   .0002457    -0.32   0.750     .9994402    1.000403
_Iedcatrev_2 |   1.214536   .1669344     1.41   0.157     .9277167    1.590031
_Iedcatrev_3 |   1.156426   .1918056     0.88   0.381     .8354816    1.600658
   relighome |   .7565323   .1801382    -1.17   0.241     .4744032    1.206445
------------------------------------------------------------------------------


.                                         stphtest, detail;
      Test of proportional hazards assumption
      Time:  Time
      ----------------------------------------------------------------
                  |       rho            chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      newern      |      0.03542         0.71        1         0.3997
      cumern      |      0.00619         0.03        1         0.8579
      jail        |      0.00360         0.01        1         0.9184
      everjail    |      0.06603         3.64        1         0.0565
      alcoholicp~t|     -0.00158         0.00        1         0.9665
      badparent   |      0.04819         1.56        1         0.2116
      delinquent1 |     -0.01933         0.31        1         0.5757
      south       |     -0.00161         0.00        1         0.9671
      urate       |      0.01949         0.30        1         0.5864
      rural       |      0.08032         5.01        1         0.0251
      everkids    |     -0.05329         2.42        1         0.1195
      AFQT        |     -0.02263         0.37        1         0.5405
      _Iedcatrev_2|     -0.00972         0.08        1         0.7794
      _Iedcatrev_3|      0.01125         0.10        1         0.7510
      relighome   |     -0.00616         0.03        1         0.8547
      ------------+---------------------------------------------------
      global test |                     66.57       79         0.8394
      ----------------------------------------------------------------
note: robust variance-covariance matrix used.



Next, I limit my sample only to persons who go to prison at some point
during the panel, so the comparison group for those with a prison record at
any given point in time consists of a group that has not yet gone to prison
but will at some point in the future.  Bear in mind that, in this sample,
all persons' everjail values will switch from zero to one at some point
during the panel and will then remain one for the duration of the panel.
This specification attenuates the estimated effect of everjail and the
parameter is no longer significant.  However, the Schoenfeld residual
analysis also suggests that the effect of everjail is not proportionally
constant over time:


#delimit;
capture drop sch*;
capture drop sca*;
xi: stcox   newern cumern jail everjail alcoholicparent badparent
i.stateres
delinquent1 south urate rural everkids i.year AFQT i.edcatrev relighome
  if race == 2 & varuse != . &  sampid < 15 & sampid != 9 & truejail == 1,
     robust schoenfeld(sch*) scaledsch(sca*);
     stphtest, detail;

Cox regression -- Breslow method for ties

No. of subjects      =     37925698                Number of obs   =      2839
No. of failures      =     10391186
Time at risk         =  423487389.6
                                                   Wald chi2(65)   =         .
Log pseudolikelihood =   -301.42336                Prob > chi2     =         .

                               (Std. Err. adjusted for 259 clusters in caseid)
------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      newern |    .994557   .0145889    -0.37   0.710     .9663705    1.023566
      cumern |   1.024386   .0099382     2.48   0.013     1.005092    1.044051
        jail |   .4558718     .22238    -1.61   0.107      .175233    1.185959
    everjail |   .7078746   .2519604    -0.97   0.332     .3523548    1.422108
alcoholicp~t |   1.212183   .3920952     0.59   0.552     .6430384    2.285071
   badparent |   1.278569   .5155408     0.61   0.542     .5801032    2.818014
 delinquent1 |    .731428   .2461317    -0.93   0.353     .3782118    1.414517
       south |   2.175701   2.197618     0.77   0.442     .3004853    15.75343
       urate |   1.066808   .0537228     1.28   0.199     .9665423    1.177474
       rural |   1.429138   .7232303     0.71   0.480     .5300472    3.853307
    everkids |   1.963302   .6986453     1.90   0.058     .9774288    3.943565
        AFQT |   .9988161    .001441    -0.82   0.412     .9959957    1.001644
_Iedcatrev_2 |    1.08698   .3549441     0.26   0.798     .5731508    2.061456
_Iedcatrev_3 |   1.350611   .6145295     0.66   0.509     .5536468    3.294792
   relighome |   .6680439   .4299661    -0.63   0.531     .1892148    2.358603
------------------------------------------------------------------------------


.                                         stphtest, detail;
      Test of proportional hazards assumption
      Time:  Time
      ----------------------------------------------------------------
                  |       rho            chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      newern      |     -0.07667         0.88        1         0.3493
      cumern      |      0.14270         3.68        1         0.0552
      jail        |      0.06943         1.33        1         0.2495
      everjail    |      0.15710         5.30        1         0.0214
      alcoholicp~t|     -0.07257         0.93        1         0.3338
      badparent   |     -0.12917         6.45        1         0.0111
      delinquent1 |      0.15798         7.93        1         0.0049
      south       |     -0.01537         0.03        1         0.8584
      urate       |     -0.04046         0.27        1         0.6043
      rural       |      0.17582         7.12        1         0.0076
      everkids    |     -0.05883         1.30        1         0.2539
      AFQT        |     -0.15273         6.03        1         0.0141
      _Iedcatrev_2|      0.00988         0.03        1         0.8561
      _Iedcatrev_3|      0.15509         4.09        1         0.0433
      relighome   |     -0.02351         0.17        1         0.6773
      ------------+---------------------------------------------------
      global test |                     55.46       69         0.8810
      ----------------------------------------------------------------
note: robust variance-covariance matrix used.




After poking around a bit, I discovered that I got very different
coefficients depending on the age of the respondents I was looking at
(recall that my analysis time is measured in terms of respondents' ages).
As an example, I split the panel roughly in half below.  There is a
negative and statistically significant estimated effect of past
incarceration for younger observations:


#delimit;
capture drop sch*;
capture drop sca*;
xi: stcox   newern cumern jail everjail alcoholicparent badparent
i.stateres
delinquent1 south urate rural everkids i.year AFQT i.edcatrev relighome
  if race == 2 & varuse != . &  sampid < 15 & sampid != 9 & agemon < 27,
     robust schoenfeld(sch*) scaledsch(sca*);
     stphtest, detail;

Cox regression -- Breslow method for ties

No. of subjects      =     37925698                Number of obs   =      1481
No. of failures      =      6287322
Time at risk         =  224290651.6
                                                   Wald chi2(40)   =         .
Log pseudolikelihood =   -178.62156                Prob > chi2     =         .

                               (Std. Err. adjusted for 259 clusters in caseid)
------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      newern |    .993368   .0152892    -0.43   0.666     .9638493    1.023791
      cumern |   1.017118   .0108817     1.59   0.113     .9960119    1.038671
        jail |   .3177393   .1980827    -1.84   0.066     .0936313    1.078253
    everjail |   .2985212   .1799312    -2.01   0.045     .0916053     .972814
alcoholicp~t |   .8084521    .345783    -0.50   0.619     .3496125    1.869484
   badparent |   2.041492   1.142844     1.27   0.202     .6814562    6.115855
 delinquent1 |   .4215357   .1594842    -2.28   0.022     .2008122    .8848683
       south |    1.20106   1.361239     0.16   0.872     .1302696    11.07354
       urate |   1.015647    .085447     0.18   0.854     .8612531    1.197719
       rural |    .703451   .4031194    -0.61   0.539      .228794    2.162833
    everkids |   1.871234   .8486668     1.38   0.167     .7692722    4.551728
        AFQT |   1.000239   .0017253     0.14   0.890     .9968636    1.003627
_Iedcatrev_2 |   1.041943   .4474091     0.10   0.924     .4490959    2.417402
_Iedcatrev_3 |   1.291526   .7877315     0.42   0.675     .3907832    4.268454
   relighome |   2.116199   2.446491     0.65   0.517     .2195336    20.39914
------------------------------------------------------------------------------


.                                         stphtest, detail;
      Test of proportional hazards assumption
      Time:  Time
      ----------------------------------------------------------------
                  |       rho            chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      newern      |     -0.10701         1.22        1         0.2688
      cumern      |      0.03699         0.23        1         0.6318
      jail        |     -0.15441         4.74        1         0.0294
      everjail    |     -0.07598         0.76        1         0.3833
      alcoholicp~t|     -0.04136         0.20        1         0.6568
      badparent   |      0.01651         0.08        1         0.7782
      delinquent1 |      0.09358         1.40        1         0.2373
      south       |     -0.14422         2.63        1         0.1046
      urate       |     -0.26956        16.12        1         0.0001
      rural       |     -0.04021         0.22        1         0.6358
      everkids    |     -0.15353         7.05        1         0.0079
      AFQT        |     -0.16024         6.05        1         0.0139
      _Iedcatrev_2|      0.14036         5.18        1         0.0228
      _Iedcatrev_3|      0.14445         3.39        1         0.0656
      relighome   |      0.16377         6.82        1         0.0090
      ------------+---------------------------------------------------
      global test |                     49.93       60         0.8197
      ----------------------------------------------------------------
note: robust variance-covariance matrix used.



But - and this is where I think something is wrong - there is a very large
*positive* and statistically significant coefficient for older
observations:


#delimit;
capture drop sch*;
capture drop sca*;
xi: stcox   newern cumern jail everjail alcoholicparent badparent
i.stateres
delinquent1 south urate rural everkids i.year AFQT i.edcatrev relighome
  if race == 2 & varuse != . &  sampid < 15 & sampid != 9 & agemon >= 27,
     robust schoenfeld(sch*) scaledsch(sca*);
     stphtest, detail;

Cox regression -- Breslow method for ties

No. of subjects      =     29698165                Number of obs   =      1358
No. of failures      =      4103864
Time at risk         =    199196738
                                                   Wald chi2(39)   =         .
Log pseudolikelihood =   -86.623654                Prob > chi2     =         .

                               (Std. Err. adjusted for 203 clusters in caseid)
------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      newern |    .967959   .0494768    -0.64   0.524     .8756854    1.069956
      cumern |   1.057655   .0343948     1.72   0.085      .992346    1.127262
        jail |    5.22185   3.696766     2.33   0.020     1.303836    20.91345
    everjail |   8.839575   6.892794     2.79   0.005     1.917317    40.75387
alcoholicp~t |   3.424662   2.921447     1.44   0.149     .6434142    18.22824
   badparent |   .6894594   .4592443    -0.56   0.577     .1868655    2.543831
 delinquent1 |   2.242574   1.290387     1.40   0.160     .7260416    6.926791
       south |   6.50e+10   9.17e+10    17.66   0.000     4.10e+09    1.03e+12
       urate |     1.1123   .1875627     0.63   0.528     .7992581    1.547949
       rural |   3.660132   3.326824     1.43   0.153     .6163246    21.73622
    everkids |   2.251156   1.416954     1.29   0.197     .6555872    7.730023
        AFQT |   .9979546   .0043069    -0.47   0.635     .9895488    1.006432
_Iedcatrev_2 |   .7028138   .5085406    -0.49   0.626     .1701883    2.902358
_Iedcatrev_3 |   .7220283   .8786676    -0.27   0.789     .0664799    7.841847
   relighome |   .2068565   .1644372    -1.98   0.047     .0435532    .9824666
------------------------------------------------------------------------------


.                                         stphtest, detail;
      Test of proportional hazards assumption
      Time:  Time
      ----------------------------------------------------------------
                  |       rho            chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      newern      |     -0.02494         0.10        1         0.7494
      cumern      |      0.00947         0.03        1         0.8672
      jail        |      0.00265         0.00        1         0.9678
      everjail    |     -0.01087         0.02        1         0.8807
      alcoholicp~t|     -0.07177         1.43        1         0.2322
      badparent   |     -0.08120         1.64        1         0.2007
      delinquent1 |     -0.07520         1.21        1         0.2712
      south       |     -0.15235         3.86        1         0.0494
      urate       |      0.12274         5.78        1         0.0162
      rural       |      0.08422         1.69        1         0.1933
      everkids    |     -0.20486        11.32        1         0.0008
      AFQT        |      0.11702         5.16        1         0.0231
      _Iedcatrev_2|      0.04897         0.85        1         0.3570
      _Iedcatrev_3|      0.01366         0.07        1         0.7863
      relighome   |      0.02262         0.12        1         0.7268
      ------------+---------------------------------------------------
      global test |                     47.49       57         0.8112
      ----------------------------------------------------------------
note: robust variance-covariance matrix used.


This suggests to me that something isn't working as I'd expect for this
particular specification.  By the way, I encounter roughly similar problems
if, rather than splitting up the sample, I use time-varying covariates, or
if I include a "goes to prison at some point during the panel" control
dummy variable rather than simply limiting the sample to this group.  But
the problem ONLY occurs if I limit the sample to those who go to prison at
some point during the panel.  If I look at results for older observations
in the sample as a whole, the parameter estimate on everjail is correctly
signed.  It may be that having gone to prison has no effect on one's
probability of marrying, but I find it very hard to believe that it has a
strongly *positive* effect for older people.

Here's my theory, which I'm hoping I can get some reaction to: both my
dependent variable (the hazard of first marriage) and my key independent
variable (everjail) are strongly positively correlated with analysis time
(age).  This is true for obvious reasons for the dependent variable, and it
is true for everjail since the sample is limited to observations who go to
prison at some point during the panel - if you are in this subsample and
haven't gone to jail today, you're likely to do so next year, and if you
don't do so next year, you're certain to do so sometime after that, and so
forth.  So, since both variables are positively correlated with analysis
time, and since the "older" sample is limited to people who are guaranteed
to have some years in which everjail == 1 (members of the younger sample
may not go to prison until they are older) and a certain percentage of whom
are also going to marry, is age confounding the relationship between
everjail and the hazard of marrying?  It doesn't seem like this should be
possible, since age - my measure of analysis time - is explicitly being
controlled for in the baseline hazard.
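
For concreteness, one way the time-varying version mentioned above might be
written is sketched below.  This is an illustration only, not the exact
command behind any of the results in this post: the control list is
shortened, and interacting everjail with linear analysis time through
texp(_t) is an assumption.

```stata
#delimit ;
/* Sketch only: abbreviated control list, linear-in-age interaction assumed. */
/* tvc() interacts everjail with the function of analysis time in texp(),    */
/* so the everjail hazard ratio is allowed to change with age.               */
stcox newern cumern jail everjail
  if race == 2 & varuse != . & sampid < 15 & sampid != 9 & truejail == 1,
     tvc(everjail) texp(_t) robust;
#delimit cr
```

In the output, the main-equation everjail term is the effect at _t == 0 and
the tvc term is the change in the log hazard ratio per unit of analysis time.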

So, my basic question is this: if I'm using a Cox model and have an
independent variable that - like the dependent variable in a Cox analysis -
varies monotonically with analysis time, does that introduce some sort of
strange timing issue into the analysis?  Should I expect to get odd
parameter estimates in a situation like this, or am I doubting my results
when in fact I shouldn't be?  I'm stumped, so any and all advice would be
most welcome!!
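
For reference, the quantity that a Cox model maximizes can be written out to
show where analysis time enters.  The expression below is the standard
partial likelihood, ignoring ties and sampling weights; the symbols are
notation introduced here, not names from my data.

```latex
% t_j      : analysis time (age) at the j-th observed first marriage
% R(t_j)   : respondents still unmarried and under observation at age t_j
% x_i(t_j) : covariate vector (including everjail) for respondent i at age t_j
% x_{(j)}  : covariate vector of the respondent who marries at t_j
L(\beta) \;=\; \prod_{j=1}^{D}
  \frac{\exp\{x_{(j)}(t_j)'\beta\}}
       {\sum_{i \in R(t_j)} \exp\{x_i(t_j)'\beta\}}
```

The baseline hazard cancels from each term, so every failure is compared
only with respondents at risk at that same age.  A term in which everyone in
the risk set has the same everjail value does not depend on the everjail
coefficient at all, so that coefficient is identified only from ages at
which the risk set contains both everjail == 0 and everjail == 1 respondents.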

Cheers,
Adam Thomas
John F. Kennedy School of Government,
Harvard University




