
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: RE: -cloglog- memory & -stcurve- median : was -svy stocx- attained age - Modified Version


From   "Muhuri, Pradip (SAMHSA/CBHSQ)" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: RE: -cloglog- memory & -stcurve- median : was -svy stocx- attained age - Modified Version
Date   Thu, 8 Aug 2013 04:04:24 +0000

Hello Statalist,

I have further modified my message for clarity, without going into too much detail. My apologies for reposting.

The issue is that -svy:cloglog- gives exponentiated coefficients, exp(b), that are very different from the corresponding hazard ratios I obtained from -svy:stcox- for exactly the same model.

Study Objectives: To examine premature mortality from all causes combined and selected causes of death in relation to serious psychological distress (SPD) and cigarette smoking among adults in the U.S. household population, adjusting for chronic health conditions and other statistical controls.

Data: Public use files of the 1997 through 2006 National Health Interview Surveys that are linked to the 1997-2006 death records in the National Death Index.

Date variables (date of interview and date of death): Only the quarter and the year are used; the day is imputed as the middle of the quarter (e.g., 15th Feb, 15th May, 15th Aug, 15th Nov).  Survival times are measured in terms of both analysis times and attained ages at death or censoring on December 31, 2006.

The dependent variable, here, is all-cause death. 

Output from -svy:stcox-, with attained age (18 to 95 years) as the time scale.
---------------------------------------
                Variable |     m1      
-------------------------+-------------
                  Female |    0.59***  
    Serious Psy Distress |    1.47***  
          Current Smoker |    2.09***  
           Former Smoker |    1.29***  
                 Div/Sep |    1.40***  
                   Widow |    1.30***  
           Never Married |    1.59***  
                Hispanic |    0.94     
                NH Black |    1.21***  
                NH Other |    0.71***  
             Hi Sch Grad |    0.83***  
            College Grad |    0.63***  
             Underweight |    1.64***  
              Overweight |    0.79***  
                   Obese |    0.87***  
             1 Condition |    1.51***  
           2+ Conditions |    2.58***  
-------------------------+-------------
                       N |  227925     
---------------------------------------
  legend: * p<.05; ** p<.01; *** p<.001



In an earlier reply, Steve commented that a basic assumption of the Cox proportional hazards model is violated because time on study is not measured continuously, and suggested that I try complementary log-log (-cloglog-) models.
             
Output from -svy:cloglog-. (The time from interview to death or censoring ranges from 0 to 39 quarters; 0 is lumped with quarter 1 when calculating the survival time. I produced the person-quarter-year data file using the -expand- command.)
-----------------------------------------------------------------
                Variable |     m1           m2           m3      
-------------------------+---------------------------------------
                  Female |    0.54***      0.54***      0.54***  
    Serious Psy Distress |    1.09*        1.09*        1.09     
          Current Smoker |    1.11***      1.11***      1.11***  
           Former Smoker |    1.47***      1.48***      1.47***  
                 Div/Sep |    1.30***      1.31***      1.31***  
                   Widow |    3.74***      3.80***      3.80***  
           Never Married |    0.61***      0.60***      0.60***  
                Hispanic |    0.63***      0.63***      0.63***  
                NH Black |    0.96         0.97         0.97     
                NH Other |    0.54***      0.54***      0.54***  
             Hi Sch Grad |    0.59***      0.59***      0.59***  
            College Grad |    0.42***      0.42***      0.42***  
             Underweight |    1.75***      1.76***      1.76***  
              Overweight |    0.69***      0.69***      0.69***  
                   Obese |    0.57***      0.57***      0.57***  
             1 Condition |    2.78***      2.79***      2.79***  
           2+ Conditions |    6.85***      6.97***      6.98***  
                     lnj |                 1.19***               
                      j2 |                              1.00***  
                Constant |    0.00***      0.00***      0.00***  
-------------------------+---------------------------------------
                       N | 5249084      5249084      5249084     
-----------------------------------------------------------------
                            legend: * p<.05; ** p<.01; *** p<.001
Note: j ranges from 1 to 39 quarters.

 
The -cloglog- models are estimated using the data file that includes person-quarter-year observations. 
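
As a rough sketch of how such a person-period file and model can be set up (simulated toy data, not the NHIS files; every variable name below, including id, qtrs, died, spd, psu, stratum, and wt, is a hypothetical stand-in):

```stata
* Simulated toy data; every variable name here is hypothetical.
clear
set seed 12345
set obs 2000
gen id        = _n
gen psu       = ceil(_n/20)                // 100 toy PSUs
gen stratum   = ceil(psu/10)               // 10 toy strata
gen wt        = 1 + runiform()             // toy sampling weight
gen byte spd  = runiform() < 0.2           // toy exposure indicator
gen qtrs      = ceil(39*runiform())        // quarters at risk, 1 to 39
gen byte died = runiform() < 0.3           // 1 = died in last quarter, 0 = censored

expand qtrs                                // one row per person-quarter at risk
bysort id: gen j = _n                      // quarter index within person
by id: gen byte y = died == 1 & _n == _N   // event flagged only in the final quarter
gen lnj = ln(j)                            // log-duration baseline, as in m2

svyset psu [pweight = wt], strata(stratum)
svy: cloglog y i.spd lnj, eform            // report exp(b)
```

The key step is that the outcome y is 1 only in the last quarter of a person who died and 0 in every other person-quarter, so each row contributes one discrete-time hazard term.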

Acknowledgements: Stephen Jenkins's "Survival analysis with Stata" materials (http://www.iser.essex.ac.uk/survival-analysis) and Steve.  Thanks.

The above online source shows that results from -cloglog- models are very similar to the results from parametric survival models. But, as shown in the above tables, my results from -stcox- and -cloglog- models are very different. Should not the results be similar?

Are there any resources available showing comparisons of the same model estimated via -stcox- and -cloglog-? 

Any help toward resolving the issue would be appreciated.

Thanks,

Pradip



Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Steve Samuels
Sent: Wednesday, August 07, 2013 12:48 PM
To: [email protected]
Subject: Re: st: RE: -cloglog- memory & -stcurve- median : was -svy stocx- attained age - issues with posting


Pradip:

Examination of the Statalist archives (www.stata.com/statalist/archive/)
shows two messages from you in July and four in August prior to today's.
I received them all locally.  On my system, sent messages rarely  show up in my inbox. Perhaps something like that is giving you the impression that your messages don't get through.

My reference to "personal conversation" was to your last message in which you have addressed your questions to me alone, without explanation or background that would enable others to assist you. You had sent some of that information privately to me. I  asked you twice to send it to Statalist, and you did not. I post a portion below.

It is also not helpful that you bundle unrelated questions in one post with a subject line that doesn't refer to either. These practices will certainly limit the number of potential responders.

I suggest that you now read the FAQ (esp Section 3), as you were asked to do when you first joined the list.

You have a difficult project. It's easy enough to identify potential problems, but I regret that I don't often have ready-made solutions.


Steve


Portions of Pradip's private post to me (July 27):

1)       Objectives: To examine premature mortality from all causes combined and selected causes of death in relation to serious psychological distress (SPD) and cigarette smoking among adults in the U.S. household population, adjusting for chronic health conditions and other statistical controls.

2)       Data: Public use files of the 1997 through 2006 National Health Interview Surveys that are linked to the 1997-2006 death records in the National Death Index.

3)       Date variables (date of interview and date of death): Only the quarter and the year are known; the day is imputed as the middle of the quarter (e.g., 15th Feb, 15th May, 15th Aug, 15th Nov).  Survival times are measured in terms of both analysis times and attained ages at death or censoring on December 31, 2006.

Analysis time (years) = time from interview to death or censoring [time_yr = ceil(interval/365.25)].
Attained age (years) = analysis time + age at interview (both in years).
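
A minimal sketch of these two definitions in Stata, using made-up dates; the variable names (s_int, s_end, age_int, interval, att_age) are hypothetical, and only time_yr comes from the formula above:

```stata
* Toy data; variable names other than time_yr are hypothetical.
clear
input str10 s_int str10 s_end age_int
"15feb1997" "15aug2003" 40
"15may2000" "31dec2006" 62
end
gen int_date = date(s_int, "DMY")      // imputed interview date (mid-quarter)
gen end_date = date(s_end, "DMY")      // date of death or censoring
format int_date end_date %td

gen interval = end_date - int_date     // days from interview to death or censoring
gen time_yr  = ceil(interval/365.25)   // analysis time, rounded up to whole years
gen att_age  = age_int + time_yr       // attained age in years
list s_int s_end age_int time_yr att_age
```

Because of ceil(), any follow-up shorter than a full year still counts as one year, so attained age is always at least one year above age at interview for anyone with positive follow-up.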


> On Aug 7, 2013, at 11:07 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> 
> Hello,
> 
> I am new to the Statalist.
> 
> I still don't understand why my earlier messages, except two of them, did not get posted to Statalist.  Our IT service desk has confirmed that all my messages left our domain successfully and that they reached the recipient domain (i.e., Statalist).
> 
> Although those e-mails did not get posted to Statalist, the two most recent e-mails reached Steve's address.  My apologies to him for any inconvenience this may have caused.  However, I appreciated receiving his excellent response on the issues I have been dealing with.
> 
> Any help toward resolving the issue would be appreciated.
> 
> Thanks,
> 
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Steve Samuels
> Sent: Tuesday, August 06, 2013 5:26 PM
> To: [email protected]
> Subject: st: -cloglog- memory & -stcurve- median : was -svy stocx- attained age
> 
> Pradip:
> 
> Statalist is not a place for private conversations. Nobody will be able to follow this thread unless you are clear about what you are doing.
> 
> In my response to your private email, I asked you to send your original private post to Statalist. Please now describe the study, data, and study questions so that others might help you.
> 
> For specific issues, show code and results as the Statalist FAQ requests: detail the models you tried, show the relevant output, and give essential information, such as the N, number of deaths, descriptive stats on your primary exposure variable, and a 2 x 2 table cross-tabulating death and exposure.
> 
> To bring others up to date, the d* and dur* variables Pradip refers to are period variables for -cloglog-, as detailed in Lesson 6 of
> https://www.iser.essex.ac.uk/resources/survival-analysis-with-stata
> The "spell" dataset is the result of an -expand- operation and contains one observation for each period a person is at risk.
> 
> About the memory problem, I have only generic advice:
> 
> . Upgrade your OS to one that can address more memory
> 
> . Add physical memory
> 
> As we know more details, other ideas may suggest themselves.
> 
> To estimate the median, compute it "by hand" for a single curve; then try to write code that will automate what you did. If you can't, "by hand" may be good enough. Just be aware that, because of the grouping, you'll need to linearly interpolate between quarter end points or to fit flexible parametric models. I pointed the way to creating expected values in an earlier post. For a way of adding points to a graph, see:
> http://www.stata.com/statalist/archive/2008-02/msg01145.html
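
The "by hand" interpolation of the median described above might be sketched as follows. The curve values are invented for illustration, _t and surv2 only mimic the layout of the saved -stcurve- file, and t0, s0, and t_med are hypothetical helper names:

```stata
* Toy survival curve; the numbers are made up, not results from the study.
clear
input _t surv2
70 0.80
75 0.62
80 0.45
85 0.25
end
sort _t
gen double t0 = _t[_n-1]       // start of the interval
gen double s0 = surv2[_n-1]    // survival at the start of the interval
* Linear interpolation inside the first interval where the curve crosses 0.5
gen double t_med = t0 + (s0 - 0.5)/(s0 - surv2)*(_t - t0) ///
    if _n > 1 & s0 > 0.5 & surv2 <= 0.5
summarize t_med, meanonly
display "Interpolated median: " r(mean)
```

The same steps would be repeated for surv3. If the curve never falls to 0.5 within follow-up, t_med is missing everywhere and the median is not estimable from the curve.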
> 
> By the way: you are not correct in assuming that date of interview can be located only by year and quarter. The NHIS data sets contain an "assignment week" of the quarter, and instructions are to finish the survey by that week, or at most a little more than 14 days later. Thus you can easily locate interview date to the most likely month, that of the assignment week. I'm not sure that will do you any good unless you can work with the restricted data that has date of death.
> 
> See:
> http://www.amstat.org/sections/srms/proceedings/papers/1992_048.pdf
> http://www.amstat.org/sections/srms/proceedings/y2001/Proceed/00040.pdf
> 
> You are welcome for the advice I've given so far, but to quote a recent post from Nick Cox:
> 
> "Better to think that you are replying to the list. Anyone who replies to you is not volunteering to provide dedicated support, just trying to push a discussion forward. Nor will they necessarily write lots of code for you...What you want requires custom programming..."
> 
> Regards,
> 
> Steve
> 
> 
> 
> On Aug 5, 2013, at 3:54 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> 
> Steve, Thank you so much for your continued support, advice, and excellent reference materials (thanks to Professor Jenkins for making his materials available online). My apologies for the e-mail transmission problems, whose source is yet to be ascertained. Since my recent e-mail also did not get posted, I am resending it, with no copying or pasting this time.
> 
> I would like to run models with both -svy stcox- and -svy cloglog- for comparison purposes. But, I would like to report results from -cloglog- models.
> 
> Issues:
> 
> 1) -cloglog-: The spell-quarter data file is created (compressed -
> 5,485,946 obs 70 vars including d1-d39 dummies and dur1-dur10 variables, with -set memory 725m - can't go beyond this limit). Failed to run models due to memory issue. Any advice?
> 
> 2) -svy:stcox-: Plotted the survival curve [...at1() ... at2 ()] and then saved the results in a .dta file that provides me three variables (_t, surv2, and surv3). I am looking for sample code to calculate the following:
> 
> 	(a) the median survival time, that is, the time (_t) at which half of the
> people survived (based on surv2 and surv3); and
> 
> 	(b) the arithmetic mean of the survival time (_t) or life expectancy.
> 
> Your advice toward resolving the issues will be highly appreciated.
> 
> Regards,
> 
> Pradip
> 
> Sent: Wednesday, July 31, 2013 6:30 PM
> To: [email protected]
> Subject: Re: st: -svy stocx- attained age
> 
> Pradip:
> 
> You are encountering a version of a problem that I diagnosed earlier this month at
> http://www.stata.com/statalist/archive/2013-07/msg00644.html,
> where I referred to
> http://www.stata.com/support/faqs/statistics/stcox-producing-missing-standard-errors/
> "4) Covariate does not vary within death event risk sets."
> 
> In your case, people in the same age group at baseline will have the same attained age at all points of followup. You can use either age at baseline or time as attained age, but not both.
> 
> I'd recommend age at baseline, as you know this pretty precisely, a 
> gain that doesn't transfer to attained age. But the use of only five 
> age groups loses a lot of information; try fractional polynomial 
> regression
> (-fp-) on continuous age.
> 
> In a post to me, you stated that you have a maximum of 10 years of follow-up and that, for dates of death and interview, you have only year and quarter. Your solution is to assign the midpoint date of each quarter. This could work, but violates the assumption of -stcox- that times are essentially continuous. The measurement error in follow-up time could be as much as ±3 months and will probably bias the estimated coefficients. Moreover, if someone died in < 3 months after baseline, you would be assigning start and death to the same date, and -stset- will drop the observation.
> 
> Therefore I suggest that you also do a grouped hazard analysis with
> -cloglog-, which accepts a -svy- prefix. (-stpm2-, as you reminded me, does not.) With -cloglog-, assign the person who died < 3 months after baseline to the first period. For more details, see the Lesson 6 link to discrete data analysis on Stephen Jenkins's fine web page "Survival analysis with Stata"
> (http://www.iser.essex.ac.uk/survival-analysis )
> 
> Steve
> 
> On Jul 31, 2013, at 12:53 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> 
> Hello,
> 
> I am new to the Statalist. In response to my first posted e-mail to the List, I got a reply from Steve, with insightful comments and advice.
> At this point, I need your help with two issues.
> 
> 1) All my subsequent e-mails (content --copied from the Stata log file - in plain text, not html) have bounced back to me. I don't understand what I am doing wrong.
> 
> 2) I am using -svy:stcox- models, with attained age as the time scale. I have successfully run several models. However, the addition of a 5-category factor (attained age) to the model gives me the following error message: "flat region resulting in a missing likelihood error occurred when svy executed stcox last estimates not found". Sorry I am not pasting the content from the log file, fearing that this e-mail will also bounce back.
> 
> Thanks,
> 
> P
> 
> 
> *   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


