
From: "Steichen, Thomas J." <SteichT@rjrt.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: help needed on discrete-time hazard model
Date: Thu, 18 Oct 2007 16:19:59 -0400

I see nothing wrong with the data generation steps you performed, so the question is whether this model makes sense.

First, I will speculate that you have brand-specific prices at the time of each wave. Since cigarette prices tend to rise fairly uniformly across brands over time, whether from manufacturer price increases driven by inflation or from government tax increases, there is almost certainly a meaningful correlation between wave and price. Thus, having both a "price" variable and one or more "wave" variables will lead to confusion in the coefficients.

In this model, the "wave2" variable can be thought of as estimating the average quit rate differential from the omitted wave (wave 1)... and this includes an average price differential effect. Likewise, "wave3" estimates the average quit rate differential of wave 3 from wave 1. So what does "price" itself estimate in this model? I'd speculate it really only estimates how specific brands affect quitting. In your logit model, I'd guess that it indicates that subjects who smoke higher-than-average-priced brands quit at a lower rate. Said differently, those who smoke low-priced brands are more likely to quit due to a price increase. However, without knowing exactly what your variables represent, I can't go beyond speculation.

I'm less clear why it remains negative when you take the wave variables out. If real, it implies that the price differential (if it truly has a positive effect on quitting) wasn't great enough to overcome other, competing but correlated issues (not explained by any other variable in the model) that caused smokers to continue smoking during this time period. If so, price represents the increase in ALL of these issues, and the ones favoring continued smoking dominated the result.

On a different issue, using or not using the svy: prefix should change the estimated coefficients, so no particular importance should be placed on the fact that a coefficient changed signs between these two.
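As a minimal illustration of how sampling weights alone can flip a sign, here is a sketch on toy data. It is not the survey discussed in this thread: the variables x, y, n and w are hypothetical, and the cell counts and weights were chosen by hand so that the sampled x==1 quitters represent very little of the population.

```stata
* Toy data, NOT the survey in this thread. Hypothetical names:
*   x = binary covariate, y = quit indicator,
*   n = sampled subjects in the cell, w = sampling weight per subject
clear
input x y n w
0 0 40 1
0 1 10 1
1 0 20 1
1 1 30 0.1
end

* One record per sampled subject
expand n

* Unweighted fit: describes the sample itself.
* Sample odds ratio = (30*40)/(20*10) = 6, so the coefficient is positive.
logit y x
scalar b_unw = _b[x]

* Weighted fit: the 30 sampled x==1 quitters each carry weight 0.1.
* Weighted odds ratio = (3*40)/(20*10) = 0.6, so the coefficient is negative.
svyset _n [pweight = w]
svy: logit y x
scalar b_svy = _b[x]

display "unweighted b[x] = " b_unw "   weighted b[x] = " b_svy
```

With a single binary covariate the logit coefficient is just the log odds ratio, which is why the two signs can be worked out by hand from the cell counts.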
Without the prefix, you are estimating what happened for the specific group of subjects surveyed in this study. When you add the weighting via the svy: prefix, you change the importance of those individual subjects based on their sampling weights. For example, you may have surveyed specific subjects who quit but represent only a very, very small part of the overall population. Their behavior may have a large effect on the unweighted sample results but little effect on the weighted population results, even to the point of sign reversal.

On yet another issue, marking pattern SQS as a successful "quit" seems possibly misleading. Clearly, if price continued to rise over the time period between waves (which seems likely to me), prices were higher in wave 3 than wave 2, yet these individuals started smoking again. This seems to suggest that price was not the most important motivating factor for quitting in wave 2 (or restarting in wave 3). One can argue that you should code these subjects as at "risk" for all three waves and as failing to quit.

Tom

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lili Yan
Sent: Thursday, October 18, 2007 2:25 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: help needed on discrete-time hazard model

Hi Thomas,

Thank you very much for helping out! I know little about this model, so I thought the two zeros indicated something wrong in the data. The e(N) is correct, I am sure.

Here is some of the code that sets up the data. I need to explain first that smok_stat = 1 for SSS, 2 for SSQ, 3 for SQS and 4 for SQQ.

................codes start here................
gen smk_time=3 if smok_stat==1 | smok_stat==2;
replace smk_time=2 if smok_stat==3 | smok_stat==4;
gen cessyear=2004 if smok_stat==1;
replace cessyear=2004 if smok_stat==2;
replace cessyear=2003 if (smok_stat==3 | smok_stat==4);
expand smk_time;
bysort uniqid: gen seqvar=_n;
bysort uniqid: gen qtsmok=smok_stat>1 & _n==_N;
bysort uniqid: gen evntyear=cessyear;
replace evntyear=2002 if seqvar==1;
replace evntyear=2003 if seqvar==2;
drop cessyear;
rename evntyear cessyear;
gen wave=1 if cessyear==2002;
replace wave=2 if cessyear==2003;
replace wave=3 if cessyear==2004;
gen wave1=wave==1;
gen wave2=wave==2;
gen wave3=wave==3;
svy: logit qtsmok male age married white mdrt_educ high_educ incm_mdrt incm_high canada rPSPPPi wave2 wave3, noconstant;

...............codes end here..........

Here is the output:

..............output starts here................

Survey: Logistic regression

Number of strata   =        26                 Number of obs      =       5642
Number of PSUs     =      5642                 Population size    =  5773.9291
                                               Design df          =       5616
                                               F(  12,   5605)    =     166.35
                                               Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
      qtsmok |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.1715913   .1273081    -1.35   0.178    -.4211643    .0779817
         age |  -.0326805   .0053098    -6.15   0.000    -.0430898   -.0222713
     married |   .0156776   .1427494     0.11   0.913    -.2641663    .2955215
       white |  -.5607068   .1443603    -3.88   0.000    -.8437088   -.2777048
   mdrt_educ |  -.0291425   .1441877    -0.20   0.840    -.3118061    .2535212
   high_educ |   .5113156   .1800797     2.84   0.005     .1582899    .8643414
   incm_mdrt |  -.0339146   .1557743    -0.22   0.828    -.3392925    .2714632
   incm_high |   .1405313   .1766122     0.80   0.426    -.2056968    .4867595
      canada |   1.802811   .2552666     7.06   0.000      1.30239    2.303233
     rPSPPPi |  -.0083975    .000842    -9.97   0.000    -.0100481   -.0067468
       wave2 |   2.111112   .1326945    15.91   0.000     1.850979    2.371244
       wave3 |   2.411039   .1389374    17.35   0.000     2.138668     2.68341
------------------------------------------------------------------------------

....................output ends here..............

The rPSPPPi is our price variable. We have more price variables, but the logit results with them are similar to what is reported here.

Thank you very much!

Lili

On 10/18/07, Steichen, Thomas J. <SteichT@rjrt.com> wrote:
> Why do you consider this an indication of something wrong?
>
> Having zero completely determined successes e(N_cds) and failures
> e(N_cdf) is what you prefer.
>
> Is your overall # of records e(N) wrong?
>
> Show us some sample commands and output so we can see what you are doing.
>
> -----Original Message-----
>
> I checked the data just now. After running logit model with our
> dependent variable, the stored results show:
>
> e(N) = 5463
> e(N_cds) = 0
> e(N_cdf) = 0
>
> So seems there is something wrong in the data setup. Could anyone
> please give me some help?

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
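The alternative coding suggested in the reply for the SQS pattern (at risk in all three waves, never counted as a quit) could be sketched as follows. This is only an illustration on four made-up subjects, one per pattern. It reuses the thread's variable names uniqid, smok_stat, smk_time and qtsmok, but the simplified wave variable and the data themselves are assumptions, not the poster's setup.

```stata
* Toy person-level data: one hypothetical subject per smoking pattern
* smok_stat: 1 = SSS, 2 = SSQ, 3 = SQS, 4 = SQQ (coding from the thread)
clear
input uniqid smok_stat
1 1
2 2
3 3
4 4
end

* Alternative coding: only SQQ leaves the risk set after wave 2.
* SQS now stays at risk through wave 3, like SSS and SSQ.
gen smk_time = cond(smok_stat == 4, 2, 3)

* One record per subject per wave at risk
expand smk_time
bysort uniqid: gen wave = _n

* The event occurs on the last record, and only for SSQ and SQQ.
* SQS is treated as failing to quit in every wave.
bysort uniqid (wave): gen qtsmok = inlist(smok_stat, 2, 4) & _n == _N

list uniqid smok_stat wave qtsmok, sepby(uniqid)
```

Sorting on wave inside the by-group, via `bysort uniqid (wave)`, keeps `_n == _N` pointing at the final wave even if the data are re-sorted.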

**Follow-Ups**:**Re: st: help needed on discrete-time hazard model***From:*"Lili Yan" <lyan16@gmail.com>

**References**:**st: help needed on discrete-time hazard model***From:*Lili Yan <lyan16@gmail.com>

**Re: st: help needed on discrete-time hazard model***From:*"Lili Yan" <lyan16@gmail.com>

**RE: st: help needed on discrete-time hazard model***From:*"Steichen, Thomas J." <SteichT@rjrt.com>

**Re: st: help needed on discrete-time hazard model***From:*"Lili Yan" <lyan16@gmail.com>

