
From: "Lili Yan" <lyan16@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: help needed on discrete-time hazard model
Date: Thu, 18 Oct 2007 20:40:50 -0400

Thanks a lot for your prompt response, Tom!

On 10/18/07, Steichen, Thomas J. <SteichT@rjrt.com> wrote:
> I see nothing wrong with the data generation steps you performed,
> so the question is whether this model makes sense.
>
> First, I will speculate that you have brand-specific prices at
> the time of each wave. Since cigarette prices tend to rise
> fairly uniformly between brands over time, either due to
> manufacturer price increases due to inflation or government tax
> increases, there is almost certainly a meaningful correlation
> between wave and price. Thus, having both a "price" variable and
> one or more "wave" variables will lead to confusion in the
> coefficients.
>
> In this model, the "wave2" variable can be thought of as estimating
> the average quit rate differential from the missing wave (wave 1)...
> and this includes an average price differential effect. Likewise,
> "wave3" estimates the average quit rate differential of wave 3 from
> wave 1.
>
> So what does "price" itself estimate in this model? I'd speculate
> it really only estimates how specific brands affect quitting.
> In your logit model, I'd guess that it indicates that subjects
> who smoke higher-than-average-priced brands quit at a lower rate.
> Said differently, those who smoke low-priced brands are more likely
> to quit due to a price increase. However, without knowing exactly
> what your variables represent, I can't go beyond speculation.
>
> I'm less clear why it remains negative when you take the wave
> variables out. If real, it implies that price differential (if
> it truly has a positive effect on quitting) wasn't great enough to
> overcome other, competing but correlated issues (not explained by
> any other variable in the model) that caused smokers to continue
> smoking during this time period. If so, price represents the
> increase in ALL of these issues and the ones for continued smoking
> dominated the result.
>
> On a different issue, using or not using the svy: prefix should
> change the estimated coefficients, so no particular importance
> should be placed on the fact that a coefficient changed signs
> between these two. Without the prefix, you are estimating what
> happened for the specific group of subjects surveyed in this study.
> When you add the weighting via the svy: prefix, you change the
> importance of those individual subjects based on their sampling
> weights.
>
> For example, you may have surveyed specific subjects who quit
> but represent only a very, very small part of the overall population.
> If you don't use the survey weights, their behavior may have
> a large effect on the sample results but little effect on the
> population results, even to the point of sign reversal.
>
> On yet another issue, marking pattern SQS as a successful "quit"
> seems possibly misleading. Clearly, if price continued to rise
> over the time period between waves (which seems likely to me),
> prices were higher in wave 3 than wave 2, yet these individuals
> started smoking again. This seems to suggest that price was not
> the most important motivating factor for quitting in wave 2 (or
> restarting in wave 3). One can argue that you should code these
> subjects as at "risk" for all three waves and as failing to quit.
>
> Tom
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lili Yan
> Sent: Thursday, October 18, 2007 2:25 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: help needed on discrete-time hazard model
>
> Hi Thomas,
>
> Thank you very much for helping out!
> I know little about this model, so I thought the two zeros indicated
> something wrong in the data. The e(N) is correct, of which I am sure.
>
> Here is some of the code for setting up the data. I need to explain first that
> smok_stat = 1 for SSS, 2 for SSQ, 3 for SQS and 4 for SQQ.
> ................codes start here................
>
> gen smk_time=3 if smok_stat==1 | smok_stat==2;
> replace smk_time=2 if smok_stat==3 | smok_stat==4;
>
> gen cessyear=2004 if smok_stat==1;
> replace cessyear=2004 if smok_stat==2;
> replace cessyear=2003 if (smok_stat==3 | smok_stat==4);
>
> expand smk_time;
> bysort uniqid: gen seqvar=_n;
> bysort uniqid: gen qtsmok=smok_stat>1 & _n==_N;
>
> bysort uniqid: gen evntyear=cessyear;
> replace evntyear=2002 if seqvar==1;
> replace evntyear=2003 if seqvar==2;
> drop cessyear;
> rename evntyear cessyear;
>
> gen wave=1 if cessyear==2002;
> replace wave=2 if cessyear==2003;
> replace wave=3 if cessyear==2004;
>
> gen wave1=wave==1;
> gen wave2=wave==2;
> gen wave3=wave==3;
>
> svy: logit qtsmok male age married white mdrt_educ high_educ incm_mdrt
> incm_high canada rPSPPPi wave2 wave3, noconstant
> ...............codes end here..........
>
> Here is the output:
>
> ..............output starts here................
> Survey: Logistic regression
>
> Number of strata   =        26                 Number of obs      =       5642
> Number of PSUs     =      5642                 Population size    =  5773.9291
>                                                Design df          =       5616
>                                                F(  12,   5605)    =     166.35
>                                                Prob > F           =     0.0000
>
> ------------------------------------------------------------------------------
>              |             Linearized
>       qtsmok |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         male |  -.1715913   .1273081    -1.35   0.178    -.4211643    .0779817
>          age |  -.0326805   .0053098    -6.15   0.000    -.0430898   -.0222713
>      married |   .0156776   .1427494     0.11   0.913    -.2641663    .2955215
>        white |  -.5607068   .1443603    -3.88   0.000    -.8437088   -.2777048
>    mdrt_educ |  -.0291425   .1441877    -0.20   0.840    -.3118061    .2535212
>    high_educ |   .5113156   .1800797     2.84   0.005     .1582899    .8643414
>    incm_mdrt |  -.0339146   .1557743    -0.22   0.828    -.3392925    .2714632
>    incm_high |   .1405313   .1766122     0.80   0.426    -.2056968    .4867595
>       canada |   1.802811   .2552666     7.06   0.000      1.30239    2.303233
>      rPSPPPi |  -.0083975    .000842    -9.97   0.000    -.0100481   -.0067468
>        wave2 |   2.111112   .1326945    15.91   0.000     1.850979    2.371244
>        wave3 |   2.411039   .1389374    17.35   0.000     2.138668     2.68341
> ------------------------------------------------------------------------------
> ....................output ends here..............
>
> The rPSPPPi is our price variable. We have more price variables but
> logit results with them are similar to what is reported here.
>
> Thank you very much!
>
> Lili
>
> On 10/18/07, Steichen, Thomas J. <SteichT@rjrt.com> wrote:
> > Why do you consider this an indication of something wrong?
> >
> > Having zero completely determined successes e(N_cds) and failures
> > e(N_cdf) is what you prefer.
> >
> > Is your overall # of records e(N) wrong?
> >
> > Show us some sample commands and output so we can see what you are doing.
> >
> >
> > -----Original Message-----
> >
> > I checked the data just now. After running logit model with our
> > dependent variable, the stored results show:
> >
> > e(N) = 5463
> > e(N_cds) = 0
> > e(N_cdf) = 0
> >
> > So it seems there is something wrong in the data setup. Could anyone
> > please give me some help?
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
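
Tom's closing suggestion in the quoted reply, coding SQS respondents as at "risk" in all three waves and as not having quit, could be set up along the following lines. This is only a minimal sketch on made-up toy data: the four input rows are hypothetical, and only the variable names (uniqid, smok_stat, smk_time, qtsmok, wave) and the smok_stat coding (1 = SSS, 2 = SSQ, 3 = SQS, 4 = SQQ) come from the thread. For brevity it numbers the waves directly with _n rather than going through cessyear as the original code does.

```stata
* Hypothetical toy data: one row per respondent
* smok_stat: 1 = SSS, 2 = SSQ, 3 = SQS, 4 = SQQ (coding taken from the thread)
clear
input uniqid smok_stat
1 1
2 2
3 3
4 4
end

* Alternative coding: only SQQ leaves the risk set after wave 2;
* SQS now stays at risk for all three waves, like SSS and SSQ
gen smk_time = cond(smok_stat == 4, 2, 3)

* One record per respondent per wave at risk (person-period format)
expand smk_time
bysort uniqid: gen wave = _n

* Event indicator: 1 only in the last record of SSQ and SQQ respondents;
* SQS respondents are treated as never having quit
bysort uniqid (wave): gen byte qtsmok = inlist(smok_stat, 2, 4) & _n == _N

list uniqid smok_stat wave qtsmok, sepby(uniqid)
```

With this coding the toy SQS respondent contributes three records with qtsmok equal to 0 in each, whereas under the original setup that respondent contributes two records and counts as a quit in wave 2. Whether this is the better definition of quitting is a substantive judgment that the thread leaves open.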

