Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

From: Maarten Buis <maartenlbuis@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Overall p-value for categorical variable in logistic regression
Date: Mon, 1 Jul 2013 12:20:30 +0200

On Mon, Jul 1, 2013 at 11:38 AM, Kassie Melius wrote:
> I'd like to get an overall p-value for a categorical variable in my
> multivariable model so as to display it in a table, rather than having
> p-values for each level of the variable. Which of the below is
> correct, if any at all?
>
> 1. Compare fit of the expanded categorical variable in the model using
> LR test (here indepvar1, for example)
>
> logistic depvar i.indepvar1 i.indepvar2 i.indepvar3 , baselevels
> est store A
> logistic depvar i.indepvar2 i.indepvar3 , baselevels
> est store B
> lrtest A B
>
> where i get LR p=0.003
>
>
> 2. Input categorical variable not as indicator and take overall effect p-value
> logistic depvar indepvar1 i.indepvar2 i.indepvar3 , baselevels
>
> where i get p=0.001
>
> Which should I choose (or whatever you suggest), and could you explain
> how these two p-vals differ?

Your two methods test different null hypotheses:

The null hypothesis for method 1 states that there is no difference in
expected outcome between the levels of indepvar1 after adjusting for
the other variables in your model. You can reject that null hypothesis
at the usual significance levels.

The null hypothesis for method 2 treats indepvar1 as a continuous
variable and states that a unit change in indepvar1 is associated with
0 change in the expected outcome after adjusting for the other
variables in your model. (What that unit means is likely, but not
necessarily, problematic, given that you previously entered indepvar1
as a categorical variable.) You can also reject that null hypothesis
at the usual significance levels.

These tests represent different questions, so which one is right for
you depends on which question you want to answer.

Another difference between your proposals is that the first is a
likelihood ratio test, while the second is a Wald test. Asymptotically
the two are equivalent, but when comparing results it makes more sense
to stick to one type of test.
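
In symbols, the two null hypotheses could be written as below. This is
only a sketch under assumptions that are not in the original question:
indepvar1 (written x_1) is taken to have K levels coded 1, ..., K with
level 1 as the base category, and z stands for the indicator variables
of indepvar2 and indepvar3.

```latex
% Method 1: indepvar1 entered as indicators (i.indepvar1)
\operatorname{logit}\Pr(y = 1)
  = \alpha + \sum_{k=2}^{K} \beta_k \, 1[x_1 = k] + z'\gamma,
\qquad
H_0^{(1)}: \beta_2 = \dots = \beta_K = 0
\quad (K - 1 \text{ degrees of freedom})

% Method 2: indepvar1 entered linearly (indepvar1 without the i. prefix)
\operatorname{logit}\Pr(y = 1)
  = \alpha + \beta \, x_1 + z'\gamma,
\qquad
H_0^{(2)}: \beta = 0
\quad (1 \text{ degree of freedom})
```

Under these assumptions the model of method 2 is the model of method 1
with the restriction \beta_k = (k - 1)\beta. For K = 3 that restriction
reads 2\beta_2 = \beta_3.
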

Below is an example of how to perform likelihood ratio, Wald, and (for
completeness) score tests of the two hypotheses you proposed:

*------------------ begin example ------------------
sysuse nlsw88, clear

gen byte occat = cond(occupation < 3 , 1,                    ///
                 cond(inlist(occupation, 5, 6, 8, 13), 2, 3)) ///
                 if occupation < .
label variable occat "occupation in categories"
label define occat 1 "high"   ///
                   2 "middle" ///
                   3 "low"
label value occat occat

gen byte edcat = cond(grade < 12, 1,     ///
                 cond(grade == 12, 2, 3)) ///
                 if grade < .
label define edcat 1 "less than high school" ///
                   2 "high school"           ///
                   3 "more than high school"
label value edcat edcat
label variable edcat "education in categories"

//-------------------------------------- 0 effect
* likelihood ratio test
logit union i.edcat i.occat i.race i.south
est store a
logit union i.occat i.race i.south if edcat < .
est store b
lrtest a b

* Wald test
logit union i.edcat i.occat i.race i.south
testparm i.edcat

* score test
logit union i.occat i.race i.south if edcat < .
matrix b0 = e(b)
logit union i.edcat i.occat i.race i.south, from(b0) iter(0)
matrix chi2 = e(gradient)*e(V)*e(gradient)'
di "Chi-square (2) = " %7.4g `=el(chi2,1,1)'
di "p-value = " %7.4g `=chi2tail(2,chi2[1,1])'

// -------------------------------- linear effect
constraint 1 2*2.edcat = 3.edcat

// constraining the third category to be twice the second category
// is equivalent to adding the three category variable linearly:
logit union i.edcat i.occat i.race i.south, constraint(1)
logit union edcat i.occat i.race i.south

* likelihood ratio test
logit union i.edcat i.occat i.race i.south
est store a
logit union i.edcat i.occat i.race i.south, constraint(1)
est store b
lrtest a b

* Wald test
logit union i.edcat i.occat i.race i.south
test 2*2.edcat = 3.edcat

* score test
logit union i.edcat i.occat i.race i.south, constraint(1)
matrix b0 = e(b)
logit union i.edcat i.occat i.race i.south, from(b0) iter(0)
matrix chi2 = e(gradient)*e(V)*e(gradient)'
di "Chi-square (1) = " %7.4g `=el(chi2,1,1)'
di "p-value = " %7.4g `=chi2tail(1,chi2[1,1])'
*------------------- end example -------------------
* (For more on examples I sent to the Statalist see:
* http://www.maartenbuis.nl/example_faq )

Hope this helps,
Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

References:
  st: Overall p-value for categorical variable in logistic regression
  From: Kassie Melius <lamogia3@gmail.com>
