Re: st: Overall p-value for categorical variable in logistic regression

From	Maarten Buis <[email protected]>
To	[email protected]
Subject	Re: st: Overall p-value for categorical variable in logistic regression
Date	Mon, 1 Jul 2013 12:20:30 +0200

On Mon, Jul 1, 2013 at 11:38 AM, Kassie Melius wrote:
> I'd like to get an overall p-value for a categorical variable in my
> multivariable model so as to display it in a table, rather than having
> p-values for each level of the variable. Which of the below is
> correct, if any at all?
>
> 1. Compare fit of the expanded categorical variable in the model using
> LR test  (here indepvar1, for example)
>
> logistic depvar i.indepvar1 i.indepvar2 i.indepvar3 , baselevels
> est store A
> logistic depvar i.indepvar2 i.indepvar3 , baselevels
> est store B
> lrtest A B
>
> where i get LR p=0.003
>
>
> 2.Input categorical variable not as indicator and take overall effect p-value
> logistic depvar indepvar1 i.indepvar2 i.indepvar3 , baselevels
>
> where i get p=0.001
>
> Which should I choose (or whatever you suggest), and could you explain
> how these two p-vals differ?

Your two methods test different null hypotheses:

The null hypothesis for method 1 states that there is no difference in
expected outcome between the levels of indepvar1 after adjusting for
the other variables in your model. You can reject that null hypothesis
at the usual significance levels.

The null hypothesis for method 2 treats indepvar1 as a continuous
variable and states that a unit change in indepvar1 is associated with
0 change in the expected outcome after adjusting for the other
variables in your model. (What that unit means is likely, but not
necessarily, problematic, given that you previously entered indepvar1
as a categorical variable.) You can reject that null hypothesis too at
the usual significance levels.

These different tests just represent different questions, so which one
is right for you depends on which question you want to answer.
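
To make the difference concrete, suppose, purely for illustration,
that indepvar1 has three levels coded 1, 2, 3 with level 1 as the
base (the thread does not say how many levels it has). The symbols
below are notation introduced here, not Stata output. The two null
hypotheses could then be written as:

```latex
% Method 1: indepvar1 entered as indicators D_2, D_3 (2 degrees of freedom)
\operatorname{logit}\Pr(y = 1) = \beta_0 + \beta_2 D_2 + \beta_3 D_3 + \dots,
\qquad H_0\colon \beta_2 = \beta_3 = 0

% Method 2: indepvar1 entered linearly as x (1 degree of freedom)
\operatorname{logit}\Pr(y = 1) = \gamma_0 + \gamma\, x + \dots,
\qquad H_0\colon \gamma = 0
```

Under this coding the model of method 2 is the special case of the
model of method 1 in which \beta_3 = 2\beta_2, so method 2 tests for a
trend while taking linearity across the levels for granted.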

Another difference between your proposals is that the first is a
likelihood ratio test, while the second is a Wald test. The two are
asymptotically equivalent, but when comparing tests it makes more
sense to stick to one type. Below is an example of how to perform
likelihood ratio, Wald, and (for completeness) score tests of the two
hypotheses you proposed:

*------------------ begin example ------------------
sysuse nlsw88, clear

gen byte occat = cond(occupation < 3                 , 1,      ///
                 cond(inlist(occupation, 5, 6, 8, 13), 2, 3))  ///
                 if occupation < .
label variable occat "occupation in categories"
label define occat 1 "high"   ///
                   2 "middle" ///
                   3 "low"
label value occat occat

gen byte edcat = cond(grade <  12, 1,     ///
                 cond(grade == 12, 2, 3)) ///
                 if grade < .
label define edcat 1 "less than high school" ///
                   2 "high school"           ///
                   3 "more than high school"
label value edcat edcat
label variable edcat "education in categories"


//-------------------------------------- 0 effect
*likelihood ratio test
logit union i.edcat i.occat i.race i.south
est store a
logit union i.occat i.race i.south if edcat < .
est store b
lrtest a b

* Wald test
logit union i.edcat i.occat i.race i.south
testparm i.edcat

* score test
logit union i.occat i.race i.south if edcat < .
matrix b0 = e(b)
logit union i.edcat i.occat i.race i.south, from(b0) iter(0)
matrix chi2 = e(gradient)*e(V)*e(gradient)'
di "Chi-square (2) = " %7.4g `=el(chi2,1,1)'
di "p-value        = " %7.4g `=chi2tail(2,chi2[1,1])'

// -------------------------------- linear effect
constraint 1 2*2.edcat = 3.edcat

// constraining the third category to be twice the second category
// is equivalent to adding the three category variable linearly:
logit union i.edcat i.occat i.race i.south, constraint(1)
logit union edcat i.occat i.race i.south

* likelihood ratio test
logit union i.edcat i.occat i.race i.south
est store a
logit union i.edcat i.occat i.race i.south, constraint(1)
est store b
lrtest a b

* Wald test
logit union i.edcat i.occat i.race i.south
test 2*2.edcat = 3.edcat

* score test
logit union i.edcat i.occat i.race i.south, constraint(1)
matrix b0 = e(b)
logit union i.edcat i.occat i.race i.south, from(b0) iter(0)
matrix chi2 = e(gradient)*e(V)*e(gradient)'
di "Chi-square (1) = " %7.4g `=el(chi2,1,1)'
di "p-value        = " %7.4g `=chi2tail(1,chi2[1,1])'
*------------------- end example -------------------
* (For more on examples I sent to the Statalist see:
* http://www.maartenbuis.nl/example_faq )
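
Since the goal was to display one overall p-value in a table, it may
help that both -testparm- and -lrtest- leave their p-value behind in
r(p). The following is only a minimal sketch of picking those up: it
rebuilds the same edcat variable as in the example above, uses a
smaller set of covariates to keep it short, and the scalar names
p_wald and p_lr and the estimate names full and reduced are made up
for this illustration.

```stata
* sketch: collect the overall p-values for edcat for use in a table
sysuse nlsw88, clear

* same three-level education variable as in the example above
gen byte edcat = cond(grade <  12, 1,     ///
                 cond(grade == 12, 2, 3)) ///
                 if grade < .

* Wald test of the joint null that all edcat coefficients are 0
logit union i.edcat i.race i.south
testparm i.edcat
scalar p_wald = r(p)

* likelihood ratio test of the same null on the same estimation sample
est store full
logit union i.race i.south if e(sample)
est store reduced
lrtest full reduced
scalar p_lr = r(p)

di "Wald p-value = " %6.4f p_wald
di "LR p-value   = " %6.4f p_lr
```

Restricting the reduced model with -if e(sample)- plays the same role
as -if edcat < .- in the example above: both models are then
estimated on the same observations, which -lrtest- requires.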

Hope this helps,
Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------