st: Significance of categorical variables in Logistic Regression


From   Marcello Pagano <pagano@hsph.harvard.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Significance of categorical variables in Logistic Regression
Date   Thu, 21 Mar 2013 10:29:12 -0400

For Michael:

I am running logistic regressions with a number of categorical independent variables (IVs). I build the model by starting with the variables that have p < .15 in a single-variable regression.
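One way to run that screening step for categorical variables is sketched below. It is a minimal sketch that uses Stata's shipped nlsw88 example data as a stand-in (union, age, grade and race come from that dataset, not from the original post). For a categorical IV it screens on testparm's joint Wald test of all its categories at once, rather than on the p-value of any single category.

```stata
* Sketch: single-variable screening at p < .15 on stand-in data (nlsw88).
* For i.race, testparm tests all non-base categories jointly.
sysuse nlsw88, clear
foreach v in age grade i.race {
    quietly logistic union `v'
    quietly testparm `v'
    * Flag whether the variable passes the p < .15 screen
    local keep = cond(r(p) < .15, "keep", "drop")
    display "`v'" _col(12) "p = " %6.4f r(p) _col(28) "`keep'"
}
```

For a continuous IV, testparm reduces to the usual single-coefficient Wald test, so the same loop handles both kinds of variable.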

Then I put them all together and weed out the variables that are not significant in the multivariable regression. Once I have a "core" model of significant predictors, I add the excluded variables back one at a time to see whether they are significant in the context of the smaller set of predictors.
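Adding a categorical variable back to a core model can be checked with a likelihood-ratio test of all its categories together, as long as both models are fitted on the same observations. A minimal sketch, again assuming nlsw88 as stand-in data; the core predictors age, grade and south are chosen only for illustration:

```stata
* Sketch: LR test of whether i.race adds to a "core" model (nlsw88 stand-in).
sysuse nlsw88, clear
* Fit the larger model first so that e(sample) fixes the estimation sample
quietly logistic union age grade south i.race
estimates store with_race
* Core model restricted to the same observations
quietly logistic union age grade south if e(sample)
estimates store core
* Joint test of all race categories; df = number of levels - 1
lrtest with_race core
display "LR chi2(" r(df) ") = " %6.3f r(chi2) ", p = " %6.4f r(p)
```

Restricting the core model with `if e(sample)` matters because lrtest assumes nested models fitted to the same data, and missing values in the added variable would otherwise change the sample.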

If a categorical variable has at least one category that is significant, I keep the whole variable.
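Note that this rule depends on the base category: each category's reported p-value tests that category against the base only. A base-independent alternative is the joint Wald test of all k - 1 category coefficients. The sketch below, assuming nlsw88 as stand-in data, fits the same model with two different bases and compares the joint p-values from testparm:

```stata
* Sketch: the joint test of a factor variable does not depend on its base.
sysuse nlsw88, clear
* Default base (level 1, white)
quietly logistic union age grade i.race
quietly testparm i.race
scalar p_base1 = r(p)
* Same model with level 2 (black) as the base
quietly logistic union age grade ib2.race
quietly testparm ib2.race
scalar p_base2 = r(p)
display "joint p, base 1: " %6.4f p_base1 "   joint p, base 2: " %6.4f p_base2
```

Changing the base only reparameterizes the model, so the fitted probabilities and the joint test are the same; only the individual category contrasts change.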

I excluded one categorical variable, but noticed that if I use a category other than the default as the base (reference) category, some of its categories suddenly become significant in the regression.

For example:

logistic yvar xvar1 xvar2 i.xvar3

results in every category of xvar3 having a high p-value, but:

logistic yvar xvar1 xvar2 ib2.xvar3

results in several of xvar3's categories having p-values near 0.
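This happens because each category's p-value is a Wald test of that category against the base, so a new base means a new set of contrasts. Any pairwise contrast can be tested after the original fit with test, without refitting. A minimal sketch, with nlsw88 standing in for the poster's data (race plays the role of xvar3):

```stata
* Sketch: category p-values are contrasts against the base (nlsw88 stand-in).
sysuse nlsw88, clear
quietly logistic union age grade i.race
* The p-value reported for 3.race: category 3 vs base category 1
quietly test 3.race
scalar p_3vs1 = r(p)
* Category 3 vs category 2, from the same fit
quietly test 3.race = 2.race
scalar p_3vs2 = r(p)
* The same contrast from a refit with category 2 as the base
quietly logistic union age grade ib2.race
quietly test 3.race
scalar p_3vs2_refit = r(p)
display "3 vs 1: " %6.4f p_3vs1 "   3 vs 2: " %6.4f p_3vs2 "   3 vs 2 (ib2): " %6.4f p_3vs2_refit
```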

From looking at marginsplot graphs I understand how this can happen, but is there a way to detect it during model building without looking at marginsplots?
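One way to see every pairwise category comparison at once, without choosing a base or drawing graphs, is pwcompare. A minimal sketch, assuming nlsw88 as stand-in data; the Bonferroni adjustment is only one choice among the multiple-comparison methods pwcompare offers:

```stata
* Sketch: all pairwise race comparisons after one fit (nlsw88 stand-in).
sysuse nlsw88, clear
quietly logistic union age grade i.race
* Differences in log odds between each pair of categories, with
* Bonferroni-adjusted p-values and confidence intervals
pwcompare race, effects mcompare(bonferroni)
```

Because several pairwise tests are run per variable, some adjustment for multiple comparisons is worth considering before treating any one contrast as significant.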

Many thanks,

Michael Cook


