Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Significance of categorical variables in Logistic Regression
From: Marcello Pagano <[email protected]>
To: <[email protected]>
Subject: st: Significance of categorical variables in Logistic Regression
Date: Thu, 21 Mar 2013 10:29:12 -0400
For Michael:
I am running logistic regressions with a number of categorical independent 
variables. I am building the model by starting out with variables that have 
a pval below .15 when used in a single-variable regression.
Then I put them all together, and weed out variables that are not 
significant in the multi-variable regression. When I get to a "core" 
model of significant predictors, I add back excluded variables one at a 
time to see if they are significant in the context of the smaller set of 
predictors.
If a categorical variable has at least one category that is significant, 
I keep the whole variable.
I have excluded a categorical variable, but noticed that if I base the 
variable on a different category than the default category, I suddenly 
see significant categories in the regression.
For example:
logistic yvar xvar1 xvar2 i.xvar3
results in every category of xvar3 having a high pval, but:
logistic yvar xvar1 xvar2 ib2.xvar3
results in several of xvar3's categories having a pval near 0.
From looking at marginsplot graphs I understand how this can happen, but I 
would like to know if there's a way of detecting this during model 
building without looking at the plots?
Many thanks,
Michael Cook