The Stata listserver

st: Response to :Basic Q on Logistic Regression


From   User <[email protected]>
To   [email protected]
Subject   st: Response to :Basic Q on Logistic Regression
Date   Mon, 5 Jan 2004 01:09:10 -0500


Dear Statalisters:

In an earlier (December 22, 2003) exchange, Azadeh Khatibi asked the following question:

When using logit and logistic commands, can I have non-binary values for
my independent variable and controls? For example, for each
observation in my data set, I have the variable "age" and their ages (14,
15, 16...). If so, is there a limit on what the numbers can be or what form they can
take?

Can I just go ahead and use the data in the following way...

For condom use as the dependent variable, and controlling for several
variables (including interaction):

logit uses_condoms age gender age*gender

You will notice there's an interaction term in there. Can I have
interaction terms with non-binary variables (such as age here as I'd like
to use it).

In reading up on this, the only thing I've discovered is that you can have
only 0s and 1s for your dependent variable.


and Richard Williams replied:

Azadeh, in general, whatever is legitimate for IVs in an OLS regression is legitimate in a logistic regression. That includes interaction terms. In your example, you are basically allowing the effect of age to differ by gender, which should be fine. (Which is not to say that your model is correctly specified, but there is no inherent reason you can't do things like this; incidentally I would code gender 0-1 if it isn't coded that way already).

Indeed, most things you can do in OLS seem to have a fairly straightforward logistic counterpart, e.g. hypothesis tests, transformations of the IVs, etc.; the most obvious exception is that you can't do transformations on the DV, e.g. take the log of Y. Interpretation is more confusing in logistic regression, but it is doable.

********
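
A note on the syntax in Azadeh's example, offered as a sketch rather than a definitive recipe: as far as I know, Stata treats * inside a variable list as a wildcard, so age*gender would not be read as a product term, and the interaction has to be created explicitly. The variable name age_x_gender below is my own invention, and I am assuming gender is already coded 0/1 as Dr. Williams suggests.

```stata
* Assumes gender is coded 0/1; age_x_gender is a made-up name.
generate age_x_gender = age * gender
logit uses_condoms age gender age_x_gender

* The same model reported as odds ratios rather than coefficients.
logistic uses_condoms age gender age_x_gender

* Shorthand via the xi prefix, which builds the gender dummy and its
* interaction with continuous age in one step.
xi: logit uses_condoms i.gender*age

* In Stata 11 and later, factor-variable notation should do the same:
*   logit uses_condoms c.age##i.gender
```

The coefficient on the product term is then the amount by which the effect of age (on the logit scale) differs between the two gender groups.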

I agree with Dr. Williams but will add one cautionary note to Azadeh regarding linearity in the logit. If a continuous variable (like age) is not linear in the logit, it may be inappropriate to model it in the way Azadeh is proposing, and it may be necessary to consider grouping the variable and using dummy variables instead. More specifically:

As I understand it (and I am not an expert), one of the assumptions of logistic regression is that the logit is linear in each continuous covariate (like age). Hosmer and Lemeshow, in their text "Applied Logistic Regression", provide a good discussion of this. They indicate that if there is reason to believe the odds ratio for a one-year increase in age is not constant, and instead changes depending on, for example, which decade or period of life is being considered, then linearity in the logit may be questionable. In that case it may be important to test and verify linearity in the logit for the continuous variables. They go on to suggest several methods of investigating this, including (1) design variables and (2) fractional polynomials. Luckily, Stata does fractional polynomials (-help fracpoly-), and this is something you may want to check out. I suspect that something like condom use could well have some non-linearity or breakpoint issues associated with age.
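
To make the two approaches concrete, here is a rough sketch using Azadeh's variable names. The name age_q is my own, the choice of quartiles is arbitrary, and the fracpoly line follows the command syntax as I understand it, so please check -help fracpoly- before relying on it.

```stata
* (1) Design variables: cut age into quartile groups (four groups is an
*     arbitrary choice) and enter the groups as dummies.
xtile age_q = age, nquantiles(4)
xi: logit uses_condoms i.age_q gender
* If the logit is linear in age, the dummy coefficients should rise or
* fall roughly in proportion to the spacing of the group midpoints.

* (2) Fractional polynomials: compare a linear term for age with
*     first- and second-degree fractional polynomial transformations.
*     The first covariate listed (age) is the one that gets transformed.
fracpoly logit uses_condoms age gender, degree(2) compare

* In recent Stata releases fracpoly has been superseded by the fp prefix;
* the rough equivalent there would be:
*   fp <age>: logit uses_condoms <age> gender
```

If neither check suggests a departure from linearity, entering age as a single continuous term, as in the original model, seems reasonable.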

Again, Hosmer and Lemeshow have a very good discussion of this in their text. David G. Kleinbaum's "Logistic Regression: A Self-Learning Text" also explains logistic regression quite well but, interestingly, does not seem to discuss testing for linearity in the logit (at least in the first edition, which I have).

David Miller


