


Re: st: Including components of a summative score in regression

From   Richard Williams <[email protected]>
To   [email protected], Statalist Statalist <[email protected]>
Subject   Re: st: Including components of a summative score in regression
Date   Mon, 30 Jul 2012 16:12:35 -0500

At 02:46 PM 7/30/2012, Donald Spady wrote:
Dear Statalisters

I am doing some logistic regression analysis in which some of the variables are made up of the values of other variables, e.g. N = A + B + C/D. Is it reasonable, or appropriate, to include A, B, C, or D in the equation if N is already in it? For example:
logistic X F G H N A B C D, where F G H are some other variables and N is made up of A B C D, but for some reason or other A B C D are desired in the equation.

My impression is that statistical theory would say this is a no-no, largely because of collinearity; however, if I do it, sometimes I get a better 'fit' to the equation (using estat gof, group(10)).


Donald Spady

I believe the improvements in fit stem from two things:

* N constrains the coefficients of A and B and C/D to be equal; adding A and B relaxes two of those constraints.

* The linear effects of C and D are not captured by N, which contains them only through the ratio C/D.

So sure, adding those vars can improve fit.
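
To spell out the two points above, here is a sketch in my own notation. The beta symbols are not from the original post, the coefficients are on the log-odds scale, and the F, G, H terms are left out because they are the same in every version.

```latex
\begin{align*}
\text{N only:}\quad
  \eta &= \beta_0 + \beta_N N
        = \beta_0 + \beta_N A + \beta_N B + \beta_N \frac{C}{D} \\
\text{N plus A, B, C, D:}\quad
  \eta &= \beta_0 + \beta_N N + \beta_A A + \beta_B B + \beta_C C + \beta_D D \\
       &= \beta_0 + (\beta_N + \beta_A) A + (\beta_N + \beta_B) B
          + \beta_N \frac{C}{D} + \beta_C C + \beta_D D
\end{align*}
```

In the first line A, B, and C/D share the single coefficient beta_N. In the last line each of the three has its own coefficient, and C and D also enter linearly.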

I could be wrong, but I believe if you created the ratio as its own variable (call it CD, say) and ran

generate CD = C/D
logistic X F G H A B C D CD

you would get the same fit as you are getting now. That seems a little less convoluted to me. But I don't think what you are doing is inherently evil; you just have to understand what the parameters mean and why you get a better fit.
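
The same-fit claim can be checked numerically without fitting anything. The following is a minimal sketch in Python rather than Stata, using made-up data and made-up coefficient values. The function names and the key "CD" (standing for the ratio C/D) are purely illustrative. It shows that any linear predictor from the specification with N can be reproduced exactly by the specification without N. F, G and H are omitted because they enter both versions identically.

```python
import random

def eta_with_n(row, b):
    """Linear predictor for the specification N A B C D, with N = A + B + C/D."""
    A, B, C, D = row
    N = A + B + C / D
    return b["N"] * N + b["A"] * A + b["B"] * B + b["C"] * C + b["D"] * D

def eta_without_n(row, g):
    """Linear predictor for the specification A B C D C/D (no N)."""
    A, B, C, D = row
    return g["A"] * A + g["B"] * B + g["C"] * C + g["D"] * D + g["CD"] * (C / D)

def reparameterize(b):
    """Map coefficients of the first specification onto the second."""
    return {
        "A": b["N"] + b["A"],   # A picks up N's coefficient plus its own
        "B": b["N"] + b["B"],   # same for B
        "C": b["C"],            # linear effect of C is unchanged
        "D": b["D"],            # linear effect of D is unchanged
        "CD": b["N"],           # the ratio keeps N's coefficient
    }

random.seed(0)
# Made-up data: D is kept away from zero so that C/D is well defined.
rows = [
    (random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1), random.uniform(1, 2))
    for _ in range(200)
]

# Arbitrary illustrative coefficients, not estimates from any real data.
b = {"N": 0.7, "A": -0.2, "B": 0.4, "C": 0.1, "D": -0.5}
g = reparameterize(b)

# Largest difference between the two linear predictors across all rows.
max_gap = max(abs(eta_with_n(r, b) - eta_without_n(r, g)) for r in rows)
print(max_gap)
```

The printed gap should be zero up to floating-point rounding. The mapping can also be run in reverse (N's coefficient is the ratio's coefficient, and A's and B's are recovered by subtracting it), so both specifications can produce exactly the same set of linear predictors, which is why the maximized likelihood should be the same.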

Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
