Richard Williams <richardwilliams.ndu@gmail.com>

Statalist <statalist@hsphsun2.harvard.edu>

Re: st: Including components of a summative score in regression

Mon, 30 Jul 2012 16:12:35 -0500

At 02:46 PM 7/30/2012, Donald Spady wrote:

Dear Statalisters

I am doing some logistic regression analysis, some of the variables of which are made up of the values of other variables; e.g. N = A + B + C/D. Is it reasonable, or appropriate, to include A, B, C, or D in the equation if N is already in it? i.e.

logistic X F G H N A B C D

where F G H are some variables, and N is made up of A B C D, but for some reason or other A B C D are desired to be in the equation.

My impression is that statistical theory would say this is a no-no, largely because of collinearity; however, if I do it, sometimes I get a better 'fit' to the equation (using estat gof, group(10)).

Thanks
Donald Spady

I believe the improvements in fit stem from the following:

* The linear effects of C and D are not captured by N.

So sure, adding those vars can improve fit. I could be wrong, but I believe if you ran logistic X F G H A B C D C/D
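For anyone who wants to check the algebra, here is a minimal sketch on simulated data (not from the original thread). Since N = A + B + C/D, the ratio C/D equals N - A - B, so the regressor sets {N, A, B, C, D} and {A, B, C, D, C/D} span the same space and the two models should report the same log likelihood. N is not perfectly collinear with A B C D alone, because C/D is not a linear function of C and D. The variable names follow the question; CoverD, the seed, the sample size, and the coefficients are made-up assumptions. Note that a varlist cannot contain the expression C/D, so the ratio has to be generated first.

```stata
* Simulated data; every value and coefficient below is an assumption
clear
set seed 20120730
set obs 1000
foreach v in F G H A B C {
    generate double `v' = rnormal()
}
* Keep D away from zero so the ratio is well behaved
generate double D = 1 + runiform()
* CoverD is a hypothetical name for the ratio term C/D
generate double CoverD = C / D
* The summative score from the question
generate double N = A + B + CoverD
generate byte X = runiform() < invlogit(0.5*F - 0.3*N + 0.4*C)

* Model from the question: N together with its components
logistic X F G H N A B C D
scalar ll_N = e(ll)

* Components plus the ratio term, without N
logistic X F G H A B C D CoverD
scalar ll_ratio = e(ll)

display "log likelihood with N:      " ll_N
display "log likelihood with CoverD: " ll_ratio
```

If the two log likelihoods agree, any gain in fit from adding A B C D alongside N comes from letting the components have their own effects, not from N carrying extra information.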

