Thanks Maarten,
I think I misused the term "collinear." The other variables are simply
other mortality predictors. The thing is, though, that while I have
variables such as blood pressure, heart rate, age, admitting diagnosis,
etc., there will always be very important factors that I cannot measure.
Mortality is a very blunt outcome (albeit not subject to the type of
measurement error that you describe in your aside), and all sorts of
individual circumstances that contribute to it do not lend themselves well
to coding and data collection (especially not for 10,000+ subjects). For
example, putting everything together, I cannot get above an area under the
ROC curve of 0.85. Or, putting it another way, the pseudo R-squared with
all variables included gets to about 0.23, at most.
I can take solace in the fact that my results are pretty much the same in
two different hospital populations, though.
Thanks for pointing out this issue (I think).
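For readers less familiar with the two fit measures quoted in this message, here is a minimal sketch of how an area under the ROC curve and a pseudo R-squared can be computed from a model's predicted probabilities. It assumes the pseudo R-squared meant is McFadden's (the thread does not say which one), and the outcome and probability values are made up purely for illustration; they are not the hospital data.

```python
import math


def auc(y, p):
    """Area under the ROC curve: the share of (death, survivor) pairs in
    which the death received the higher predicted probability (ties = 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mcfadden_r2(y, p):
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(intercept only)."""
    ll_model = sum(math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p))
    pbar = sum(y) / len(y)  # an intercept-only model predicts the overall rate
    ll_null = sum(math.log(pbar if yi == 1 else 1.0 - pbar) for yi in y)
    return 1.0 - ll_model / ll_null


# Hypothetical outcomes (1 = died) and predicted probabilities.
y = [0, 0, 0, 1, 0, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
print("AUC:", auc(y, p))
print("McFadden pseudo R-squared:", round(mcfadden_r2(y, p), 3))
```

The two numbers measure different things: the area under the curve depends only on how the predictions rank the subjects, while the pseudo R-squared depends on the likelihood, so a model can discriminate well and still have a modest pseudo R-squared.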
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Maarten buis
Sent: Monday, January 30, 2006 2:32 PM
To: [email protected]
Subject: Re: st: Unobserved heterogeneity in logistic regression
Dear Daniel:
The problem with unobserved heterogeneity is that it is, well, unobserved.
Apparently you have many predictors of mortality available, so an obvious
solution is to add some of these predictors. In an earlier post you
suggested that your variables are collinear, so you probably don't want to
add them all. That is no problem: the fact that they are collinear with the
variables left out means that most of the variance is captured by the
variables in the model (it does make the causal interpretation of these
control variables more difficult, but the role of control variables is to
control, and that is what they do).
I see the results of my models more as a rough indication than anything
else, so I tend to worry less about technicalities like these. In my own
research I deal with survey data, and in my department they tape trained
and experienced interviewers from reputable agencies while they are
interviewing and code the interactions between interviewer and respondent.
The results make me very skeptical about the precision of my data (see the
aside below). I wrote the paper more to satisfy my nerdish tendencies than
because I thought that the impact of this phenomenon would be large enough
to be noticeable above the random noise coming from data collection. (I may
be wrong, though; the simulations by Glenn Hoetker seem to point in that
direction, though I have not yet read them as carefully as I should.) I
pointed you to this phenomenon because in such a sensitivity analysis it
might be worthy of a footnote, and my working paper might be helpful in
understanding the literature to which it refers (and also the literature to
which Richard Williams referred).
So, my not entirely satisfactory answer is: dealing with "observed
heterogeneity" is much easier than dealing with unobserved heterogeneity.
If you use additional modeling on top of that and you get different
results, make sure you understand why that is the case and convince
yourself that the explanation is plausible.
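As a rough illustration of why unobserved heterogeneity matters in logistic regression, the sketch below assumes the phenomenon under discussion is the usual one: leaving out a predictor of the outcome shrinks the coefficients of the included variables toward zero, even when the omitted predictor is uncorrelated with them. Everything here is hypothetical (the sample size, the true coefficient of 1, the standard deviation of the omitted variable `u`, and the helper names `fit_logit` and `simulate`); it is a simulation on made-up data, not a reconstruction of the working paper or of the mortality model.

```python
import math
import random


def fit_logit(x, y, iters=15):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson; return (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            r = yi - p            # score contribution
            w = p * (1.0 - p)     # information weight
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def simulate(n=10000, sd_u=2.0, seed=0):
    """Return the estimated coefficient of x without and with an omitted u."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [rng.gauss(0.0, sd_u) for _ in range(n)]  # independent of x, never observed

    def draw(eta):
        return 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0

    y_ref = [draw(xi) for xi in x]                   # true logit: 1 * x
    y_het = [draw(xi + ui) for xi, ui in zip(x, u)]  # true logit: 1 * x + u
    _, b_ref = fit_logit(x, y_ref)
    _, b_het = fit_logit(x, y_het)  # u is left out of the fitted model
    return b_ref, b_het


b_ref, b_het = simulate()
print("coefficient of x, nothing omitted:", round(b_ref, 2))
print("coefficient of x, u omitted:      ", round(b_het, 2))
```

In both data sets the true coefficient of `x` is 1, but the model that omits `u` estimates a clearly smaller value. That is an attenuation toward zero rather than the classic omitted-variable bias, since `u` is independent of `x` by construction.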
HTH,
Maarten
Aside
Taping interviews does result in some funny interactions, though:
Interviewer: How many times do you eat grain products for breakfast?
Respondent: Well.... never .... eh.... well no, that's not right, beer is a
grain product too, isn't it?
More often the interactions aren't that funny. For instance, the
"experienced" interviewer looks around the room and decides for the
respondent in which income and educational category he/she falls, or asks
very suggestive questions, makes mistakes while entering the data, etc.
--- daniel waxman <[email protected]> wrote:
> Maarten Buis directed me to a short paper of his: "Unobserved heterogeneity
> in logistic regression":
>
> http://home.fsw.vu.nl/m.buis/
>
> The concept makes sense--the question is what to do about it.
<snip>
> There are of course many unobserved causes for in-hospital mortality, but
> insofar as this particular model seems to work, do I need to deal with this?
> If one does try to deal with it in a situation such as mine, is it a matter
> of using a method other than simple logistic regression to fit the model, or
> is it more a matter of assessment of goodness of fit?
>
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
Visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/