Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Simplification of formula in logistic regression
From
Marcello Pagano <[email protected]>
To
[email protected]
Subject
Re: st: Simplification of formula in logistic regression
Date
Mon, 16 May 2011 14:21:48 -0400
Before knocking this request too much further, one should consider the
accuracy of the variables going into the equation. Something like blood
pressure, which can be measured very accurately at any instant, can vary
tremendously a minute later. One should not be fooled by apparent
accuracy of clinical measures. The granddaddy (or grandmom??) of all
these is the Apgar score. Virginia Apgar wanted a measure of the babies at birth
based on what we would consider very, very loose measures --- e.g.
Reflex irritability (response to skin stimulation of the feet) : No response
(score of 0); Some motion (score of 1); and Cry (score of 2); or Color:
Blue:Pale (score of 0); Body Pink: extremities blue (score of 1); and
completely pink (score of 2) --- and yet the use of this score has
proven to be a great advance in pediatrics. An excellent read:
"The Score" by Atul Gawande
http://www.newyorker.com/archive/2006/10/09/061009fa_fact
m.p.
On 5/16/2011 1:40 PM, Ariel Linden, DrPH wrote:
As a health services researcher, I get frustrated by these requests. On the
one hand, we develop tools to maximize the accuracy of measurement, and on
the other hand, there is this constant desire to "dummy down" the
measurement instrument so that it can be "simple" for clinicians to use.
No matter that by dummying down the instrument, the accuracy likewise
diminishes.
I would suggest to Mikkel that you either remodel the data using "simple"
dichotomous terms, and accept that the accuracy of the model (e.g.
sensitivity/specificity) may be diminished, or more reasonably, you train
your clinicians how to use the instrument as it stands in its (presumably)
more accurate yet complex form.
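The trade-off described above can be checked directly by computing sensitivity and specificity for both the full and the simplified score on the same patients. The following is only a minimal sketch in Python; the function name, the convention that a score at or above the cut-off counts as a positive prediction, and the 0/1 outcome coding are assumptions made for illustration, not anything specified in this thread.

```python
def sensitivity_specificity(scores, outcomes, cutoff):
    """Return (sensitivity, specificity) for the rule 'score >= cutoff is positive'.

    scores   : predicted risks or point totals, one per patient
    outcomes : observed outcomes coded 1 (event) or 0 (no event)
    """
    tp = fn = tn = fp = 0
    for score, outcome in zip(scores, outcomes):
        predicted_positive = score >= cutoff
        if outcome == 1:
            if predicted_positive:
                tp += 1  # event correctly flagged
            else:
                fn += 1  # event missed
        else:
            if predicted_positive:
                fp += 1  # non-event wrongly flagged
            else:
                tn += 1  # non-event correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one event and one non-event")
    return tp / (tp + fn), tn / (tn + fp)
```

Running this once with the continuous model's predictions and once with the categorized score, on data not used to choose the cut-offs, would show how much accuracy the simplification costs.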
Date: Sun, 15 May 2011 17:48:41 +0100
From: Nick Cox<[email protected]>
Subject: Re: st: Simplification of formula in logistic regression
Sorry, but I think you will continue to find this "correct way" to be elusive.
Nick
On Sun, May 15, 2011 at 4:23 PM, Mikkel Brabrand<[email protected]>
wrote:
If I want clinicians to use my model, it needs to be simple. I cannot
expect them to use a piece of software to calculate the risk score and it is
virtually impossible to have it incorporated in the programs used at my
department. I therefore need to simplify it and make the variables
categorized or dichotomous. I have previously used the trial and error way,
and come up with a model that seems reasonable (and tested it in an
independent cohort, and am now testing it in two external cohorts at other
hospitals). However, there must be a correct way to select the cut-off
levels, I just cannot find out how. I have asked most statisticians I have
met on my way, but no one seems to know how. I hoped that some of you might
have a suggestion...
Mikkel
On 15/05/2011 at 16:49, Nick Cox wrote:
I don't know what "statistically correct" would mean here. If you
think your model is useful, there are no grounds for coarsening it. If
the implication is that clinicians can't understand or don't need to
understand the internals of the formula you can think of encapsulating
the details in a Stata do-file or some equivalent in other software.
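As a rough illustration of that kind of encapsulation, here is a minimal sketch in Python rather than a Stata do-file. The function name and the dictionary-based interface are hypothetical. The fitted intercept and coefficients would have to come from the actual model, since the original question gives only part of the formula.

```python
import math


def predicted_risk(intercept, coefficients, values):
    """Return the predicted probability from a fitted logistic model.

    intercept    : fitted constant term
    coefficients : dict mapping variable name -> fitted coefficient
    values       : dict mapping variable name -> measured value for one patient
    """
    # Linear predictor: intercept plus coefficient * measurement for each variable.
    xb = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    # Inverse logit, arranged so that exp() never overflows for large |xb|.
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    exp_xb = math.exp(xb)
    return exp_xb / (1.0 + exp_xb)
```

The clinician would enter only the measurements; the coefficients stay inside the file.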
A broad issue is that detailed models optimised to fit particular
datasets often perform poorly on other data.
Nick
On Sun, May 15, 2011 at 3:43 PM, Mikkel Brabrand<[email protected]>
wrote:
I have performed a logistic regression analysis including five variables
and one outcome. However, I would like to simplify the formula significantly
for clinical use. So, instead of the formula being something like
-12.22+2.33*systolic blood pressure-1.21*temperature etc., I would like to
make a scoring system where the score is calculated on basis of the measured
values of the vital signs.
An example could be something like this
            2 points   1 point   0 points   1 point   2 points
Pulse       -30        31-50     51-100     101-200   201-
Sys. BP     -60        61-100    101-200    201-
However, I have no idea how to find the optimal cut-off points. Do any
of you have a suggestion how to do this in a statistically correct way?
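For concreteness, a scoring system of this shape could be sketched in Python as below. The band limits are copied from the example table in this post and are purely illustrative, not validated cut-offs. The names `BANDS`, `points` and `total_score` are hypothetical, and two assumptions are made: the four Sys. BP bands are read left to right against the first four point columns, and a value between two integer limits (say 30.5) falls into the higher band.

```python
import bisect

# For each variable: inclusive upper limits of all bands except the last,
# and the points awarded per band. Values are taken from the example table
# in the post and are illustrative only.
BANDS = {
    "pulse": ([30, 50, 100, 200], [2, 1, 0, 1, 2]),
    "sys_bp": ([60, 100, 200], [2, 1, 0, 1]),
}


def points(variable, value):
    """Return the points for one measurement of one vital sign."""
    upper_limits, band_points = BANDS[variable]
    # bisect_left gives the index of the first band whose upper limit is
    # >= value; values above every limit fall into the final open band.
    return band_points[bisect.bisect_left(upper_limits, value)]


def total_score(measurements):
    """Sum the points over a dict of variable name -> measured value."""
    return sum(points(name, value) for name, value in measurements.items())
```

With these bands, a pulse of 120 earns 1 point and a systolic blood pressure of 55 earns 2 points, so `total_score({"pulse": 120, "sys_bp": 55})` returns 3. The sketch only applies cut-offs that have already been chosen; it says nothing about how to choose them.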
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/