Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Simplification of formula in logistic regression

From: Marcello Pagano
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Simplification of formula in logistic regression
Date: Mon, 16 May 2011 14:21:48 -0400

Before knocking this request too much further, one should consider the accuracy of the variables going into the equation. Something like blood pressure, which can be measured very accurately at any instant, can vary tremendously a minute later. One should not be fooled by the apparent accuracy of clinical measures. The granddaddy (or grandmom??) of all these is the Apgar score. Virginia Apgar wanted a measure of babies at birth based on what we would consider very, very loose measures --- e.g. Reflex irritability (response to stimulation of the skin of the feet): No response (score of 0); Some motion (score of 1); and Cry (score of 2); or Color: Blue or pale (score of 0); Body pink, extremities blue (score of 1); and Completely pink (score of 2) --- and yet the use of this score has proven to be a great advance in pediatrics. An excellent read:
```"The Score" by Atul Gawande
http://www.newyorker.com/archive/2006/10/09/061009fa_fact

m.p.

On 5/16/2011 1:40 PM, Ariel Linden, DrPH wrote:
```
```As a health services researcher, I get frustrated by these requests. On the
one hand, we develop tools to maximize the accuracy of measurement, and on
the other hand, there is this constant desire to "dummy down" the
measurement instrument so that it can be "simple" for clinicians to use.

No matter that by dummying down the instrument, the accuracy likewise
diminishes.

I would suggest to Mikkel that you either remodel the data using "simple"
dichotomous terms, and accept that the accuracy of the model (e.g.
sensitivity/specificity) may be diminished, or more reasonably, you train
your clinicians how to use the instrument as it stands in its (presumably)
more accurate yet complex form.

Date: Sun, 15 May 2011 17:48:41 +0100
From: Nick Cox<njcoxstata@gmail.com>
Subject: Re: st: Simplification of formula in logistic regression

Sorry, but I think you will continue to find this "correct way" to be elusive.

Nick

On Sun, May 15, 2011 at 4:23 PM, Mikkel Brabrand<mikkel@brabrand.net>
wrote:
```
```If I want clinicians to use my model, it needs to be simple. I cannot
expect them to use a piece of software to calculate the risk score and it is
virtually impossible to have it incorporated in the programs used at my
department. I therefore need to simplify it and make the variables
categorized or dichotomous. I have previously used the trial and error way,
and come up with a model that seems reasonable (and tested it in an
independent cohort, and am now testing it in two external cohorts at other
hospitals). However, there must be a correct way to select the cut-off
levels, I just cannot find out how. I have asked most statisticians I have
met on my way, but no one seems to know how. I hoped that some of you might
have a suggestion...
```
```Mikkel

On 15/05/2011 at 16:49, Nick Cox wrote:

```
```I don't know what "statistically correct" would mean here. If you
think your model is useful, there are no grounds for coarsening it. If
the implication is that clinicians can't understand or don't need to
understand the internals of the formula you can think of encapsulating
the details in a Stata do-file or some equivalent in other software.

A broad issue is that detailed models optimised to fit particular
datasets often perform poorly on other data.

Nick

On Sun, May 15, 2011 at 3:43 PM, Mikkel Brabrand<mikkel@brabrand.net> wrote:
```
```I have performed a logistic regression analysis including five variables
and one outcome. However, I would like to simplify the formula significantly
for clinical use. So, instead of the formula being something like
-12.22+2.33*systolic blood pressure-1.21*temperature etc., I would like to
make a scoring system where the score is calculated on basis of the measured
values of the vital signs.
```
```An example could be something like this

           2 points   1 point   0 points   1 point   2 points
Pulse      -30        31-50     51-100     101-200   201-
Sys. BP    -60        61-100    101-200    201-

However, I have no idea how to find the optimal cut-off points. Do any
of you have a suggestion for how to do this in a statistically correct way?
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
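The example table in the original question can be read as a lookup: each measurement is compared against a sorted list of upper band limits, and the band it falls in gives the points. What follows is only a minimal Python sketch of that reading, not a validated instrument. The names `SCORE_TABLE`, `band_points` and `total_score` are hypothetical, the cut-offs are the made-up ones from the example table, and the Sys. BP row is assumed to have four bands because only four are listed in the post.

```python
from bisect import bisect_left

# Inclusive upper limits of each band, and the points awarded per band.
# Values are copied from the illustrative table in the question; they are
# NOT validated clinical cut-offs.
SCORE_TABLE = {
    "pulse": ([30, 50, 100, 200], [2, 1, 0, 1, 2]),
    # Assumption: the post lists only four Sys. BP bands, so the last
    # band (201 and above) is given 1 point, as read from the table.
    "sys_bp": ([60, 100, 200], [2, 1, 0, 1]),
}


def band_points(variable, value):
    """Return the points for one measurement by locating its band."""
    upper_limits, points = SCORE_TABLE[variable]
    # bisect_left finds the first band whose upper limit is >= value.
    return points[bisect_left(upper_limits, value)]


def total_score(measurements):
    """Sum the points over all measured variables."""
    return sum(band_points(name, value) for name, value in measurements.items())


# A pulse of 120 falls in 101-200 (1 point); a systolic BP of 90 falls
# in 61-100 (1 point).
print(total_score({"pulse": 120, "sys_bp": 90}))  # prints 2
```

Keeping the cut-offs in one table rather than in nested if-statements makes it easy to change them when the score is re-tested in another cohort.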
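As the replies in this thread point out, there may be no single "statistically correct" way to coarsen a model. One convention that is sometimes used, once the predictors have already been categorized and the logistic model refitted on the categories, is to divide every coefficient by the smallest non-zero one and round to whole points. The Python sketch below illustrates only that arithmetic. The function name `coefficients_to_points` and all coefficient values are hypothetical and are not taken from the model discussed here, and the rounding itself discards information, which is the loss of accuracy warned about above.

```python
def coefficients_to_points(coefs, unit=None):
    """Scale log-odds coefficients to integer points.

    coefs: mapping of category name -> logistic regression coefficient.
    unit:  coefficient value worth one point; defaults to the smallest
           non-zero absolute coefficient (an assumption of this sketch).
    """
    nonzero = [abs(b) for b in coefs.values() if b != 0]
    if not nonzero:
        raise ValueError("at least one non-zero coefficient is required")
    if unit is None:
        unit = min(nonzero)
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return {name: round(b / unit) for name, b in coefs.items()}


# Hypothetical coefficients for categorized vital signs (reference
# categories have coefficient 0 and are omitted).
example_coefs = {
    "pulse_101_200": 0.45,
    "pulse_201_plus": 0.98,
    "sys_bp_61_100": 0.52,
    "sys_bp_60_or_less": 1.10,
}

print(coefficients_to_points(example_coefs))
```

Any score produced this way would still need to be checked in independent data, since the choice of `unit` and the rounding both change how well the score discriminates.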