# Re: st: logistic ---- assessment of model fit via external validation

From: Richard Williams
To: statalist@hsphsun2.harvard.edu, statalist@hsphsun2.harvard.edu
Subject: Re: st: logistic ---- assessment of model fit via external validation
Date: Wed, 09 Jun 2004 13:33:37 -0500

```At 12:06 PM 6/9/2004 -0600, cthompson@dfpm.utah.edu wrote:
```
```Hello All ---
I'm using Intercooled Stata, 8.2.  In Hosmer & Lemeshow's
"Applied Logistic Regression", the authors indicate that it may
be possible to exclude a subsample of observations, develop a
model, then test the model on the excluded observations
(p.171).  I'm interested in doing just that, although I'm at a
loss as to if -- and how -- something like this can be
implemented in Stata.  I've searched the Statalist archives,
UCLA's statistics portal, and a few different textbooks for
hints on how this 'external validation' can be implemented, but
to no avail.  Note that I do not want to generate new
coefficients for the second sub-sample, rather, I want to use
the coefficients generated from the first sub-sample in
estimating a classification table for the second sub-sample.
```
I'm not sure if this is what they had in mind, but this might work. First, you need to install Nick Cox's -swor- routine.

. sysuse auto
(1978 Automobile Data)

. set seed 123

. swor 37, gen(mysample) keep

. quietly logit foreign price if mysample

. * fit using selected cases
. lstat

Logistic model for foreign

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         0             1  |          1
     -     |        13            23  |         36
-----------+--------------------------+-----------
   Total   |        13            24  |         37

Classified + if predicted Pr(D) >= .5
True D defined as foreign != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)    0.00%
Specificity                     Pr( -|~D)   95.83%
Positive predictive value       Pr( D| +)    0.00%
Negative predictive value       Pr(~D| -)   63.89%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)    4.17%
False - rate for true D         Pr( -| D)  100.00%
False + rate for classified +   Pr(~D| +)  100.00%
False - rate for classified -   Pr( D| -)   36.11%
--------------------------------------------------
Correctly classified                        62.16%
--------------------------------------------------

. drop if mysample
(37 observations deleted)

. * fit using non-selected cases
. lstat, all

Logistic model for foreign

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         1             3  |          4
     -     |         8            25  |         33
-----------+--------------------------+-----------
   Total   |         9            28  |         37

Classified + if predicted Pr(D) >= .5
True D defined as foreign != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   11.11%
Specificity                     Pr( -|~D)   89.29%
Positive predictive value       Pr( D| +)   25.00%
Negative predictive value       Pr(~D| -)   75.76%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   10.71%
False - rate for true D         Pr( -| D)   88.89%
False + rate for classified +   Pr(~D| +)   75.00%
False - rate for classified -   Pr( D| -)   24.24%
--------------------------------------------------
Correctly classified                        70.27%
--------------------------------------------------

.
I think this does what you have said you want, but whether it is the best way to proceed (or what H & L really had in mind), I don't know.
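
For anyone who would rather not install -swor- or drop observations, here is a rough sketch of the same idea using only built-in commands. It is not the session above: the names u, mysample, phat, and yhat are made up for illustration, the random split is drawn differently, and so the counts will not match the tables posted here. It assumes that -lstat- honours an -if- qualifier together with the -all- option, which is how I read the help file.

```stata
* A sketch only: u, mysample, phat, and yhat are hypothetical names.
sysuse auto, clear
set seed 123

* Random split without -swor-: shuffle the data, then flag the first
* 37 cars as the development sample. uniform() is the Stata 8 name;
* newer releases also offer runiform().
generate double u = uniform()
sort u
generate byte mysample = (_n <= 37)

* Estimate on the development sample only
quietly logit foreign price if mysample

* Classification table for the development sample
lstat if mysample

* Classification table for the holdout sample, using the same
* coefficients; -all- lifts the restriction to the estimation sample
lstat if !mysample, all

* Cross-check by hand: -predict- applies the stored coefficients to
* every observation, not just those used in estimation
predict double phat
generate byte yhat = (phat >= .5) if !missing(phat)
tabulate yhat foreign if !mysample
```

The -tabulate- at the end should reproduce the cell counts of the holdout classification table, which is a quick check that no new coefficients were estimated for the second sub-sample. In later Stata releases -lstat- is documented as -estat classification-.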
