[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Cox overfitting test?

From   Maarten buis <>
Subject   Re: st: Cox overfitting test?
Date   Wed, 14 Nov 2007 10:58:12 +0000 (GMT)

--- Andrzej Niemierko <aniemierko@PARTNERS.ORG> wrote:
> When I run your program the histogram of mean deviance residuals is
> indeed centered on the vertical line (i.e., the mean is ~0.97). So in
> that sense the fitted Cox model behaves well on the bootstrap
> samples. Still, my concern is that I am capturing a lot of random
> fluctuations with only 52 cases and six parameters. 

I wrote this procedure to have a look at that exact concern. I
described the logic as: "If you are overfitting, than your estimates in
the real data would have little relevance for the bootstrap data and
the fit would be bad. If you haven't overfitted the data than the fit
would cluster around the fit in your real data." Which still sounds
good to me. However, when I tried it out in the example below I still
got a distribution of fit statistics around the observed value even
though in this example I am blatently overfitting the data. So, you'd
better forget my previous post... 


*---------------- begin example -------------------
set seed 123
sysuse cancer, clear
stset studytime, failure(died)

/* create random explanantory variables,  */
/* their square, and interaction terms    */
forvalues i = 1/5 {
	gen ran`i' = invnorm(uniform())
	gen ran`i'sq = ran`i'^2
	local j = `i'-1
	forvalues k = 1/`j' {
		gen ran`k'Xran`i' = ran`k'*ran`i'
stcox ran*

/*find the significant variables*/
indeplist, local
foreach var of varlist `X' {
	if abs(_b[`var']/_se[`var']) > 1.68 {
		local sig "`sig' `var'"

/* fit the model using the significant variables */
stcox `sig', mgal(mgal)

tempname memhold
tempfile results
postfile `memhold' msd using `results'
forvalues i = 1/1000 {
	predict dev, deviance
	gen dev2 = dev^2
	sum dev2, meanonly
      post `memhold' (`r(mean)')
postclose `memhold'
predict dev, deviance
gen dev2 = dev^2
sum dev2, meanonly
local observed = r(mean)

use `results', clear
hist msd, xline(`observed') /*
*/ xtitle("mean squared deviance residual")
*-------------- end example ------------------
(For more on how to use examples I sent to the Statalist, see )

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

Want ideas for reducing your carbon footprint? Visit Yahoo! For Good
*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index