The Stata listserver

st: RE: : [sampling in Cox model] (more)

From   "Sayer, Bryan" <>
To   "'FEIVESON, ALAN H. (AL) (JSC-SK) (NASA) '" <>, "''' '" <>
Subject   st: RE: : [sampling in Cox model] (more)
Date   Thu, 19 Jun 2003 10:35:33 -0400

I'm sure there is a way to do informative sampling, but I think it depends
on what one is sampling for.  Typically, in a survey sample situation,
sampling is done based on information available BEFORE the sample is drawn,
i.e. census area data, administrative lists, etc.  Using information
available only after the data are collected may bias the results.

So if the purpose is only to draw a sample from the dataset that will yield
estimates as close as possible to those obtained with a full dataset, then
one probably needs to pick observations around the mean of the independent
variables, possibly stratified by failure or failure time.  Of course
variance estimates will be off.

Another issue with a survival model that often seems to be lost is that
estimation is based on those still at risk as each failure occurs.  More
simply, mortality is ultimately 100%.

So rather than sampling, I wonder if it would be possible to simply
aggregate the data at each failure time?
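
To make the aggregation idea concrete, here is a minimal sketch of the
bookkeeping it implies: one row per distinct failure time, holding the
number still at risk and the number failing. The function name and the toy
data are hypothetical, and this only counts subjects; an actual Cox fit
would also need the covariates summarized within each risk set.

```python
from collections import Counter

def risk_set_table(times, events):
    """Return (failure_time, n_at_risk, n_failures) rows, one per distinct failure time.

    times  : follow-up time for each subject
    events : 1 if the subject failed at that time, 0 if censored
    """
    failures = Counter(t for t, e in zip(times, events) if e == 1)
    rows = []
    for t in sorted(failures):
        # A subject is at risk at t if still under observation at time t.
        n_at_risk = sum(1 for u in times if u >= t)
        rows.append((t, n_at_risk, failures[t]))
    return rows

# Hypothetical toy data: six subjects, three failures, three censored.
times = [2, 3, 3, 5, 7, 7]
events = [1, 0, 1, 1, 0, 0]
for row in risk_set_table(times, events):
    print(row)
```

The table has as many rows as there are distinct failure times, which can be
far fewer than the number of subjects in a large cohort.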

Bryan Sayer
Statistician, SSS Inc.

-----Original Message-----
To: ''
Sent: 6/19/03 10:06 AM
Subject: st: : [sampling in Cox model]  (more)

To rephrase what I am asking, I think one ought to be able to do some sort
of stratified sampling. The strata would be based on the independent
variables. For example, if one stratum has "risky" values of the
variables, one would expect a lot of failures, ..etc. Bryan - are you
Can this be done in the context of fitting Cox models?


-----Original Message-----
Sent: Thursday, June 19, 2003 8:35 AM
To: ''
Subject: st: [sampling in Cox model] 

This raises an interesting question. Clearly, one could take an
that is, a purely blinded sample and run it. But is there a more
way? For example, if you just used the failures, you would bias your
estimates of the coefficients, but in some sense you would gain
So I'm wondering if there is a way of informative sampling (that is,
purposely choosing a preponderance of failures) and somehow correcting for
the bias? If you did this, would the estimates be any more accurate than if
you had just taken a noninformative sample to begin with?
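
One way to picture the question is the survey-sampling device of keeping
every failure, keeping only a random fraction of the non-failures, and
recording each subject's inverse selection probability as a weight. The
sketch below does only that selection step. The function name, the 10%
fraction, and the toy data are assumptions for illustration, and whether a
weighted Cox fit on such a sample fully removes the bias is exactly what
this thread leaves open.

```python
import random

def sample_with_weights(events, censored_fraction, seed=0):
    """Keep every failure and a random fraction of the non-failures.

    Returns a list of (index, weight) pairs, where weight is the inverse of
    the subject's selection probability.
    """
    if not 0 < censored_fraction <= 1:
        raise ValueError("censored_fraction must be in (0, 1]")
    rng = random.Random(seed)
    selected = []
    for i, e in enumerate(events):
        if e == 1:
            # Failures are selected with certainty, so their weight is 1.
            selected.append((i, 1.0))
        elif rng.random() < censored_fraction:
            # Non-failures are selected with probability censored_fraction.
            selected.append((i, 1.0 / censored_fraction))
    return selected

# Hypothetical cohort: 50 failures among 1,000 subjects.
events = [1] * 50 + [0] * 950
sample = sample_with_weights(events, censored_fraction=0.1, seed=42)
n_failures = sum(1 for i, _ in sample if events[i] == 1)
print(len(sample), n_failures)
```

The weights would then be passed to whatever estimation routine is used, so
that each sampled non-failure stands in for the ones that were dropped.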

Al Feiveson

-----Original Message-----
From: Nick Cox []
Sent: Wednesday, June 18, 2003 6:00 PM
Subject: st: RE: [Cox model] 

roger webb
> I need to run a Cox model on a very large cohort (of approximately
> 1.5 million subjects). Has anyone implemented a memory efficient
> routine that uses a sample from (as opposed to all) the individuals
> at risk?

Nothing to do with me, but I doubt that there 
is any special procedure needed here. That is,
you should just take a sample upstream and then 
fit a Cox model on the sample data, I guess. 
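
As a rough illustration of "take a sample upstream", the sketch below draws
a reproducible simple random sample of subject identifiers before any model
is fitted. The function name, the 1% sample size, and the identifier range
are assumptions for illustration. If a subject contributes several records,
the draw should be made on subjects rather than on records, so that each
sampled subject keeps its full follow-up history.

```python
import random

def sample_subjects(subject_ids, n, seed=0):
    """Return a sorted simple random sample of n subject identifiers."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return sorted(rng.sample(subject_ids, n))

# Hypothetical cohort of 1.5 million subjects, sampled at 1%.
all_ids = range(1, 1_500_001)
kept = sample_subjects(all_ids, n=15_000, seed=2003)
print(len(kept), kept[:5])
```

Only the records belonging to the sampled identifiers would then be loaded
for the Cox fit.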


*   For searches and help try:
