I'm sure there is a way to do informative sampling, but I think it depends on what one is sampling for.

Typically, in a survey sample situation, sampling is done based on information available BEFORE the sample is drawn, e.g. census area data, administrative lists, etc. Using information available only after the data are collected may bias the results.

So if the purpose is only to draw a sample from the dataset that will yield estimates as close as possible to those obtained with the full dataset, then one probably needs to pick observations around the mean of the independent variables, possibly stratified by failure or failure time. Of course, variance estimates will be off.

Another issue with a survival model that often seems to be lost is that estimation is based on those still at risk as each failure occurs. More simply, mortality is ultimately 100%. So rather than sampling, I wonder if it would be possible to simply aggregate the data at each failure time?

Bryan Sayer
Statistician, SSS Inc.

-----Original Message-----
From: FEIVESON, ALAN H. (AL) (JSC-SK) (NASA)
To: 'statalist@hsphsun2.harvard.edu'
Sent: 6/19/03 10:06 AM
Subject: st: : [sampling in Cox model] (more)

To rephrase what I am asking, I think one ought to be able to do some sort of stratified sampling. The strata would be based on the independent variables. For example, if one stratum has "risky" values of the independent variables, one would expect a lot of failures, ..etc.

Bryan - are you there? Can this be done in the context of fitting Cox models?

Al

-----Original Message-----
From: FEIVESON, ALAN H. (AL) (JSC-SK) (NASA)
Sent: Thursday, June 19, 2003 8:35 AM
To: 'statalist@hsphsun2.harvard.edu'
Subject: st: [sampling in Cox model]

This raises an interesting question. Clearly, one could take an "upstream" sample, that is, a purely blinded sample, and fit the model to it. But is there a more efficient way?
For example, if you just used the failures, you would bias your estimates of the coefficients, but in some sense you would gain precision. So I'm wondering if there is a way of doing informative sampling (that is, purposely choosing a preponderance of failures) and somehow correcting for the bias? If you did this, would the estimates be any more accurate than if you had just taken a noninformative sample to begin with?

Al Feiveson

-----Original Message-----
From: Nick Cox [mailto:n.j.cox@durham.ac.uk]
Sent: Wednesday, June 18, 2003 6:00 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: [Cox model]

roger webb

> I need to run a Cox model on a very large cohort (of approximately
> 1.5 million subjects). Has anyone implemented a memory efficient
> routine that uses a sample from (as opposed to all) the individuals
> at risk?

Nothing to do with me, but I doubt that there is any special procedure needed here. That is, you should just take a sample upstream and then fit a Cox model on the sample data, I guess.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
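
The original question asks about using a sample from the individuals at risk, and Bryan's reply notes that estimation is based on those still at risk as each failure occurs. A minimal Python sketch of that idea follows, assuming the reading that at each failure time one keeps the failing subject plus a small random sample of the others still at risk (often called risk-set or nested case-control sampling). Nobody in the thread proposes this exact scheme. The function names, the `cohort` records, and the choice of two controls per case are all hypothetical, and fitting a model to the sampled sets is not shown.

```python
import random

def risk_set(subjects, t):
    """Subjects still at risk at time t, i.e. with exit time >= t.

    Subjects censored exactly at t are counted as at risk (an assumed,
    but common, convention for ties)."""
    return [s for s in subjects if s["time"] >= t]

def sample_risk_sets(subjects, controls_per_case, seed=0):
    """For each failure, keep the case and a random sample of the
    other subjects at risk at that failure time (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    failures = sorted((s for s in subjects if s["failed"]),
                      key=lambda s: s["time"])
    sampled = []
    for case in failures:
        others = [s for s in risk_set(subjects, case["time"])
                  if s["id"] != case["id"]]
        k = min(controls_per_case, len(others))
        controls = rng.sample(others, k)
        sampled.append({
            "time": case["time"],
            "case": case["id"],
            "controls": [c["id"] for c in controls],
        })
    return sampled

# Made-up toy cohort: exit time and whether the exit was a failure.
cohort = [
    {"id": 1, "time": 2.0, "failed": True},
    {"id": 2, "time": 3.0, "failed": False},
    {"id": 3, "time": 5.0, "failed": True},
    {"id": 4, "time": 7.0, "failed": False},
    {"id": 5, "time": 8.0, "failed": True},
]

sets = sample_risk_sets(cohort, controls_per_case=2)
for s in sets:
    print(s)
```

Each sampled set keeps its failure time, so the sets stay matched on time. The risk sets shrink as time passes, so late failures may have few or no controls available.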

