Stata The Stata listserver

st: RE: [sampling in Cox model]

From   Lee Sieswerda <>
To   "''" <>
Subject   st: RE: [sampling in Cox model]
Date   Thu, 19 Jun 2003 11:23:51 -0400

This answer is perhaps going in a different direction; that is, a design
direction rather than a statistical direction. But there is the nested case
control study, which is commonly used within large cohorts to test specific

The selection of controls can be tricky, but because much is known about the
participants owing to their participation in the cohort, bias and
confounding can often be well controlled. The main problem in case-control
studies is usually recall bias, but that is largely eliminated in the nested
study. Bias related to time is handled by selecting controls from the same
time period as the cases.
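
Selecting controls "from the same time period as the cases" is usually done
by risk-set (incidence density) sampling: for each case, controls are drawn
from the cohort members still under follow-up at the case's failure time. A
minimal Python sketch of that idea follows. The function name, the tuple
layout and the toy cohort are all hypothetical, and it assumes everyone
enters follow-up at time zero (no delayed entry). In Stata itself, the
sttocc command does this kind of sampling on stset data.

```python
import random

def sample_risk_sets(subjects, m, seed=0):
    """Draw up to m controls per case from that case's risk set.

    subjects: list of (subject_id, exit_time, failed) tuples (assumed layout).
    Returns one (case_id, control_ids) pair per failure.
    """
    rng = random.Random(seed)
    matched_sets = []
    for case_id, case_time, failed in subjects:
        if not failed:
            continue
        # Risk set: everyone still under follow-up at the case's failure
        # time, excluding the case itself. A subject who fails later is
        # still eligible as a control here.
        at_risk = [sid for sid, t, _ in subjects
                   if t >= case_time and sid != case_id]
        controls = rng.sample(at_risk, min(m, len(at_risk)))
        matched_sets.append((case_id, controls))
    return matched_sets

# Hypothetical toy cohort: (id, exit time in years, 1 = failed, 0 = censored)
cohort = [(1, 2.0, 1), (2, 5.0, 0), (3, 3.5, 1),
          (4, 6.0, 0), (5, 1.0, 0), (6, 4.0, 1)]

for case_id, controls in sample_risk_sets(cohort, m=2):
    print(case_id, controls)
```

Each matched set would then be analysed with conditional logistic
regression, stratified on the set.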

The purpose of the nested case-control study is not usually improved
precision, but rather convenience. At the edges of my recall, though, I
seem to remember that there are situations where the nested case-control
design can be more efficient than the cohort analysis. However, I'll stop
right there because I'm not in any way an expert on this kind of study.
There are some on this list who are, I suspect.

There was a three-part series in the American Journal of Epidemiology a
while back on nested case-control studies. 

Wacholder S, McLaughlin JK, Silverman DT, Mandel JS. Selection of controls
in case-control studies. I. Principles. Am J Epidemiol 1992; 135:1019-1028.
Ibid. II. Types of Controls. Am J Epidemiol 1992; 135: 1029-1041.
Ibid. III. Design Options. Am J Epidemiol 1992; 135: 1042-1050.

However, in this particular case, there doesn't seem to be much need for a
nested case-control study. As Ronan pointed out, Stata can probably handle
the full data set and RAM is cheap.


Lee Sieswerda, Epidemiologist
Thunder Bay District Health Unit

-----Original Message-----
Sent: Thursday, June 19, 2003 9:35 AM
To: ''
Subject: st: [sampling in Cox model] 

This raises an interesting question. Clearly, one could take an "upstream"
sample, that is, a purely blinded one, and run it. But is there a more
efficient way? For example, if you just used the failures, you would bias your
estimates of the coefficients, but in some sense you would gain precision.
So I'm wondering if there is a way of informative sampling (that is,
purposely choosing a preponderance of failures) and somehow correcting for
bias? If you did this, would the estimates be any more accurate than if you
had just taken a noninformative sample to begin with?
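
One commonly described way to correct for this kind of deliberate
oversampling is inverse-probability weighting: keep every failure, keep
each non-failure with a known probability, and weight each retained subject
by the reciprocal of its sampling probability. The Python sketch below only
illustrates the sampling and weighting step. The function name, the record
layout and the simulated cohort are hypothetical, and the sketch does not
settle whether a weighted fit is more accurate than a simple random sample
of the same size.

```python
import random

def oversample_failures(subjects, control_fraction, seed=0):
    """Keep all failures and a random fraction of non-failures.

    subjects: list of (subject_id, failed) tuples (assumed layout).
    Returns (subject_id, failed, weight) tuples, where weight is the
    inverse of the subject's probability of being sampled.
    """
    rng = random.Random(seed)
    sampled = []
    for subject_id, failed in subjects:
        if failed:
            # Failures are always kept, so their sampling probability is 1.
            sampled.append((subject_id, failed, 1.0))
        elif rng.random() < control_fraction:
            # Each non-failure is kept with probability control_fraction.
            sampled.append((subject_id, failed, 1.0 / control_fraction))
    return sampled

# Hypothetical cohort: 10,000 subjects, of whom the first 200 fail.
cohort = [(i, 1 if i < 200 else 0) for i in range(10_000)]
sample = oversample_failures(cohort, control_fraction=0.1)

print(len(sample))                   # far fewer rows than the full cohort
print(sum(w for _, _, w in sample))  # weighted size, close to 10,000
```

The weights would then be passed to a weighted Cox fit, with standard
errors that account for the weighting.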

Al Feiveson

-----Original Message-----
From: Nick Cox []
Sent: Wednesday, June 18, 2003 6:00 PM
Subject: st: RE: [Cox model] 

roger webb
> I need to run a Cox model on a very large cohort (of approximately 1.5 
> million subjects). Has anyone implemented a memory efficient routine 
> that uses a sample from (as opposed to all) the individuals at risk?

Nothing to do with me, but I doubt that there 
is any special procedure needed here. That is,
you should just take a sample upstream and then 
fit a Cox model on the sample data, I guess. 


*   For searches and help try:
