
RE: st: Problem with stset and sttocc

From   "Andrew Hall" <>
To   <>
Subject   RE: st: Problem with stset and sttocc
Date   Thu, 25 Oct 2007 13:03:46 +0100

Dear Mike 

Thanks for your suggestions, particularly about the importance of setting the time t. I created a new time variable and entered an arbitrary date for all controls (non-orphans) and a different, later date for all cases (orphans). This worked: -sttocc- no longer selected cases as controls for cases (orphans as controls for orphans). This may help anyone else trying to select cases and controls from a cross-sectional data set in which all subjects were, in effect, sampled at the same time and had a characteristic that could be defined as "failure".

Andrew Hall

-----Original Message-----
From: on behalf of Mike Lacy
Sent: Wed 10/24/2007 4:41 PM
Subject: Re: st: Problem with stset and sttocc 
 >Date: Tue, 23 Oct 2007 11:45:13 +0100
 >From: Andrew Hall <>
 >Subject: st: Problem with stset and sttocc
 >I am having problems using stset and sttocc to create a case-control
 >data set from a simple cross-sectional data set. -sttocc- is selecting
 >cases as controls. I am using Intercooled Stata 9.2 on Windows
 >XP. This is what is happening.
 >I have cross-sectional survey data on 7,572 Ethiopian schoolchildren
 >of whom 1,283 are orphans and 6,289 are non-orphans.  The variable
 >orphan is coded as 1 = orphan, 0 = non orphan.
 >The data were collected between 28/11/2006 and 8/2/2007.
 >I want to randomly select one non-orphan for each orphan, matched at
 >least on sex and age, to create a case-control data set.
 >I have stset the data set using either the date of visit (dov1) as the
 >time variable or a new time variable fixed at one arbitrary
 >date, e.g. 01/01/2007 (dovfixed):
 >                stset  dov1, failure(orphan=1)
 >   or          stset dovfixed, failure(orphan=1)
 >This creates new variables, including _d, which cross-tabs
 >perfectly with orphan (_d = 1 and orphan = 1, n = 1283; _d = 0 and
 >orphan = 0, n = 6289). It seems that the data set has been properly
 >stset (or has it?).
 >I then use sttocc to match each case to one control on the variables
 >sex (1=male; 2=female) and ageyrs (in years) as follows:
 >             sttocc, match(sex ageyrs) number(1)
 >This runs and fails to find controls for only 2 cases.
 >But when I cross-tabulate orphan by _case, I find that 278
 >controls, who should be non-orphans, have been selected from the cases
 >(orphans, failure = 1). All controls should be selected from the non-orphans.
 >-snapspan- has no effect on the data set, as all id numbers in the data
 >set are unique anyway.
 >Why are orphans (failure) being selected as controls for orphans
 >(failure) when I have specified non-orphans?

I'm not certain about all the details of how you are using -sttocc-, but
I can think of one possible misunderstanding that might be confusing
you. Since -sttocc- purports to do genuine incidence-density
sampling, i.e., sampling controls from the risk set at time t for a
case that occurs at time t, -sttocc- might well, and correctly,
select a control that later becomes a case. For example, suppose
you have a child ABC who becomes orphaned at time3. Suppose that,
at time3, there are 200 subjects who have not yet become orphaned,
i.e., who are in the risk set at time3. Suppose that one of them, child
XYZ, is chosen from this risk set as one of the controls for child
ABC, and suppose further that child XYZ becomes a case at
time5. If you are thinking of controls as "children who never
experience the event," this could produce confusing results, since
you would be defining controls in a way that contradicts the idea of
sampling from the risk set.
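The ABC/XYZ example can be made concrete with a toy sketch. This is a language-neutral illustration in Python, not Stata's -sttocc- implementation. It assumes that a subject is at risk at time t if its own exit time is t or later (so tied failures count as at risk), and it draws one control per case from that set, excluding the case itself. ABC and XYZ come from the example; DEF, GHI and all the times are hypothetical.

```python
"""Toy sketch of incidence-density (risk-set) sampling.

Assumptions (NOT taken from Stata's sttocc): a subject is at risk at
time t if its exit time is >= t, and one control is drawn per case
from that risk set, excluding the case itself.
"""
import random
from typing import Dict, List, Optional


def risk_set(times: Dict[str, int], t: int, case_id: str) -> List[str]:
    """Subjects still under observation at time t, other than the case."""
    return sorted(sid for sid, exit_t in times.items()
                  if exit_t >= t and sid != case_id)


def sample_controls(times: Dict[str, int], failed: Dict[str, bool],
                    seed: int = 1) -> Dict[str, Optional[str]]:
    """Draw one control per case from the risk set at the case's time."""
    rng = random.Random(seed)
    controls: Dict[str, Optional[str]] = {}
    for case_id in sorted(sid for sid, f in failed.items() if f):
        pool = risk_set(times, times[case_id], case_id)
        # None means no eligible control was found for this case.
        controls[case_id] = rng.choice(pool) if pool else None
    return controls


# Hypothetical data: ABC is orphaned at time 3, XYZ at time 5;
# DEF and GHI are never orphaned and leave observation at time 6.
times = {"ABC": 3, "XYZ": 5, "DEF": 6, "GHI": 6}
failed = {"ABC": True, "XYZ": True, "DEF": False, "GHI": False}

print(risk_set(times, times["ABC"], "ABC"))  # ['DEF', 'GHI', 'XYZ']
print(sample_controls(times, failed))
```

XYZ is a legitimate candidate control for ABC because XYZ is still at risk at time 3. Under the same assumption, giving every child one shared date (as with dovfixed) would put all the other orphans into each orphan's risk set. That is one way cases could end up drawn as controls.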

A more general question would be whether to do a case-control study with
these data at all, rather than a survival analysis. Perhaps there are some
cost/effort savings here (e.g., in collecting additional explanatory
variables) that were not relevant to mention (quite possible),
but otherwise it sounds like you have the whole data set in hand,
which would make me think "why sample?".


Mike Lacy
Fort Collins CO USA
(970) 491-6721 office


The University of Westminster is a charity and a company limited by
guarantee.  Registration number: 977818 England.  Registered Office:
309 Regent Street, London W1B 2UW, UK.


© Copyright 1996–2017 StataCorp LLC