I am having problems using stset and sttocc to create a case-control data set from a simple cross-sectional data set: sttocc is selecting cases as controls. I am using Intercooled Stata 9.2 on Windows XP. This is what is happening.

I have cross-sectional survey data on 7,572 Ethiopian schoolchildren, of whom 1,283 are orphans and 6,289 are non-orphans. The variable orphan is coded as 1 = orphan, 0 = non-orphan.

The data were collected between 28/11/2006 and 8/2/2007.

I want to randomly select one non-orphan for each orphan, matched at least on sex and age, to create a case-control data set.

I have stset the data set using either the date of visit (dov1) as the time variable or a new time variable fixed on one arbitrary date, e.g. 01/01/2007 (dovfixed):

stset dov1, failure(orphan=1)
or stset dovfixed, failure(orphan=1)
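
For anyone trying to reproduce this, a fixed-date variable like dovfixed could be created roughly as follows. This is only a sketch, assuming a numeric Stata date is wanted; the exact commands used may have differed.

```stata
* Sketch (assumption): a constant date variable set to 1 January 2007
generate dovfixed = mdy(1, 1, 2007)
* Display the stored number as a date
format dovfixed %d
```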

This creates new variables, including _d, which cross-tabulates perfectly with orphan (_d = 1 and orphan = 1, n = 1,283; _d = 0 and orphan = 0, n = 6,289). It seems that the data set has been properly stset (or has it?).
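
The check on the stset step can be sketched as below. The tabulate line is the cross-tab described above; stdes is an extra summary that I am assuming is a sensible thing to look at here, not something the results above depend on.

```stata
* Cross-tabulate the original orphan variable against the failure indicator made by stset
tabulate orphan _d
* Summarise the st settings (subjects, records, failures, time at risk)
stdes
```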

I then use sttocc to match each case to one control on the variables sex (1=male; 2=female) and ageyrs (in years) as follows:

sttocc, match(sex ageyrs) number(1)

This runs, and fails to find a control for only 2 cases.

But when I cross-tabulate orphan by _case, I find that 278 of the controls, who should all be non-orphans, have been selected from among the cases (orphans, failure = 1). All controls should be selected from the non-orphans.
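
The check after sttocc can be sketched as follows, assuming the default name _case for the case/control indicator that sttocc generates (1 = case, 0 = control).

```stata
* Cross-tabulate original orphan status against the case/control indicator from sttocc
tabulate orphan _case
* Count the controls (_case == 0) that are in fact orphans
count if _case == 0 & orphan == 1
```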

Running snapspan has no effect on the data set, as all id numbers in it are unique anyway.

Why are orphans (failures) being selected as controls for orphans (failures) when I have specified non-orphans? Am I just being dim? Is there something wrong with the way I'm using stset? I have tried setting origin, enter and exit, but they are not really relevant, as all subjects were in effect studied on the same day, so these are not time-to-event data. I am something of a novice and can't find any similar issues discussed in the listserv archives, hence my request.

