Statalist



st: Problem with stset and sttocc


From   Andrew Hall <A.Hall04@westminster.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   st: Problem with stset and sttocc
Date   Tue, 23 Oct 2007 11:45:13 +0100

Hello

I am having problems using stset and sttocc to create a case-control dataset from a simple cross-sectional dataset: sttocc is selecting cases as controls. I am using Intercooled Stata 9.2 on Windows XP. Here is what is happening.

I have cross-sectional survey data on 7,572 Ethiopian schoolchildren, of whom 1,283 are orphans and 6,289 are non-orphans. The variable orphan is coded 1 = orphan, 0 = non-orphan.

The data were collected between 28/11/2006 and 8/2/2007.

I want to randomly select one non-orphan for each orphan, matched at least on sex and age, to create a case-control dataset.
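For comparison, the design described above (one randomly chosen non-orphan per orphan, same sex and age) can be sketched as plain stratified matching without replacement. This is a minimal, language-neutral illustration in Python, not Stata code and not what sttocc does internally; the record layout (dicts with keys `id`, `orphan`, `sex`, `ageyrs`), the function name `match_one_to_one` and the seed are hypothetical. The key property is that controls are drawn only from non-orphans.

```python
import random
from collections import defaultdict


def match_one_to_one(records, seed=2007):
    """Pair each orphan with one randomly chosen non-orphan of the same sex and age.

    records: list of dicts with hypothetical keys 'id', 'orphan', 'sex', 'ageyrs'.
    Controls are drawn without replacement, and ONLY from non-orphans.
    Returns (pairs, unmatched): pairs is a list of (case_id, control_id),
    unmatched lists the ids of orphans for whom no control was left.
    """
    rng = random.Random(seed)

    # Pool of eligible controls (non-orphans), grouped by the matching variables.
    pool = defaultdict(list)
    for r in records:
        if r["orphan"] == 0:
            pool[(r["sex"], r["ageyrs"])].append(r["id"])

    # Shuffle each stratum once, so popping from the end is a random draw.
    for ids in pool.values():
        rng.shuffle(ids)

    pairs, unmatched = [], []
    for r in records:
        if r["orphan"] == 1:
            stratum = pool.get((r["sex"], r["ageyrs"]))
            if stratum:  # at least one unused control of the same sex and age
                pairs.append((r["id"], stratum.pop()))
            else:
                unmatched.append(r["id"])
    return pairs, unmatched
```

Because each control is removed from its stratum once used, an orphan can go unmatched when its sex-by-age cell runs out of non-orphans, which is the analogue of "cannot find controls" for a few cases.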

I have stset the dataset using either the date of visit (dov1) as the time variable, or a new time variable fixed at one arbitrary date, e.g. 01/01/2007 (dovfixed):

stset dov1, failure(orphan=1)
or stset dovfixed, failure(orphan=1)

This creates the new variables _st, _d, _t and _t0, and _d cross-tabulates perfectly with orphan (_d = 1 and orphan = 1: n = 1,283; _d = 0 and orphan = 0: n = 6,289). So the dataset seems to have been stset properly (or has it?).
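The agreement check described above amounts to a two-way frequency table in which every off-diagonal cell is zero. A minimal Python sketch, assuming records are dicts keyed by variable name (the helper names `crosstab` and `disagreements` are hypothetical, not Stata commands); the same helper could be applied to the orphan-by-_case table of the matched data.

```python
from collections import Counter


def crosstab(records, row, col):
    """Two-way frequency table: count of each (row value, col value) pair."""
    return Counter((r[row], r[col]) for r in records)


def disagreements(records, a, b):
    """Number of records whose two 0/1 indicators differ (the off-diagonal cells)."""
    table = crosstab(records, a, b)
    return sum(n for (x, y), n in table.items() if x != y)
```

If _d was derived correctly from orphan, `disagreements(records, "_d", "orphan")` is 0.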

I then use sttocc to match each case to one control on sex (1 = male; 2 = female) and ageyrs (age in years):

sttocc, match(sex ageyrs) number(1)

This runs, and fails to find controls for only 2 cases.

But when I cross-tabulate orphan by _case, I find that 278 of the controls, which should all be non-orphans, have been selected from the cases (orphans, failure = 1). All controls should be selected from the non-orphans.
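One possible explanation, offered as an assumption rather than a confirmed diagnosis: sttocc builds a nested case-control dataset by sampling controls from the risk set at each failure time (incidence density sampling). Under that scheme a subject who fails later is still at risk at an earlier failure time, so later cases are legitimately eligible as controls. The Python sketch below illustrates the scheme only; it is not sttocc's code, and the tuple layout `(id, time, failed)`, the function name `risk_set_sample` and the rule "at risk means time >= t" are hypothetical. In particular, how sttocc treats subjects with tied times is not modelled here.

```python
import random


def risk_set_sample(subjects, seed=1):
    """Illustrative incidence density (risk-set) sampling, one control per case.

    subjects: list of (id, time, failed) tuples.
    For each failure at time t, one control is drawn from subjects still at
    risk at t (assumed here to mean time >= t), excluding the case itself.
    Subjects who fail later are still at risk at t, so they can be drawn.
    Returns a list of (case_id, control_id); cases with an empty risk set
    get no control.
    """
    rng = random.Random(seed)
    pairs = []
    for cid, t, failed in subjects:
        if not failed:
            continue
        at_risk = [sid for sid, s_t, _ in subjects if s_t >= t and sid != cid]
        if at_risk:
            pairs.append((cid, rng.choice(at_risk)))
    return pairs
```

With two cases failing at times 1 and 2 and no one else, the first case can only be matched to the second case, and the second case gets no control at all: cases appear among the controls by design, not through an error in stset.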

Running snapspan has no effect on the dataset, as all id numbers in the dataset are unique anyway.

Why are orphans (failures) being selected as controls for orphans (failures) when I have specified non-orphans? Am I just being dim? Is there something wrong with the way I'm using stset? I have tried setting origin, enter and exit, but they are not really relevant, as all subjects were in effect studied on the same day, so these are not time-series data. I am something of a novice and can't find any similar issues discussed in the Statalist archives, hence my request.

Thanks for reading this and suggestions would be gratefully received.


Andrew Hall MSc PhD RPHNutr

Reader in Public Health Nutrition
Centre for Public Health Nutrition
Westminster University
115 New Cavendish Street
London W1W 6UW

Tel: + 44 (0)207 911 5000 Ext 3910




