
Re: st: Problem with stset and sttocc

From   Andrew Hall
Subject   Re: st: Problem with stset and sttocc
Date   Thu, 25 Oct 2007 13:56:37 +0100

Dear Isabel

Please ignore my most recent message: I misunderstood the output, and it does not work. Creating a new time variable does not do the trick. Sorry to mess you around.

I tried creating new time variables that were different for failures and non-failures so that:
failure time < non-failure time (this selected all controls from failures)
failure time > non-failure time (this selected 227 controls from failures)
failure time = non-failure time (this selected 221 controls from failures)
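
The three trials above can be sketched roughly as follows. This is an illustration only: the variable name newtime and the time values 1 and 2 are hypothetical stand-ins, not the values actually used, and sttocc replaces the data in memory, so each trial is wrapped in preserve/restore.

```stata
* Sketch only: newtime and the values 1/2 are hypothetical, not from the original data
set seed 12345                          // arbitrary seed so the control draw is repeatable
foreach pair in "1 2" "2 1" "1 1" {
    preserve
    local t_case    : word 1 of `pair'  // time assigned to failures (orphans)
    local t_noncase : word 2 of `pair'  // time assigned to non-failures
    generate newtime = cond(orphan == 1, `t_case', `t_noncase')
    stset newtime, failure(orphan==1)
    sttocc, match(sex ageyrs) number(1)
    tabulate orphan _case               // orphans with _case == 0 are controls drawn from failures
    restore
}
```

The first pair makes failure time earlier than non-failure time, the second makes it later, and the third makes them equal.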

So I am no further forward. I hope very much that you can help.

Regards, Andrew
At 15:25 23/10/2007, you wrote:

If Andrew could send me (privately) a dataset and a short do-file to
replicate the problem, I would be happy to look into it.

    -- Isabel
    -- icanette(at)stata(dot)com

Andrew Hall wrote:
> Hello
> I am having problems using stset and sttocc to create a case-control
> data set from a simple cross-sectional data set.  Sttocc is selecting
> cases as controls.   I am using Intercooled STATA 9.2 on Windows XP.
> This is what is happening.
> I have cross-sectional survey data on 7,572 Ethiopian schoolchildren of
> whom 1,283 are orphans and 6,289 are non-orphans.  The variable orphan
> is coded as 1 = orphan, 0 = non orphan.
> The data were collected between 28/11/2006 and 8/2/2007.
> I want to randomly select one non-orphan for each orphan, matched at
> least on sex and age, to create a case-control data set.
> I have stset the dataset using either date of visit (dov1) as the time
> variable or created a new time variable fixed on one arbitrary date e.g.
> 01/01/2007 (dovfixed)
>     stset dov1, failure(orphan=1)
> or
>     stset dovfixed, failure(orphan=1)
> This creates new temporary variables including _d which cross-tabs
> perfectly with orphan (_d = 1 and orphan = 1 n=1283, _d =0 and orphan =
> 0, n=6289).  It seems that the dataset has been properly stset (or has
> it?).
> I then use sttocc to match each case to one control on the variables sex
> (1=male; 2=female) and ageyrs (in years) as follows:
>             sttocc, match (sex ageyrs) number(1)
> This runs and fails to find controls for only 2 cases.
> But when I do a cross-tab of orphan  by _case I find that 278 controls
> who should be non-orphans have been selected from the cases (orphans,
> failure=1).  All controls should be selected from the non-orphans.
> Snapspan has no effect on the dataset as all id numbers in the data set
> are unique anyway.
> Why are orphans (failure) being selected as controls for orphans
> (failure) when I have specified non-orphans? Am I just being dim?  Is
> there something wrong with the way I'm using stset?  I have tried setting
> origin, enter and exit, but they are not really relevant as all subjects
> were in effect studied on the same day, so it is not time series data.
> I am something of a novice and can't find any similar issues discussed
> on the listserv archives, hence my request.
> Thanks for reading this and suggestions would be gratefully received.
> Andrew Hall MSc PhD RPHNutr
> Reader in Public Health Nutrition
> Centre for Public Health Nutrition
> Westminster University
> 115 New Cavendish Street
> London W1W 6UW
> Tel: + 44 (0)207 911 5000 Ext 3910
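
For anyone trying to reproduce the report, the steps in the quoted message can be put together as a short do-file along these lines. It is a minimal sketch under stated assumptions: the variable names (dov1, orphan, sex, ageyrs) are the ones given in the message, the seed is arbitrary, and the failure condition is written with the documented == form. sttocc is documented as drawing controls from the risk set at each case's failure time, so the last cross-tab is the direct check of whether any controls came from the orphans.

```stata
* Sketch only: assumes the survey data described in the quoted message are in memory
set seed 12345                        // arbitrary; makes the random control selection repeatable
stset dov1, failure(orphan==1)        // date of visit as the time variable
tabulate orphan _d                    // _d should agree exactly with orphan
sttocc, match(sex ageyrs) number(1)   // one control per case, matched on sex and age in years
tabulate orphan _case                 // rows with orphan == 1 and _case == 0 are cases used as controls
```

Because sttocc replaces the data in memory with the case-control data set, the original file needs to be reloaded (or the run wrapped in preserve/restore) before trying dovfixed in place of dov1.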


