Statalist


Re: st: Problem with stset and sttocc


From: "Isabel Canette, StataCorp LP" <icanette@stata.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Problem with stset and sttocc
Date: Tue, 23 Oct 2007 09:25:19 -0500

If Andrew could send me (privately) a dataset and a short do-file to
replicate the problem, I would be happy to look into it.

    -- Isabel
    -- icanette(at)stata(dot)com



Andrew Hall wrote:
> Hello
> 
> I am having problems using stset and sttocc to create a case-control 
> data set from a simple cross-sectional data set.  Sttocc is selecting 
> cases as controls.  I am using Intercooled Stata 9.2 on Windows XP.
> This is what is happening.
> 
> I have cross-sectional survey data on 7,572 Ethiopian schoolchildren of 
> whom 1,283 are orphans and 6,289 are non-orphans.  The variable orphan 
> is coded as 1 = orphan, 0 = non orphan.
> 
> The data were collected between 28/11/2006 and 8/2/2007.
> 
> I want to randomly select one non-orphan for each orphan, matched at 
> least on sex and age, to create a case-control data set.
> 
> I have stset the dataset using either the date of visit (dov1) as the 
> time variable or a new time variable fixed on one arbitrary date, e.g. 
> 01/01/2007 (dovfixed):
> 
>                stset  dov1, failure(orphan=1)
>   or          stset dovfixed, failure(orphan=1)
> 
> This creates new temporary variables including _d, which cross-tabs 
> perfectly with orphan (_d = 1 and orphan = 1, n=1283; _d = 0 and 
> orphan = 0, n=6289).  It seems that the dataset has been properly 
> stset (or has it?).
> 
> I then use sttocc to match each case to one control on the variables sex 
> (1=male; 2=female) and ageyrs (in years) as follows:
> 
>             sttocc, match (sex ageyrs) number(1)
> 
> This runs, and fails to find controls for only 2 cases.
> 
> But when I cross-tabulate orphan by _case I find that 278 controls, 
> who should be non-orphans, have been selected from the cases (orphans, 
> failure=1).  All controls should be selected from the non-orphans.
> 
> Snapspan has no effect on the dataset as all id numbers in the data set 
> are unique anyway.
> 
> Why are orphans (failure) being selected as controls for orphans 
> (failure) when I have specified non-orphans? Am I just being dim?  Is 
> there something wrong with the way I'm using stset?  I have tried 
> setting origin, enter and exit, but they are not really relevant as all 
> subjects were in effect studied on the same day, so these are not time 
> series data.  I am something of a novice and can't find any similar 
> issues discussed in the listserv archives, hence my request.
> 
> Thanks for reading this; any suggestions would be gratefully received.
> 
> 
> Andrew Hall MSc PhD RPHNutr
> 
> Reader in Public Health Nutrition
> Centre for Public Health Nutrition
> Westminster University
> 115 New Cavendish Street
> London W1W 6UW
> 
> Tel: + 44 (0)207 911 5000 Ext 3910 
> 
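
A possible explanation for the behaviour Andrew describes, offered as an
assumption rather than a diagnosis: sttocc is documented as drawing controls
from the risk set at each case's failure time, that is, from subjects still
under observation at that time, whether or not they fail later. Under that
scheme an orphan whose _t is no earlier than another orphan's can be drawn as
that orphan's control. The Python sketch below illustrates the idea on toy
data; it is not Stata code and not sttocc's implementation. The function names
and the dictionary keys (id, orphan, sex, ageyrs, t) are hypothetical, and
treating tied times as still at risk is an assumption of the sketch. The second
function shows the sampling Andrew appears to want: controls drawn without
replacement from non-orphans only.

```python
import random
from collections import defaultdict


def risk_set_controls(subjects, rng):
    """Risk-set sampling sketch: one control per case, drawn from matched
    subjects whose time t is >= the case's t (ties count as at risk, which
    is an assumption). Orphans are NOT excluded from the pool."""
    picks = []
    for case in subjects:
        if not case["orphan"]:
            continue
        pool = [
            s for s in subjects
            if s["id"] != case["id"]
            and (s["sex"], s["ageyrs"]) == (case["sex"], case["ageyrs"])
            and s["t"] >= case["t"]
        ]
        if pool:
            picks.append((case["id"], rng.choice(pool)["id"]))
    return picks


def match_non_orphans(subjects, rng):
    """Plain 1:1 matching on sex and age: each orphan gets one randomly
    chosen non-orphan, sampled without replacement within its stratum."""
    pools = defaultdict(list)
    for s in subjects:
        if not s["orphan"]:
            pools[(s["sex"], s["ageyrs"])].append(s)
    for pool in pools.values():
        rng.shuffle(pool)

    pairs = []
    for case in subjects:
        if case["orphan"]:
            pool = pools[(case["sex"], case["ageyrs"])]
            if pool:  # orphans with no remaining non-orphan stay unmatched
                pairs.append((case["id"], pool.pop()["id"]))
    return pairs


if __name__ == "__main__":
    # Toy data only; nothing here comes from the survey described above.
    toy = [
        {"id": 1, "orphan": True, "sex": 1, "ageyrs": 10, "t": 5},
        {"id": 2, "orphan": True, "sex": 1, "ageyrs": 10, "t": 9},
        {"id": 3, "orphan": False, "sex": 1, "ageyrs": 10, "t": 2},
        {"id": 4, "orphan": False, "sex": 1, "ageyrs": 10, "t": 7},
    ]
    rng = random.Random(2007)
    print("risk-set picks:", risk_set_controls(toy, rng))
    print("non-orphan matches:", match_non_orphans(toy, rng))
```

In the toy data, subject 2 is an orphan but is still eligible as a control for
subject 1 under risk-set sampling, because subject 2's time is later. If the
aim is a conventional matched case-control sample from cross-sectional data,
the second approach avoids that by never placing orphans in the control pool.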




