
From: Andrew Hall <A.Hall04@westminster.ac.uk>

To: <statalist@hsphsun2.harvard.edu>

Subject: RE: st: Problem with stset and sttocc

Date: Thu, 25 Oct 2007 13:57:46 +0100

Correction. I was mistaken. Please ignore this post. Sorry.

Andrew

At 13:03 25/10/2007, you wrote:

Dear Mike

Thanks for your suggestions, particularly about the importance of setting the time t. I created a new time variable and entered an arbitrary date for all controls (non-orphans) and a different, later date for all cases (orphans). This worked and did not select cases as controls for cases (orphans as controls for orphans). This may be helpful to anyone else trying to select cases and controls from a cross-sectional data set in which all subjects were, in effect, sampled at the same time and had a characteristic which could be defined as "failure".

Andrew Hall

-----Original Message-----

From: owner-statalist@hsphsun2.harvard.edu on behalf of Mike Lacy

Sent: Wed 10/24/2007 4:41 PM

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Problem with stset and sttocc

>Date: Tue, 23 Oct 2007 11:45:13 +0100

>From: Andrew Hall <A.Hall04@westminster.ac.uk>

>Subject: st: Problem with stset and sttocc

>

>Hello

>

>I am having problems using stset and sttocc to create a case-control

>data set from a simple cross-sectional data set. Sttocc is selecting

>cases as controls. I am using Intercooled STATA 9.2 on Windows

>XP. This is what is happening.

>

>I have cross-sectional survey data on 7,572 Ethiopian schoolchildren

>of whom 1,283 are orphans and 6,289 are non-orphans. The variable

>orphan is coded as 1 = orphan, 0 = non orphan.

>

>The data were collected between 28/11/2006 and 8/2/2007.

>

>I want to randomly select a non-orphan for each orphan matched on sex

>and age at least to create a case-control data set.

>

>I have stset the dataset using either date of visit (dov1) as the

>time variable or created a new time variable fixed on one arbitrary

>date e.g. 01/01/2007 (dovfixed)

>

> stset dov1, failure(orphan=1)

> or stset dovfixed, failure(orphan=1)

>

>This creates new temporary variables including _d which cross-tabs

>perfectly with orphan (_d = 1 and orphan = 1 n=1283, _d =0 and orphan

>= 0, n=6289). It seems that the dataset has been properly stset

>(or has it?).

>

>I then use sttocc to match each case to one control on the variables

>sex (1=male; 2=female) and ageyrs (in years) as follows:

>

> sttocc, match (sex ageyrs) number(1)

>

>This works and cannot find controls for 2 cases only.

>

>But when I do a cross-tab of orphan by _case I find that 278

>controls who should be non-orphans have been selected from the cases

>(orphans, failure=1). All controls should be selected from the non-orphans.

>

>Snapspan has no effect on the dataset as all id numbers in the data

>set are unique anyway.

>

>Why are orphans (failure) being selected as controls for orphans

>(failure) when I have specified non-orphans?

I'm not certain about all the details of how you are using sttocc, but

I can think of one possible misunderstanding that might be confusing

you: Since -sttocc- purports to do genuine incidence-density

sampling, i.e. sampling controls from the risk set at time = t for a

case that occurs at time = t, -sttocc- might well and correctly

select a control that later becomes a case. So, for example, suppose

you have a child ABC that becomes orphaned at time3. Suppose that,

at time3, there are 200 subjects that have not yet become orphaned,

i.e., are in the risk set at time3. Suppose that one of them, child

XYZ, is chosen from this risk set as one of the controls for child

ABC, but let's further suppose that child XYZ becomes a case at

time5. If you are thinking of controls as "children who never

experience the event," this could produce confusing results, since

you would be defining controls in a way contradictory to the idea of

sampling from the risk set.
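
A minimal sketch of the scenario above, on made-up data. The dataset, the variable names child, time and orphan, the extra child DEF, and the seed are all assumptions for illustration and are not taken from the survey. It uses only -stset- and -sttocc- with documented options. Because the control is drawn at random from the risk set, XYZ may or may not be the one picked for ABC in a given run.

```stata
* Toy data (hypothetical): ABC fails at time 3, XYZ fails at time 5,
* DEF is never orphaned and is censored at time 6
clear
input str3 child time orphan
"ABC" 3 1
"XYZ" 5 1
"DEF" 6 0
end

* Declare survival-time data: each child is at risk from time 0
* until its own value of -time-
stset time, failure(orphan==1)

* Incidence-density sampling: one control per case, drawn at random
* from the children still at risk at the case's failure time
set seed 12345
sttocc, number(1)

* -sttocc- creates _case (1 = case, 0 = control), _set (matched set)
* and _time (failure time of the case in that set)
sort _set _case
list child time orphan _case _set _time, sepby(_set)

* A row with orphan == 1 and _case == 0 is a control that later became a case
tabulate orphan _case
```

At time 3 the risk set for ABC holds both XYZ and DEF. If XYZ is drawn, it appears with _case = 0 in ABC's set and with _case = 1 in its own set, which is the pattern a cross-tab of orphan by _case reports as orphans among the controls.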

A more general question would be whether to do a case-control study with these

data at all, rather than a survival analysis. Perhaps there are some

cost/effort savings (e.g., collecting additional explanatory

variables) here that were not relevant to mention (quite possible),

but otherwise it sounds like you have the whole data set in hand,

which would make me think "why sample?".

Regards,

=-=-=-=-=-=-=-=-=-=-=-=-=

Mike Lacy

Fort Collins CO USA

(970) 491-6721 office

*

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/

-- The University of Westminster is a charity and a company limited by guarantee. Registration number: 977818 England. Registered Office: 309 Regent Street, London W1B 2UW, UK.

