Statalist The Stata Listserver



st: RE: Unexpected behaviour from -sample- with -if- and -by-


From   "Ben Jann" <ben.jann@soz.gess.ethz.ch>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Unexpected behaviour from -sample- with -if- and -by-
Date   Mon, 30 Oct 2006 22:18:00 +0100

-gsample- seems to do it right (see -ssc d gsample-):

. sysuse cancer, clear
(Patient Survival in Drug Trial)

. bysort drug: gen case=_n==1

. gsample 10 if !case, wor strata(drug) keep
(15 observations deleted)

. tab case drug

           |      Drug type (1=placebo)
      case |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |        10         10         10 |        30 
         1 |         1          1          1 |         3 
-----------+---------------------------------+----------
     Total |        11         11         11 |        33 
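If -gsample- cannot be installed, the same selection can be made by hand. The sketch below gives every observation a uniform random number, ranks only the controls within each drug group, and keeps all cases plus the first 10 ranked controls. It is a minimal illustration under stated assumptions, not a description of how -gsample- or -sample- work internally. The seed value is arbitrary, the variable names u and rank are hypothetical, and runiform() assumes Stata 10.1 or later (older releases call the same function uniform()).

```stata
* Hypothetical sketch: keep exactly 10 random controls per drug group
* plus every case, without using -sample- or -gsample-.
sysuse cancer, clear
bysort drug: gen case = _n==1

* Arbitrary seed, only so that the draw is reproducible.
set seed 12345
* One uniform random number per observation.
gen double u = runiform()

* Within each case-by-drug cell, number observations in random order.
* Only controls (case==0) get a rank; cases are left missing.
bysort case drug (u): gen rank = _n if !case

* Keep every case plus the 10 lowest-ranked controls in each drug group.
keep if case | rank <= 10
drop u rank
tab case drug
```

Because the cases never enter the ranking, each drug group keeps exactly 10 controls no matter where a case would have fallen in a random ordering.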

ben

Peter wrote:
> I need 10 random controls matched per case for an epidemiological study.
> Controls are matched to cases on birth month and gender. I am using the
> -sample- command, and my problem can be demonstrated with the following:
> 
> sysuse cancer, clear
> bysort drug: gen case=_n==1
> sample 10 if !case, count by(drug)		// Here, controls are matched to cases on drug, not birth month and gender.
> tab case drug
> 
> I expected this command to draw 10 random persons with case==0 from each
> drug group and keep all three with case==1. The problem is that I
> sometimes get a result like this:
> 
>            |      Drug type (1=placebo)
>       case |         1          2          3 |     Total
> -----------+---------------------------------+----------
>          0 |         9          9          9 |        27
>          1 |         1          1          1 |         3
> -----------+---------------------------------+----------
>      Total |        10         10         10 |        30
> 
> - and sometimes like this:
> 
>            |      Drug type (1=placebo)
>       case |         1          2          3 |     Total
> -----------+---------------------------------+----------
>          0 |         9         10         10 |        29
>          1 |         1          1          1 |         3
> -----------+---------------------------------+----------
>      Total |        10         11         11 |        32
> 
> But I expect the following, which I also get on occasion:
> 
>            |      Drug type (1=placebo)
>       case |         1          2          3 |     Total
> -----------+---------------------------------+----------
>          0 |        10         10         10 |        30
>          1 |         1          1          1 |         3
> -----------+---------------------------------+----------
>      Total |        11         11         11 |        33
> 
> 
> The -help sample- file has no examples with both -if- and -by-, and I
> suggest that Stata's behaviour be described. I am now using a workaround
> where I save the cases to a file, delete them, run -sample 10, by(drug)
> count-, and -append- the cases back on. This is no big hassle, but it took
> me a long time to discover that the -sample- command was responsible for
> the varying number of controls per case.
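The workaround Peter describes (set the cases aside, sample the controls, append the cases back) can be written out as below. This is a sketch of that description, not his actual code; the tempfile name cases is a hypothetical choice, and the block is meant to be run as a do-file so that the local macro holding the tempfile name stays in scope.

```stata
* Sketch of the described workaround: set the cases aside,
* sample controls with -sample-, then append the cases back on.
sysuse cancer, clear
bysort drug: gen case = _n==1

* Save the cases to a temporary file.
preserve
keep if case
tempfile cases
save `cases'
restore

* Delete the cases, then draw 10 controls per drug group.
drop if case
sample 10, count by(drug)

* Append the cases back on and check the counts.
append using `cases'
tab case drug
```

Since -sample- no longer sees any observations excluded by an -if- condition, the count of 10 applies to controls only.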




© Copyright 1996–2014 StataCorp LP