Statalist The Stata Listserver



st: Unexpected behaviour from -sample- with -if- and -by-


From   "Peter Jepsen" <pj@dce.au.dk>
To   "Statalist" <statalist@hsphsun2.harvard.edu>
Subject   st: Unexpected behaviour from -sample- with -if- and -by-
Date   Mon, 30 Oct 2006 20:25:11 +0100

Dear Stata-listers,

I need 10 random controls matched per case for an epidemiological study.
Controls are matched to cases on birth month and gender. I am using the
-sample- command, and my problem can be demonstrated with the following:

sysuse cancer, clear
bysort drug: gen case=_n==1
// Here, controls are matched to cases on drug, not birth month and gender.
sample 10 if !case, count by(drug)
tab case drug

I expected this command to draw 10 random persons with case==0 from each
drug group and keep all three with case==1. The problem is that I sometimes
get a result like this:

            |      Drug type (1=placebo)
       case |         1          2          3 |     Total
-----------+---------------------------------+----------
          0 |         9          9          9 |        27
          1 |         1          1          1 |         3
-----------+---------------------------------+----------
      Total |        10         10         10 |        30



- and sometimes like this:


            |      Drug type (1=placebo)
       case |         1          2          3 |     Total
-----------+---------------------------------+----------
          0 |         9         10         10 |        29
          1 |         1          1          1 |         3
-----------+---------------------------------+----------
      Total |        10         11         11 |        32


But I expect the following, which I also get on occasion:

            |      Drug type (1=placebo)
       case |         1          2          3 |     Total
-----------+---------------------------------+----------
          0 |        10         10         10 |        30
          1 |         1          1          1 |         3
-----------+---------------------------------+----------
      Total |        11         11         11 |        33


The -help sample- file has no examples with both -if- and -by-, and I suggest
that Stata's behaviour be described. I am now using a workaround where I
save the cases to a file, delete them, -sample 10, by(drug) count- and
-append- the cases back on. This is no big hassle, but it took me a long
time to discover that the -sample- command was responsible for the varying
number of controls per case.
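
For reference, the workaround described above might look roughly like the following. This is only a sketch on the same example data; the temporary file name `cases' is illustrative, and -preserve-/-restore- is just one way to set the cases aside.

```stata
sysuse cancer, clear
bysort drug: gen case = _n==1

* Set the cases aside in a temporary file (name is illustrative)
preserve
keep if case
tempfile cases
save `cases'
restore

* Drop the cases, then sample 10 controls per drug group without -if-
drop if case
sample 10, count by(drug)

* Append the cases back on and check the counts
append using `cases'
tab case drug
```

Because -sample- is run here without an -if- qualifier, only controls are in memory when the draw is made.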

Best regards,
Peter.



