Dear Stata-listers,
I need 10 random controls matched per case for an epidemiological study.
Controls are matched to cases on birth month and gender. I am using the
-sample- command, and my problem can be demonstrated with the following:
sysuse cancer, clear
bysort drug: gen case=_n==1
* Here, controls are matched to cases on drug, not birth month and gender.
sample 10 if !case, count by(drug)
tab case drug
I expected this command to draw 10 random persons with case==0 from each
drug group and keep all three with case==1. The problem is that I sometimes
get a result like this:
           |      Drug type (1=placebo)
      case |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |         9          9          9 |        27
         1 |         1          1          1 |         3
-----------+---------------------------------+----------
     Total |        10         10         10 |        30
and sometimes like this:
           |      Drug type (1=placebo)
      case |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |         9         10         10 |        29
         1 |         1          1          1 |         3
-----------+---------------------------------+----------
     Total |        10         11         11 |        32
But I expect the following, which I also get on occasion:
           |      Drug type (1=placebo)
      case |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |        10         10         10 |        30
         1 |         1          1          1 |         3
-----------+---------------------------------+----------
     Total |        11         11         11 |        33
The -help sample- file has no examples that combine -if- and -by()-, and I
suggest that Stata's behaviour in this case be documented. I am now using a
workaround where I save the cases to a file, -drop- them, run
-sample 10, by(drug) count- and -append- the cases back on. This is no big
hassle, but it took me a long time to discover that the -sample- command was
responsible for the varying number of controls per case.
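For reference, here is a minimal sketch of that workaround on the same example data. It assumes everything is run together in one do-file, uses a -tempfile- instead of a named file, and sets an arbitrary seed (12345 is my choice, purely for reproducibility); none of these details are essential to the approach.

```stata
sysuse cancer, clear
bysort drug: gen case = _n == 1

* Set the cases aside in a temporary file (assumption: a tempfile is
* used here; any saved dataset would do).
preserve
keep if case
tempfile cases
save `cases'
restore

* Draw 10 controls per drug group from the controls only, so that
* -sample- is used with -by()- but without -if-.
drop if case
set seed 12345          // arbitrary seed, for reproducibility only
sample 10, count by(drug)

* Put the cases back and check the counts per group.
append using `cases'
tab case drug
```

Because -sample- only ever sees controls here, the -if- qualifier and the -by()- option are never combined.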
Best regards,
Peter.