RE: st: RE: problem with getting a 50% bootstrapped sample stratified by treatment group and clustered by patient zip code


From   "Martin Weiss" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: problem with getting a 50% bootstrapped sample stratified by treatment group and clustered by patient zip code
Date   Fri, 8 Jan 2010 00:25:55 +0100

-count- merely counts observations, unsurprisingly, whereas you probably want
the number of unique clusters (distinct -pat_zip- values) in each stratum, to
comply with the rule stated in [R], p. 241:

"For stratified sampling of clusters, exp must be less than or equal to Nc
within the strata identified by the strata() option."

You can resolve this conflict as below. I also replaced your variables,
each of which held a single value, with a call to -summ, mean-, which leaves
the desired result in a returned value...


*******
clear*

input unique_id newgp str2 given_ahaid pat_zip trmtgp dis
    1              18001   "A"              1800           1  10
    2              18001 "B" 1800 1 15
    3              18001 "C" 1800 1 18
    4              18002 "A" 1800 1  5
    5              18002 "B" 1800 1  3
    6              18011 "A" 1801 1  0
    7              18011 "C" 1801 1  8
    8              18011 "D" 1801 1  9
    9              18011 "E" 1801 1  5
   10            18012 "B" 1801 1  7
   11            18012 "C" 1801 1 10
   12            18012 "D" 1801 1  4
   13            18012 "E" 1801 1  6
   14            18013 "D" 1801 1  9
   15            18013 "E" 1801 1  5
   16            17001 "A" 1700 0  8
   17            17001 "B" 1700 0  9
   18            17001 "C" 1700 0  7
  19             17002 "A" 1700 0 12
   20           17002 "B" 1700 0  8
   21            17011 "A" 1701 0  2
   22            17011 "C" 1701 0  1
   23           17011 "D" 1701 0  6
   24          17011 "E" 1701 0  4
   25           17012 "B" 1701 0  0
   26          17012 "C" 1701 0 17
   27          17012 "D" 1701 0  5
   28           17012 "E" 1701 0  4
   29           17013 "D" 1701 0  6
   30           17013 "E" 1701 0  7
end

compress

save bsampletest, replace



clear
use bsampletest, clear
qui tab trmtgp
scalar prnt=0.5    // to set 50 percent sampling, can change the number here
qui count if trmtgp==0     // comparison market in beginning year
scalar tabgp1=int(r(N)*prnt)
qui count if trmtgp==1     // treatment market in beginning year
scalar tabgp3=int(r(N)*prnt)
*display "tabgp1="tabgp1, "tabgp3=" tabgp3
//gen nsamp=cond(trmtgp==0 , tabgp1, 0) + cond(trmtgp==1, tabgp3, 0)

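* install the user-written -unique- from SSC if it is not already present
cap which unique
if _rc ssc inst unique

* number of distinct zip codes (clusters) within each treatment group
unique pat_zip, by(trmtgp) gen(cou)
su cou, mean

* resample size = smallest cluster count across the strata
bsample `r(min)', strata(trmtgp) cluster(pat_zip)
su dis, mean
display "_N=" _N, "mean_dis= " r(mean)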

*******
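
A minimal sketch of the same idea using only built-in commands, in case
installing -unique- is not an option. It counts the distinct zip codes per
treatment group with -egen- and then asks -bsample- for half of the smallest
count, which is one way to read "50%" at the cluster level. The variable and
macro names (first, nclust, half) and the rounding rule (round down, but never
below one cluster) are assumptions made for illustration, not part of the
original code.

```stata
* assumes bsampletest.dta was saved by the code above
use bsampletest, clear

* flag exactly one observation per (trmtgp, pat_zip) combination
egen byte first = tag(trmtgp pat_zip)

* number of distinct zip codes (clusters) within each treatment group
bysort trmtgp: egen nclust = total(first)

* half of the smallest cluster count across strata, at least 1 (assumed rule)
summarize nclust, meanonly
local half = max(1, floor(0.5 * r(min)))

* draw that many clusters, with replacement, within each stratum
bsample `half', strata(trmtgp) cluster(pat_zip)
summarize dis, meanonly
display "_N=" _N, " mean_dis=" r(mean)
```

Using the minimum across strata keeps a single resample size that satisfies
the rule quoted above in every stratum; if the strata have very different
numbers of clusters, a per-stratum size may be preferable.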


HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Woolton Lee
Sent: Thursday, 7 January 2010 22:11
To: [email protected]
Subject: Re: st: RE: problem with getting a 50% bootstrapped sample
stratified by treatment group and clustered by patient zip code

Actually no, there is nothing wrong with bsample.ado.  The problem with
the code I pasted below is that nsamp tells bsample to draw a random
sample larger than the number of clusters in the data.  I am trying to
figure out how to modify the code using the count command so that nsamp
reflects the number of clusters in the data and not the number of
observations.

On Thu, Jan 7, 2010 at 2:31 PM, Martin Weiss <[email protected]> wrote:
>
> This looks like a forgotten line-continuation indicator (///) in line 452
> of bsample.ado to me...
>
> Currently reads:
>
> ***
>                if _rc {
>                        di as err
>                "resample size must not be greater than number of clusters"
>                        exit 498
>                }
> ***
>
> Should be:
>
> ***
>                if _rc {
>                        di as err ///
>                "resample size must not be greater than number of clusters"
>                        exit 498
>                }
> ***
>
>
> HTH
> Martin
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Woolton Lee
> Sent: Thursday, 7 January 2010 20:15
> To: statalist
> Subject: st: problem with getting a 50% bootstrapped sample stratified by
> treatment group and clustered by patient zip code
>
> Hello,
>
> I have a dataset structured in the following manner,
>
> . input unique_id newgp str2 given_ahaid pat_zip trmtgp dis
>
>     unique_id      newgp   given_a~d    pat_zip     trmtgp        dis
>  1.     1              18001   "A"              1800           1  10
>  2.     2              18001 "B" 1800 1 15
>  3.     3              18001 "C" 1800 1 18
>  4.     4              18002 "A" 1800 1  5
>  5.     5              18002 "B" 1800 1  3
>  6.     6              18011 "A" 1801 1  0
>  7.     7              18011 "C" 1801 1  8
>  8.     8              18011 "D" 1801 1  9
>  9.     9              18011 "E" 1801 1  5
>  10.    10            18012 "B" 1801 1  7
>  11.    11            18012 "C" 1801 1 10
>  12.    12            18012 "D" 1801 1  4
>  13.    13            18012 "E" 1801 1  6
>  14.    14            18013 "D" 1801 1  9
>  15.    15            18013 "E" 1801 1  5
>  16.    16            17001 "A" 1700 0  8
>  17.    17            17001 "B" 1700 0  9
>  18.    18            17001 "C" 1700 0  7
>  19.   19             17002 "A" 1700 0 12
>  20.    20           17002 "B" 1700 0  8
>  21.    21            17011 "A" 1701 0  2
>  22.    22            17011 "C" 1701 0  1
>  23.    23           17011 "D" 1701 0  6
>  24.    24          17011 "E" 1701 0  4
>  25.    25           17012 "B" 1701 0  0
>  26.    26          17012 "C" 1701 0 17
>  27.    27          17012 "D" 1701 0  5
>  28.    28           17012 "E" 1701 0  4
>  29.    29           17013 "D" 1701 0  6
>  30.    30           17013 "E" 1701 0  7
>  31. end
>
> I would like to draw a 50% random sample from this dataset, stratified
> by trmtgp (treatment group) and clustered on patient zip code
> (pat_zip).  I have some simple code to do this and have pasted it below.
> However, when the code is run I get the error below.
>
> . forvalue i=1/5{
>  2.         clear
>  3.       use bsampletest, clear
>  4.       qui tab trmtgp
>  5.         scalar prnt=0.5     // to set 50 percent sampling, can change the number here
>  6.         qui count if trmtgp==0     // comparison market in beginning year
>  7.         scalar tabgp1=int(r(N)*prnt)
>  8.         qui count if trmtgp==1     // treatment market in beginning year
>  9.         scalar tabgp3=int(r(N)*prnt)
>  10.         *display "tabgp1="tabgp1, "tabgp3=" tabgp3
> .         gen nsamp=cond(trmtgp==0 , tabgp1, 0) + cond(trmtgp==1, tabgp3, 0)
>  11.
> .
> .         bsample nsamp, strata (trmtgp) cluster(pat_zip)
>  12.       egen mean_dist=mean(dis)
>  13.         display "_N=" _N, "mean_dis=" mean_dis
>  14. }
>
> unrecognized command:  "resample size must not be great invalid command name
> r(199);
>
> Is there any way to get bsample to draw a 50% random sample with
> replacement that is stratified by treatment group and clustered on
> patient zip code?
>
> Thank you for your help,
>
> Wool
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
