
From: jpitblado@stata.com (Jeff Pitblado, Stata Corp.)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: bsample
Date: Wed, 25 Jun 2003 12:21:45 -0500

<uctpmtd@ucl.ac.uk> asks about bootstrapping -clogit- results, using the
-cluster()- option of -bootstrap-:

> I am using Stata 7.
>
> Given that -clogit- doesn't have the option of clustered standard errors, I
> performed bootstrap to correct them.
>
> This is the code:
>
> #delimit;
> set more 1;
> set matsize 800;
> set seed 1;
>
> bs "clogit choice private public time cost distpri distpub incpri incpub,
> group(id)" "_b[time] _b[cost] _b[distpri] _b[distpub] _b[incpri] _b[incpub]",
> cluster(area) reps(0) saving(bsclog) replace;
>
> I got my results, but it took Stata 2 hours to compute the se with 0
> replications, and 6 hours with 200 replications.  -clogit- on the same data
> (456399 obs when arranged in the long format) takes 1mn to run and I'm
> running it on the University network.  When I included the controls, it took
> Stata 1 month to compute the se, for a model it takes 13mn to run with
> clogit!!!!!
>
> I went at that point to try and check what was making it so slow, and decided
> to draw the random sample manually, then do -clogit-.
>
> This is the code for 1 draw only: (I'm planning to do a loop for the number of
> replications required once I solve the problem below)
>
> set seed 1
> set matsize 800
> set more 1
>
> bsample, cluster(area)
> clogit choice private public time cost distpri distpub incpri incpub, group(id)
>
> I got mixed results in the sense that the speed at which I obtained the
> results was as expected, but -bsample- is mixing up the data as I should have
> 1:2 matching (McFadden choice model), and I get 4:8.
>
> So summing-up, my problem with -bsample- is how to incorporate the id so I
> could have the appropriate matching.

Bootstrapping -clogit- (in the absence of clusters)
------------------------------------------------------------------------------

Let's begin by discussing how to bootstrap results from -clogit-; we'll talk
about -clogit- with clustered groups later.

The -clogit- command requires grouped data.
Thus, when bootstrapping the results from -clogit-, you need to sample the
groups (each group as a whole) instead of the observations.  That is, each
group is itself a cluster of information, so use the -cluster()- option of
-bootstrap- to sample the groups.

It is usually the case that we need to specify the "cluster" variable in the
estimation command.  For -clogit-, we identify this variable in the -group()-
option.  Remember, this variable identifies the groups we are sampling with
replacement, so each group that is sampled more than once must have a unique
identifier.  That is, if the group with "id==1" is sampled twice, the repeat
group must have a different identifying value than the original.  This is
accomplished using the -idcluster()- option.

Here we bootstrap the results from the first example in [R] clogit.

***** BEGIN: c1.do
version 7
clear
use http://www.stata-press.com/data/r7/clogitid

* -id- defines the resampled clusters; -myid- receives the unique ids
gen myid = id
bs "clogit y x1 x2, group(myid)" "_b[x1] _b[x2]", cluster(id) idclust(myid) /*
*/ dot
***** END: c1.do

Notice that -bs- will produce cluster samples using the -id- variable, but
will call -clogit- using the -myid- variable to identify the groups.  -myid-
contains unique values for each sampled group.

Clustered -clogit-
------------------------------------------------------------------------------

"uctpmtd" has a slightly more complicated situation.  There are clusters of
groups, so we need to sample the clusters with replacement, but still uniquely
identify the sampled groups.

The -bs- command cannot handle this without a little help from the user.  If
-bs- were to supply me with the -group()- and -idcluster()- variables, I could
generate a new group variable that uniquely identified the sampled groups
(across the clusters), then run the -clogit- command with the new group
variable.  The following details how I accomplished this.

Using the data from the above example, I artificially create a cluster
variable -clust- that assigns the groups, in rotation, to 5 clusters.

***** BEGIN: c2a.do
version 7
clear
use http://www.stata-press.com/data/r7/clogitid
set seed 1234

* generate a cluster variable:
* flag the first observation of each group, turn the running sum of the flags
* into a group counter, and map that counter onto the values 1 to 5
sort id
by id: gen clust = _n==1
replace clust = 1+mod(sum(clust),5)
***** END: c2a.do

In order to ensure that -clogit- gets uniquely identified groups, while
sampling the clusters with replacement, I wrote a short program and placed it
in an ado-file: myclogit.ado (listed below).

-myclogit- is a wrapper for -clogit-.  Its purpose is to generate a new group
variable from the original group variable and the -idcluster()- variable.  The
variables and options are passed through to -clogit-.

***** BEGIN: myclogit.ado
program define myclogit
        version 7
        syntax varlist , group(varname) idcluster(varname) [ * ]

        /* preserve original order within -group()- */
        tempvar newgroup order
        gen `order' = _n

        /* generate a new group id variable */
        sort `idcluster' `group' `order'
        by `idcluster' `group' : gen `newgroup' = _n==1
        replace `newgroup' = sum(`newgroup')

        clogit `varlist' , group(`newgroup') `options'
end
***** END: myclogit.ado

With -myclogit- I can now use -bs- to bootstrap the standard errors of the
coefficients, while accounting for clustering of groups.

***** BEGIN
gen myclust = clust
bs "myclogit y x1 x2, group(id) idcluster(myclust)" "_b[x1] _b[x2]", /*
*/ cluster(clust) idclust(myclust) dot
***** END

P.S.  Remember that when a group has multiple choices, -clogit- must account
for all possible choice combinations.  In "uctpmtd"'s first attempt to
bootstrap results from -clogit-, each group that was sampled multiple times
kept its original id, so its copies were merged into one larger group and
-clogit- had to go through that much more work.  "uctpmtd" should not
experience this if -myclogit- is used as described above.

--Jeff
jpitblado@stata.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
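
For readers who want to see the relabeling idea outside Stata, here is a
minimal sketch in Python (not Stata, and not part of the original post).  It
resamples whole clusters with replacement and gives every group in every draw
a fresh identifier, which is the job -myclogit- does before it calls -clogit-.
The function name cluster_bootstrap, the column names clust/id/y, and the toy
data are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def cluster_bootstrap(rows, cluster_key, group_key, rng):
    """Resample whole clusters with replacement and relabel groups uniquely.

    Hypothetical helper for illustration; it mirrors the idea behind
    -myclogit-, not its implementation.
    """
    # Collect the rows belonging to each cluster.
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row[cluster_key]].append(row)
    cluster_ids = sorted(by_cluster)

    sample = []
    new_id = {}  # (draw number, original group id) -> new unique group id
    draws = rng.choices(cluster_ids, k=len(cluster_ids))
    for draw, cid in enumerate(draws):
        for row in by_cluster[cid]:
            key = (draw, row[group_key])
            if key not in new_id:
                new_id[key] = len(new_id) + 1
            # Copy the row and attach the new group id.
            sample.append({**row, "newgroup": new_id[key]})
    return sample


# Toy data (assumed): 3 clusters, 2 groups per cluster, 1:2 matching,
# i.e. 3 rows per group with exactly one chosen alternative (y == 1).
rows = []
gid = 0
for clust in (1, 2, 3):
    for _ in range(2):
        gid += 1
        for alt in range(3):
            rows.append({"clust": clust, "id": gid, "y": int(alt == 0)})

sample = cluster_bootstrap(rows, "clust", "id", random.Random(1234))

# Count rows and chosen alternatives per relabeled group.
sizes = defaultdict(int)
chosen = defaultdict(int)
for r in sample:
    sizes[r["newgroup"]] += 1
    chosen[r["newgroup"]] += r["y"]

print(sorted(set(sizes.values())))   # group sizes found in the resample
print(sorted(set(chosen.values())))  # chosen alternatives per group
```

Because a cluster drawn twice contributes two separately numbered copies of
each of its groups, the 1:2 matching inside every group is preserved.  Grouping
on the original id instead would merge those copies into a single 2:4 group.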

