Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Bootstrap command when used with cluster and strata options


From   "Chris Frost" <Chris.Frost@lshtm.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Bootstrap command when used with cluster and strata options
Date   Fri, 25 Oct 2013 07:51:47 +0100

Dear Jeff
 
Thanks for this. I just wanted to point out that the problem applies to the command "bootstrap" as well as "bsample". Will "bootstrap" be updated too?
 
Many thanks.
 
Chris  

>>> Jeff Pitblado, StataCorp LP <jpitblado@stata.com> 24/10/2013 19:54 >>>
Chris Frost <Chris.Frost@lshtm.ac.uk> is using -bootstrap- with options
-strata()-, -cluster()- and -idcluster()-, and noticed that the new cluster
variable repeats ID values (starting from 1) between the strata:

> I think that there is a problem with the bootstrap command when used in
> conjunction with the "cluster" and "strata" options. The problem arises
> because the command "bootstrap, strata(group) cluster(id) idcluster(newid)
> ....." creates a variable "newid" which is unique (at the cluster
> level) only within each stratum. For example, if there are 1000 subjects
> (with multiple measures per subject), each with a unique id, in two
> equal-sized groups, the above command will result in each bootstrap sample
> having only 500 values of newid, with subjects erroneously paired up
> across groups. This will lead to incorrect variance estimates with a
> command such as
>
> . bootstrap, strata(group) cluster(id) idcluster(newid):
>       mixed outcome i.group || newid:
> 
> Am I correct? Can this be fixed?

Austin Nichols <austinnichols@gmail.com> verified this, and pointed out that
-bsample- is the command that is producing the new cluster id variable.

Jeph Herrin <stata@spandrel.net> ran across this behavior in a reply to a
Statalist thread earlier this year.  Sorry I missed that thread, Jeph.

The documentation for -idcluster()- for -bsample- says:

	idcluster(newvar) creates a new variable containing a unique
		identifier for each resampled cluster.

This description agrees with Chris's expectation.  As such, we will update
-bsample- to behave as expected.
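
Until that update is available, one possible workaround is to combine the
stratum variable and the new cluster variable into a single identifier that
is unique across strata.  The sketch below is illustrative only: the toy
data, the seed, and the variable names (id, group, outcome, newid, uid) are
assumptions made for the example.  It calls -bsample- directly to show the
repeated ID values and then builds the combined identifier with -egen-:

```stata
* Sketch only: toy data with 6 subjects in 2 strata, 2 measures each
clear
set seed 12345
set obs 12
generate id = ceil(_n/2)        // subject ids 1..6, two rows per subject
generate group = (id > 3)       // stratum 0: ids 1-3, stratum 1: ids 4-6
generate outcome = rnormal()

* Resample whole clusters within each stratum.  With the behavior
* described in this thread, newid starts again at 1 in each stratum.
bsample, strata(group) cluster(id) idcluster(newid)
tabulate newid group

* Workaround: one identifier per (stratum, resampled cluster) pair
egen long uid = group(group newid)
tabulate uid group
```

To use the same idea under -bootstrap-, the -egen- step would have to run on
each resampled dataset, for example inside a small user-written wrapper
program that -bootstrap- calls in place of -mixed-, with the model then
fitted on uid rather than newid.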

--Jeff
jpitblado@stata.com 
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search 
*   http://www.stata.com/support/faqs/resources/statalist-faq/ 
*   http://www.ats.ucla.edu/stat/stata/



