Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: Bootstrap command when used with cluster and strata options


From   "Jeph Herrin" <stata@spandrel.net>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Bootstrap command when used with cluster and strata options
Date   Thu, 24 Oct 2013 09:19:52 -0400

I have run into (and tripped over) this before, and also thought it was, if
not a bug, at least non-intuitive and non-useful. I thought my subjects were
getting new identifiers, but in fact they were getting duplicated
identifiers (which are hardly identifiers at all). 
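
The duplication is easy to check directly. What follows is a minimal sketch, not an official fix: it reuses the toy data from Austin Nichols's example quoted further down in this thread, and the variable names crosses and c are only illustrative. It flags every newid value that turns up in more than one stratum, then builds an identifier from the stratum and newid together.

```stata
* Toy data from the example quoted later in this thread
clear
set seed 1
set obs 10
* s is the stratum (1 for obs 1-4, 0 for obs 5-10); i is the original cluster id
generate s = _n < 5
generate i = _n

* Resample clusters within strata; newid is numbered from 1 in each stratum
bsample, strata(s) cluster(i) idcluster(newid)

* Flag newid values that occur in more than one stratum
bysort newid (s): generate byte crosses = s[1] != s[_N]
count if crosses

* Combine stratum and newid into one identifier (the workaround from the thread)
egen c = group(s newid)

* Each value of c should now belong to exactly one stratum
bysort c (s): assert s[1] == s[_N]
```

If count reports anything other than zero, newid is being shared across strata and should not be used on its own as a grouping variable.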


http://www.stata.com/statalist/archive/2013-08/msg00175.html


Jeph

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Thursday, October 24, 2013 7:45 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Bootstrap command when used with cluster and strata options

Thanks for the clarification. StataCorp may naturally wish to comment.
Nick
njcoxstata@gmail.com


On 24 October 2013 11:14, Chris Frost <Chris.Frost@lshtm.ac.uk> wrote:
> Dear Nick
>
> Thanks for your input but I am afraid that I don't agree with you. 
> Stata is giving different subjects (from different strata) the same 
> "newid" in the bootstrap samples. There is nothing in the code that I 
> (or Austin) have used that implies that this should be the case. 
> Austin's program illustrates this nicely: if you run it you get
>
> . list
>
>      +--------------------+
>      | s   i   newid    c |
>      |--------------------|
>   1. | 0   5       1    1 |
>   2. | 0   6       2    2 |
>   3. | 0   7       3    3 |
>   4. | 0   7       4    4 |
>   5. | 0   9       5    5 |
>      |--------------------|
>   6. | 0   9       6    6 |
>   7. | 1   1       1    7 |
>   8. | 1   1       2    8 |
>   9. | 1   3       3    9 |
>  10. | 1   4       4   10 |
>      +--------------------+
>
> The first and seventh (and second and eighth, etc.) subjects share the same
> "newid", but they are not the same subject. The creation of "c" corrects
> this, but should not be needed - this is, in my view, a software error. The
> "fix" that I would like would make this correction automatic.
>
> Chris
>
> Chris Frost
> Professor of Medical Statistics
> Department of Medical Statistics
> London School of Hygiene and Tropical Medicine
> +44(0)20 7927 2242
>
>>>> Nick Cox <njcoxstata@gmail.com> 24/10/2013 10:56 >>>
> Hmmm... So, what would the "fix" be?  At first sight, you asked for 
> something you didn't want. It's difficult for Stata to know that.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 24 October 2013 10:31, Chris Frost <Chris.Frost@lshtm.ac.uk> wrote:
>
> Thanks for the succinct illustration of the problem and neat "get 
> round" using egen.  I do think that this is a trap for the unwary 
> though and should really be fixed in the software (I can conceive of 
> no situation where newid needs to be crossed with strata in the 
> bootstrap samples - and plenty of situations where the introduction of 
> this artificial sharing of newid across strata will cause errors if 
> not corrected).
>
> Austin Nichols <austinnichols@gmail.com> 23/10/2013 19:10
>
>> No need, I can see what you mean in a simple example:
>>
>> clear
>> set seed 1
>> set obs 10
>> g s=_n<5
>> g i=_n
>> bsample, strata(s) cluster(i) idcluster(newid)
>> egen c=group(s newid)
>> list
>>
>> and I assume you need a newid that can act as an identifier across
>> strata, so you need to generate a c as above.  You can wrap the
>> commands to be bootstrapped in a -program- and bootstrap it.
>
> On Wed, Oct 23, 2013 at 12:30 PM, Chris Frost <Chris.Frost@lshtm.ac.uk> wrote:
>
> Thanks for your reply - but I do think the problem is with the
> program, not with the data. In my data, clusters (id) do not cross
> strata (group). The problem is that in each bootstrap sample the
> newly created cluster variable (newid) DOES (erroneously) cross
> strata. This can be seen if the bootstrap is run with the "noisily"
> option. If you are interested in seeing the behavior, I can send you an
> annotated do-file that illustrates the problem.
>
> Austin Nichols <austinnichols@gmail.com> 23/10/2013 16:42 >>>
>
>>> Sounds like a problem with your data to me, not the program. If your
>>> clusters seem to cross strata, because of the coding in your data,
>>> you can define a new cluster variable:
>>>
>>>   egen newc=group(group id)
>>>
>>> or you can specify that clusters are defined by two variables:
>>>
>>>   bootstrap, strata(group) cluster(group id) idcluster(newid):
>
> On Wed, Oct 23, 2013 at 6:11 AM, Chris Frost <Chris.Frost@lshtm.ac.uk> wrote:
>
> I think that there is a problem with the bootstrap command when used
> in conjunction with the "cluster" and "strata" options. The problem
> arises because the command
>
>   bootstrap, strata(group) cluster(id) idcluster(newid) .....
>
> creates a variable "newid" which is only unique (at the cluster level)
> within each stratum. For example, if there are 1000 subjects (with
> multiple measures per subject), each with a unique id but in two
> equal-sized groups, the above command will result in each bootstrap
> sample having only 500 values of newid, with subjects being erroneously
> paired up. This will lead to incorrect variance estimates with a
> command such as
>
>   bootstrap, strata(group) cluster(id) idcluster(newid): mixed outcome i.group || newid:
>
> Am I correct? Can this be fixed?
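
The 1000-subject scenario described above can be simulated to count identifiers. This is a sketch under assumptions: the data are made up (three measures per subject is arbitrary), and the names tag_newid, uid and tag_uid are illustrative. The two assert lines encode the behaviour reported in this thread, namely 500 distinct values of newid, and 1000 distinct values once newid is combined with the stratum.

```stata
* Simulated layout: 1000 subjects, two groups of 500, three measures each
clear
set seed 12345
set obs 1000
generate id = _n
generate group = id > 500
expand 3

* One stratified cluster bootstrap sample, as bootstrap would draw it
bsample, strata(group) cluster(id) idcluster(newid)

* Count distinct values of newid; the thread reports 500 rather than 1000
egen tag_newid = tag(newid)
count if tag_newid
assert r(N) == 500

* Combining stratum and newid gives one identifier per resampled subject
egen uid = group(group newid)
egen tag_uid = tag(uid)
count if tag_uid
assert r(N) == 1000
```

If either assert fails on a given Stata version, the numbering of newid differs from what is described here and the workaround should be rechecked.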
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>


© Copyright 1996–2018 StataCorp LLC