



Re: st: clustering in bootstrap


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: clustering in bootstrap
Date   Thu, 28 Apr 2011 23:10:14 -0500

Marcella Nicolini tries to bootstrap a panel data model, and runs into
a problem with repeated time values within the cluster in the
bootstrapped data.

It helps to post examples that use the datasets shipped with the
manuals, so that other interested subscribers can easily reproduce
your problem.

For an illustration of what's going on, consider

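* load the NLS panel data shipped with the Stata 11 manuals
use http://www.stata-press.com/data/r11/nlswork.dta, clear
keep idcode year ln_wage
* keep a handful of panels so the listings stay short
keep if idcode <= 10 & year < 80
list
* resample whole panels (clusters) with replacement
bsample, cluster(idcode)
* any panel drawn more than once now repeats its year values
list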

-bootstrap- (inevitably) draws some of the clusters more than once, so
the observations that share the same -id- end up with repeated time
values, which is what -xtivreg2- rightfully complains about. To
overcome this, you would need to use the -idcluster()- option, which
creates a new variable that uniquely identifies each resampled cluster.
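
As a rough sketch of what -idcluster()- does, here is the same toy
example with the option added. The variable name newid is only a
placeholder chosen for this illustration, and the seed is arbitrary;
the point is that the resampled data are unique on newid and year even
though they are no longer unique on idcode and year.

```stata
* Sketch: resample whole panels, giving each drawn copy its own identifier
use http://www.stata-press.com/data/r11/nlswork.dta, clear
keep idcode year ln_wage
keep if idcode <= 10 & year < 80
set seed 10101
* newid is a hypothetical name; idcluster() creates it as a new variable
bsample, cluster(idcode) idcluster(newid)
* idcode-year pairs may now repeat, but newid-year pairs do not
duplicates report idcode year
duplicates report newid year
* declaring the panel on the new identifier avoids r(451)
xtset newid year
```

The -bootstrap- prefix takes the same cluster() and idcluster()
options, in which case the estimation command has to treat the new
identifier, not the original one, as the panel variable. How best to
arrange that for -xtivreg2- is not shown here.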

BTW, why do you want to bootstrap this? Why don't you like the
standard errors that -xtivreg2- gives you?

On Thu, Apr 28, 2011 at 4:17 PM, Marcella Nicolini
<archivio.marcella@gmail.com> wrote:
> Dear Stas, I am trying to estimate the following IV equation, and I
> would like to get a clustering in the s.e.
>
> I get this r(451) with different types of clusters:
>
> xi: bootstrap _b _se, reps(350) seed(10101) cluster(panel_id
> booking_day) nodots: xtivreg2 lnpriceuk2 book_day1-book_day63
> i.dow_book i.month_book resid_time  (sold= hol_book_per
> fare_priceuk_IV) if period>200312 & d_promo1==0  & avseats <50, fe
> i.dow_book        _Idow_book_0-6      (naturally coded; _Idow_book_0 omitted)
> i.month_book      _Imonth_boo_1-12    (naturally coded; _Imonth_boo_1 omitted)
> repeated time values within panel
> the most likely cause for this error is misspecifying the cluster(),
> idcluster(), or group() option
> r(451);
>
> xi: bootstrap _b _se, reps(350) seed(10101) cluster(panel_id) nodots:
> xtivreg2 lnpriceuk2 book_day1-book_day63  i.dow_book i.month_book
> resid_time  (sold= hol_book_per fare_priceuk_IV) if period>200312 &
> d_promo1==0  & avseats <50, fe
> i.dow_book        _Idow_book_0-6      (naturally coded; _Idow_book_0 omitted)
> i.month_book      _Imonth_boo_1-12    (naturally coded; _Imonth_boo_1 omitted)
> repeated time values within panel
> the most likely cause for this error is misspecifying the cluster(),
> idcluster(), or group() option
> r(451);
>
> But I do not understand how repeated time values may occur if the data
> do not show duplicates and the panel structure is as follows:
>
> . tsset panel_id booking_day
>       panel variable:  panel_id (strongly balanced)
>        time variable:  booking_day, 1 to 70, but with gaps
>                delta:  1 unit
>
> . duplicates list panel_id booking_day
>
> Duplicates in terms of panel_id booking_day
>
> (0 observations are duplicates)
>
> Ideally, I would like to cluster s.e. for a group of panel_ids, but
> this by definition
> implies repeated time values within cluster, so I am afraid that it is
> not possible
> (although I do not understand why)
> Thank you!
> Marcella
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.


