Re: st: Cluster Bootstrapping : repeated time values within panel error


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Cluster Bootstrapping : repeated time values within panel error
Date   Thu, 16 Dec 2010 23:06:43 -0500

If you clear the -tsset-ings, you sweep some important details under
the carpet. For one thing, you will not be able to use lags and
leads in your mysterious $expr. (Remember, each -global- is a coding
failure; you need them only in cases of extreme difficulty in
passing parameters between routines.) However, a subtlety remains,
and depending on what exactly you are doing, you may or may not be
affected by it.
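
To illustrate the point about lags and leads, here is a minimal sketch on
a made-up two-panel dataset (the names id, t, lag_x and lag_x2 are
hypothetical and not from the original question). It is only meant to
show that time-series operators depend on the -tsset-ings:

```stata
clear
input id t x
1 1 2
1 2 4
2 1 3
2 2 4
end

* Declare the panel structure; the lag operator L. now works
tsset id t
generate lag_x = L.x

* Clear the settings; the same operator is now expected to fail,
* because Stata no longer knows which variable is time
tsset, clear
capture noisily generate lag_x2 = L.x
display "return code: " _rc
```

The -capture noisily- prefix lets the do-file continue after the error
while still showing the message.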

Suppose your data had just three clusters, as in:

input y x cl
3 2 1
5 4 1
6 3 2
7 4 2
2 1 3
5 0 3
end

. bysort cl: gen time = _n

. tsset cl time

. reg y x , cl(cl)

Linear regression                                      Number of obs =       6
                                                       F(  1,     2) =    3.50
                                                       Prob > F      =  0.2025
                                                       R-squared     =  0.3250
                                                       Root MSE      =  1.7103

                                     (Std. Err. adjusted for 3 clusters in cl)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .65   .3476849     1.87   0.202    -.8459672    2.145967
       _cons |       3.15   .5883189     5.35   0.033      .618668    5.681332
------------------------------------------------------------------------------

.

The original error, raised by -tsset-, arose because -bootstrap- with
the -cluster()- option creates bootstrap samples in which a single
value of the -cluster()- variable can point to multiple copies of the
same cluster:

. bsample , cl(cl)

. list, sepby(cl)

     +-------------------+
     | y   x   cl   time |
     |-------------------|
  1. | 3   2    1      1 |
  2. | 5   4    1      2 |
     |-------------------|
  3. | 2   1    3      1 |
  4. | 5   0    3      2 |
  5. | 2   1    3      1 |
  6. | 5   0    3      2 |
     +-------------------+

If your GMM estimation really relies on something that is going on
within the cluster, then indeed it is going to see two observations at
time 1 and two observations at time 2, and it appropriately breaks
down since it expected only one of each. To maintain identification of
the time points within the cluster, specify the -idcluster()- option
and pass the variable it creates to your -gmm- command.

. bsample, cl(cl) idcl(newcl)

. list, sepby(newcl)

     +---------------------------+
     | y   x   cl   time   newcl |
     |---------------------------|
  1. | 3   2    1      1       1 |
  2. | 5   4    1      2       1 |
     |---------------------------|
  3. | 3   2    1      1       2 |
  4. | 5   4    1      2       2 |
     |---------------------------|
  5. | 3   2    1      1       3 |
  6. | 5   4    1      2       3 |
     +---------------------------+

The new variable -newcl- defines the appropriate clusters: the groups
of observations that -bootstrap- drew independently from the original
data. The clusters given by the -cl- variable are too large, and
-bootstrap- may fail to obtain standard errors within the samples if
the number of distinct resampled clusters happens to be less than the
number of parameters. Unfortunately, it won't issue an error message
(or print "x" instead of a dot, as I expected it to do), since the
convention is to report zero standard errors in this case. They are
interpreted as missing when the output of a command is formatted, but
not when -bootstrap- posts the results.
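
As a minimal sketch of how -newcl- restores a valid panel structure,
the toy data from above can be resampled and then re-declared with the
new identifier. The seed value and the variable name lag_x are
arbitrary choices made for this illustration:

```stata
clear
input y x cl
3 2 1
5 4 1
6 3 2
7 4 2
2 1 3
5 0 3
end
bysort cl: gen time = _n
tsset cl time

* Resample whole clusters; newcl gives each drawn copy its own id
set seed 12345
bsample, cluster(cl) idcluster(newcl)

* (newcl, time) is unique even if a cluster was drawn more than once,
* so the panel can be declared again and lags are well defined
tsset newcl time
generate lag_x = L.x
list, sepby(newcl)
```

Declaring the panel with -cl- instead of -newcl- after -bsample- would
reproduce the "repeated time values within panel" error whenever a
cluster is drawn more than once.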

If you type

. bootstrap , reps(50) cl(cl) noisily : reg y x, cl(cl)

you are going to see some occasions when only one cluster was
effectively resampled, and the standard errors could not be computed.
Still, -bootstrap- rolled on merrily. If I wanted to assess the
variability of the standard errors, as in

. bootstrap _b[_cons] _b[x] _se[_cons] _se[x] , reps(50) cl(cl) : reg y x, cl(cl)

then I would really like -bootstrap- to mark down these occasions as
inappropriate, and report that it could not compute the standard
errors, rather than setting them to zero (and effectively reducing the
estimate of the mean standard error, with crazy effects on the
variance of the standard errors). This behavior also means that I am
estimating a model with varying degrees of freedom for the standard
errors... again something I am not really after.

I would expect to get what I wanted with

. bootstrap _b[_cons] _b[x] _se[_cons] _se[x] , reps(50) cl(cl) idcl(newcl) : reg y x, cl(newcl)

but -bootstrap- outsmarted me, and did not let me run it that way.
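
One possible workaround, which is not something tested in this thread
but only a sketch, is to run the resampling loop by hand with -bsample-
and -postfile-, so that the regression can be clustered on -newcl-. The
names memhold, results, b_x, se_x and nclust, the seed, and the 1e-8
threshold are all assumptions made for the illustration:

```stata
clear
input y x cl
3 2 1
5 4 1
6 3 2
7 4 2
2 1 3
5 0 3
end

set seed 12345
tempname memhold
tempfile results
postfile `memhold' b_x se_x nclust using `results'

forvalues r = 1/50 {
    preserve
    * draw clusters with replacement; newcl identifies each drawn copy
    bsample, cluster(cl) idcluster(newcl)
    quietly regress y x, vce(cluster newcl)
    * store the slope, its standard error, and the number of clusters used
    post `memhold' (_b[x]) (_se[x]) (e(N_clust))
    restore
}
postclose `memhold'

use `results', clear
summarize b_x se_x nclust
* replicates with degenerate (near-zero) standard errors can be inspected
count if se_x < 1e-8
```

With the results stored as a dataset, replicates with degenerate
standard errors can be flagged or set to missing before summarizing,
rather than being silently averaged in as zeros.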

On Thu, Dec 16, 2010 at 12:05 PM, Laura Rovegno
<laura.rovegno@uclouvain.be> wrote:
> I have just been told how to solve it, but it is worth sharing in case
> someone else runs into this problem.
> You need to "clear" the definition of the panel.
> So just do "tsset, clear" before the bootstrap and it works.
>
> On 16/12/2010 17:54, Laura Rovegno wrote:
>>
>> Hello! I'm having a little problem with cluster bootstrapping in Stata.
>> I'm using Stata 11.
>>
>> Here are the command and error:
>>
>> . gmm ($expr), instruments(n_1 k) vce(bootstrap, reps(10) cluster(id)) twostep
>> (running gmm on estimation sample)
>>
>> Bootstrap replications (10)
>> ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
>> repeated time values within panel
>> the most likely cause for this error is misspecifying the cluster(),
>> idcluster(), or group() option
>>
>> I get the same if I do:
>>
>> . bootstrap, reps(10) cluster(id): gmm ($expr), instruments(n_1 k) twostep
>>
>> The panel in my dataset is well specified:
>> . tsset id year
>> panel variable: id (unbalanced)
>> time variable: year, 2002 to 2008, but with gaps
>> delta: 1 unit
>>
>> There are no observations with missing values in the data. In fact, if
>> I run gmm without bootstrap it uses 556 observations, the same as the
>> number of observations in the dataset:
>> . count
>> 556
>>
>> The problem is not with the gmm command, since I get the same error with
>> other commands, for example:
>>
>> . bootstrap, reps(10) cluster(id) idcluster(id2) seed(123) nowarn: reg y n
>> (running regress on estimation sample)
>>
>> Bootstrap replications (10)
>> ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
>> repeated time values within panel
>> the most likely cause for this error is misspecifying the cluster(),
>> idcluster(), or group() option
>>
>> Bootstrapping without clustering works.
>> My panel is highly unbalanced, with many units that I observe only
>> once. However, if I only use the balanced panel I still get the error.
>> If I run other commands with cluster, such as "xtreg y n, fe cluster(id)",
>> they work.
>>
>> Any ideas what might be the problem and how to solve it?
>>
>> Thank you
>> Laura
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/



-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
