

From: Stas Kolenikov <skolenik@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Cluster Bootstrapping : repeated time values within panel error

Date: Thu, 16 Dec 2010 23:06:43 -0500

If you clear the -tsset-ings, you sweep some important details under the carpet. For one thing, you will not be able to utilize lags and leads in your mysterious $expr. (Remember, each -global- is a coding failure; you need them only in the cases of extreme difficulty in passing parameters between routines.) There is a further subtlety, though, and depending on what exactly you are doing, you may or may not be affected by it. Suppose your data had just three clusters, as in:

    input y x cl
    3 2 1
    5 4 1
    6 3 2
    7 4 2
    2 1 3
    5 0 3
    end

    . bysort cl: gen time = _n
    . tsset cl time
    . reg y x , cl(cl)

    Linear regression                                      Number of obs =       6
                                                           F(  1,     2) =    3.50
                                                           Prob > F      =  0.2025
                                                           R-squared     =  0.3250
                                                           Root MSE      =  1.7103

                                         (Std. Err. adjusted for 3 clusters in cl)
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |        .65   .3476849     1.87   0.202    -.8459672    2.145967
           _cons |       3.15   .5883189     5.35   0.033      .618668    5.681332
    ------------------------------------------------------------------------------

The original problem that -tsset- complained about arose because -bootstrap- with the -cluster()- option creates bootstrap samples in which the -cluster()- variable points to multiple copies of the same cluster:

    . bsample , cl(cl)
    . list, sepby(cl)

         +-------------------+
         | y   x   cl   time |
         |-------------------|
      1. | 3   2    1      1 |
      2. | 5   4    1      2 |
         |-------------------|
      3. | 2   1    3      1 |
      4. | 5   0    3      2 |
      5. | 2   1    3      1 |
      6. | 5   0    3      2 |
         +-------------------+

If your GMM estimation really relies on something that is going on within the cluster, then indeed it is going to see two observations at time 1 and two observations at time 2, appropriately breaking down since it expected only one of each. To maintain identification of the time points within the cluster, specify the -idcluster()- option and give it to your -gmm- command.

    . bsample, cl(cl) idcl(newcl)
    . list, sepby(newcl)

         +---------------------------+
         | y   x   cl   time   newcl |
         |---------------------------|
      1. | 3   2    1      1       1 |
      2. | 5   4    1      2       1 |
         |---------------------------|
      3. | 3   2    1      1       2 |
      4. | 5   4    1      2       2 |
         |---------------------------|
      5. | 3   2    1      1       3 |
      6. | 5   4    1      2       3 |
         +---------------------------+

The new variable -newcl- creates appropriate clusters as the groups of observations that -bootstrap- pulled out independently from the original data. The clusters given by the -cl- variable are too large, and -bootstrap- may fail to obtain standard errors within the samples if the number of resampled clusters happens to be less than the number of parameters. Unfortunately, it won't issue an error message (or put "x" instead of a dot, as I expected it to do), since the convention is to provide zero standard errors in this case. They are interpreted as missing when the output of a command is formatted, but not when -bootstrap- posts the results. If you type

    . bootstrap , reps(50) cl(cl) noisily : reg y x, cl(cl)

you are going to see some occasions when only one cluster was effectively resampled, and the standard errors could not be computed. Still, -bootstrap- rolled on merrily. If I wanted to assess the variability of the standard errors, as in

    . bootstrap _b[_cons] _b[x] _se[_cons] _se[x] , reps(50) cl(cl) : reg y x, cl(cl)

then I would really like -bootstrap- to mark down these occasions as inappropriate, and report that it could not compute the standard errors, rather than setting them to zero (and effectively reducing the estimate of the mean standard error, with crazy effects on the variance of the standard errors). This behavior also means that I am estimating a model with varying degrees of freedom for the standard errors... again something I am not really after. I would expect to get what I wanted with

    . bootstrap _b[_cons] _b[x] _se[_cons] _se[x] , reps(50) cl(cl) idcl(newcl) : reg y x, cl(newcl)

but -bootstrap- outsmarted me, and did not let me run it that way.
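The resampling scheme described above can also be sketched as a hand-rolled loop around -bsample-. This is only a sketch under assumptions that are not in the original message: the seed, the 50 replications, and the names sim, results, b_x and se_x are hypothetical, and it has not been checked against Stata 11. It resamples whole clusters, uses -idcluster()- so that repeated draws get distinct ids, re-declares the panel on the new id, and records a zero standard error as missing instead of 0.

```stata
* Sketch only: hand-rolled cluster bootstrap on the toy data from the message.
* Seed, replication count and the names sim/results/b_x/se_x are illustrative.
clear
input y x cl
3 2 1
5 4 1
6 3 2
7 4 2
2 1 3
5 0 3
end
bysort cl: gen time = _n

set seed 12345
tempname sim
tempfile results
postfile `sim' b_x se_x using `results'

forvalues r = 1/50 {
    preserve
    * Draw whole clusters with replacement; newcl gives each draw its own id
    bsample, cluster(cl) idcluster(newcl)
    * Time values are unique within newcl, so the panel can be re-declared
    quietly tsset newcl time
    capture regress y x, vce(cluster newcl)
    if _rc == 0 {
        * Store an exactly zero standard error as missing rather than as 0
        post `sim' (_b[x]) (cond(_se[x] > 0, _se[x], .))
    }
    else {
        * Estimation failed in this replicate: keep a placeholder row
        post `sim' (.) (.)
    }
    restore
}
postclose `sim'

* One row per replicate; missing se_x marks replicates without a usable s.e.
use `results', clear
summarize b_x se_x
```

Because -newcl- always takes three distinct values here, every replicate computes its standard errors from the same number of clusters. Whether that is the right comparison for a particular -gmm- model is a judgment call.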
On Thu, Dec 16, 2010 at 12:05 PM, Laura Rovegno <laura.rovegno@uclouvain.be> wrote:
> I've just been told how to solve it, but it is worth sharing in case someone
> else runs into this problem.
> You need to "clear" the definition of the panel.
> So just do "tsset, clear" before the bootstrap and it works.
>
> On 16/12/2010 17:54, Laura Rovegno wrote:
>>
>> Hello! I'm having a little problem with cluster bootstrapping in Stata.
>> I'm using Stata 11.
>>
>> Here are the command and error:
>>
>> . gmm ($expr), instruments(n_1 k) vce(bootstrap, reps(10) cluster(id)) twostep
>> (running gmm on estimation sample)
>>
>> Bootstrap replications (10)
>> ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
>> repeated time values within panel
>> the most likely cause for this error is misspecifying the cluster(),
>> idcluster(), or group() option
>>
>> I get the same if I do:
>>
>> . bootstrap, reps(10) cluster(id): gmm ($expr), instruments(n_1 k) twostep
>>
>> The panel in my dataset is well specified:
>> . tsset id year
>>        panel variable:  id (unbalanced)
>>         time variable:  year, 2002 to 2008, but with gaps
>>                 delta:  1 unit
>>
>> There are no observations with missing variables in the data. In fact, if
>> I run gmm without bootstrap it uses 556 observations, the same number of
>> observations as in the dataset:
>> . count
>>   556
>>
>> The problem is not with the gmm command, since I get the same error with
>> other commands, for example:
>>
>> . bootstrap, reps(10) cluster(id) idcluster(id2) seed(123) nowarn: reg y n
>> (running regress on estimation sample)
>>
>> Bootstrap replications (10)
>> ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
>> repeated time values within panel
>> the most likely cause for this error is misspecifying the cluster(),
>> idcluster(), or group() option
>>
>> Bootstrapping without clustering works.
>> My panel is highly unbalanced, with many observations that I observe only
>> once. However, if I only use the balanced panel I still get the error.
>> If I run other commands with cluster, such as "xtreg y n, fe cluster(id)",
>> it works.
>>
>> Any ideas what might be the problem and how to solve it?
>>
>> Thank you
>> Laura

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
