FAQ: How do I obtain bootstrapped standard errors with panel data?

Title		Bootstrap with panel data
Author		Gustavo Sanchez, StataCorp

In general, the bootstrap is used in statistics as a resampling method to approximate standard errors, confidence intervals, and p-values for test statistics, based on the sample data. This method is particularly helpful when the theoretical distribution of the test statistic is unknown. In Stata, you can use the bootstrap command or the vce(bootstrap) option (available for many estimation commands) to bootstrap the standard errors of the parameter estimates. We recommend using the vce() option whenever possible because it already accounts for the specific characteristics of the data. This matters especially for panel data, where the bootstrap must resample entire panels rather than individual records.

In the vce() option, we can include all the specifications we would regularly include in the bootstrap command. For example, if we need to perform a test on a linear combination of some of the coefficients of the regression model, we can directly incorporate the linear combination expression into vce(). The example below bootstraps the standard error of the difference between the coefficients for age and wks_work in a fixed-effects regression for ln_wage:

. webuse nlswork
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtset idcode

Panel variable: idcode (unbalanced)

. xtreg ln_wage wks_work age tenure ttl_exp, fe vce(bootstrap 
     (_b[age] - _b[wks_work]),rep(10) seed(123))
(running xtreg on estimation sample)

Bootstrap replications (10): .........10 done

Bootstrap results                                       Number of obs = 27,408
                                                        Replications  =     10

      Command: xtreg ln_wage wks_work age tenure ttl_exp, fe
        _bs_1: _b[age] - _b[wks_work]

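                              (Replications based on 4,674 clusters in idcode)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -.0056473   .0011328    -4.99   0.000    -.0078675    -.003427
------------------------------------------------------------------------------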

As we mentioned above, we can get the same results with the bootstrap command. However, by using the vce() option, we do not have to explicitly specify the panel-data characteristics of our dataset.
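For comparison, the same statistic could be bootstrapped with the bootstrap prefix along the following lines. This is a minimal sketch, not output from the original example: the variable name panel_id and the statistic name diff are made up for illustration, and whether the replications match the vce() results exactly has not been verified here.

```stata
* Sketch: bootstrap prefix version of the xtreg example
* (panel_id and diff are illustrative names, not from the original FAQ)
webuse nlswork, clear

* Copy of the panel identifier that bootstrap can relabel in each replication
generate panel_id = idcode
xtset panel_id

* Resample whole panels (cluster) and give repeated draws distinct IDs (idcluster)
bootstrap diff = (_b[age] - _b[wks_work]), reps(10) seed(123) ///
        cluster(idcode) idcluster(panel_id):                  ///
        xtreg ln_wage wks_work age tenure ttl_exp, fe
```

Unlike the vce() version, the panel structure has to be spelled out here through cluster() and idcluster().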

With community-contributed commands or with non-estimation commands, we need to use bootstrap because there is no equivalent to the vce() option. The example below shows the bootstrap results for the ratio of the means of the first differences of two variables (ttl_exp and hours). We need to let the command know we are dealing with panel data and, therefore, each random selection must correspond to a panel. Moreover, repeated selections of the same panel within one bootstrapped sample should be internally treated as different panels.

Let’s first write a program that computes the ratio of the means of the first differences of two variables:

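program my_xtboot, rclass
           // Mean of the first difference of the first variable
           summarize d.`1', meanonly
           scalar mean`1' = r(mean)
           // Mean of the first difference of the second variable
           summarize d.`2', meanonly
           scalar mean`2' = r(mean)
           // Return the ratio of the two means in r(ratio)
           return scalar ratio = scalar(mean`1')/scalar(mean`2')
end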
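As a possible variation, the program could parse its arguments with syntax and hold the intermediate means in temporary scalars, so that it leaves no named scalars behind. This is only a sketch under assumptions: the name my_xtboot2 and the version 16 line are illustrative choices, not part of the original example, and the call at the end simply checks that r(ratio) is returned on the full sample.

```stata
* Sketch: variant of my_xtboot using syntax and temporary scalars
* (my_xtboot2 is a hypothetical name used only for this illustration)
capture program drop my_xtboot2
program my_xtboot2, rclass
        version 16
        // Require exactly two numeric variables
        syntax varlist(min=2 max=2 numeric)
        tokenize `varlist'
        tempname m1 m2
        // Means of the first differences; needs data xtset with a time variable
        summarize d.`1', meanonly
        scalar `m1' = r(mean)
        summarize d.`2', meanonly
        scalar `m2' = r(mean)
        return scalar ratio = `m1'/`m2'
end

* Quick check on the full sample before bootstrapping
webuse nlswork, clear
xtset idcode year
my_xtboot2 ttl_exp hours
display r(ratio)
```

Running the program once by hand like this is a cheap way to confirm that it returns a nonmissing value before handing it to bootstrap.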

Next, let’s create the identifier variable for the bootstrapped panels and declare it as the panel variable with xtset. We then mark the sample and keep only those observations that do not contain missing values for the variables of interest.

. generate newid = idcode

. xtset newid year

Panel variable: newid (unbalanced)
 Time variable: year, 68 to 88, but with gaps
         Delta: 1 unit

. generate sample=1-missing(ttl_exp,hours)

. keep if sample
 (67 observations deleted)

Finally, we perform the simulation, specifying the panel characteristics of the dataset:

. bootstrap ratio=r(ratio),rep(10) seed(123) cluster(idcode) idcluster(newid) 
     nowarn:my_xtboot ttl_exp hours
(running my_xtboot on estimation sample)

Bootstrap replications (10): .........10 done

Bootstrap results                                       Number of obs = 28,467
                                                        Replications  =     10

      Command: my_xtboot ttl_exp hours
        ratio: r(ratio)

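                              (Replications based on 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       ratio |   2.830833   1.542854     1.83   0.067    -.1931047    5.854771
------------------------------------------------------------------------------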

There are two cluster options in the bootstrap command line. The first, cluster(idcode), identifies the original panel variable in the dataset, whereas the second, idcluster(newid), creates a unique identifier for each of the selected clusters (panels in this case). Thus, if a panel is selected more than once, the variable newid assigns a different ID number to each resampled copy of that panel. If the two cluster options are omitted, bootstrap will not take into account the panel structure of the data; rather, it will construct the simulated samples by randomly selecting individual observations from the pooled data.
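To see what these two options do in a single replication, one can draw one clustered resample by hand with bsample, which accepts the same cluster() and idcluster() options. The sketch below is an illustration only: bs_id, first_orig, first_new, n_distinct, and n_resampled are hypothetical names, and the counts depend on the seed.

```stata
* Sketch: one clustered bootstrap draw, to inspect cluster() and idcluster()
* (all new variable and scalar names here are illustrative)
webuse nlswork, clear
set seed 123

* Draw whole panels with replacement; bs_id labels each draw uniquely
bsample, cluster(idcode) idcluster(bs_id)

* Count distinct original panels versus resampled panels
egen byte first_orig = tag(idcode)
egen byte first_new  = tag(bs_id)
count if first_orig
scalar n_distinct = r(N)
count if first_new
scalar n_resampled = r(N)
display "distinct original panels drawn: " scalar(n_distinct)
display "resampled panels (bs_id values): " scalar(n_resampled)

* With repeated panels, idcode and year no longer identify observations,
* so this is expected to return a nonzero code when any panel was drawn twice
capture xtset idcode year
display "xtset idcode year return code: " _rc

* The relabeled identifier keeps each resampled panel separate
xtset bs_id year
```

This is why the time-series operators inside my_xtboot need the panel variable to be the idcluster() variable rather than idcode.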
