Bayesian longitudinal/panel-data models

Highlights

  • Outcomes: continuous, binary, ordinal, categorical, count

  • Random effects

  • Flexible prior distributions

  • Posterior distributions of panel or group effects

  • Bayesian predictions

  • Full Bayesian-features support

You can fit Bayesian longitudinal/panel-data models by simply prefixing your classical panel-data models with bayes:. Because panel-data models can be viewed as two-level hierarchical models, all the benefits of Bayesian multilevel modeling apply to panel-data models too.

You can fit a linear random-effects panel-data model to outcome y with predictors x1 and x2 and panel or group identifier id by typing

. xtset id
. xtreg y x1 x2

You can now fit a Bayesian counterpart of this model by typing

. xtset id
. bayes: xtreg y x1 x2

Of course, as with any other Stata command, you can use a point-and-click interface instead of typing the commands.

Bayesian panel-data models are not limited to continuous outcomes. For binary outcomes, you can just as easily type

. bayes: xtprobit y x1 x2

or for count outcomes

. bayes: xtpoisson y x1 x2

Or use any of the eight panel-data models that support the bayes prefix, including the panel-data multinomial logit model.
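For instance, here is a minimal sketch for a categorical outcome, assuming your data are already xtset and y records several unordered categories:

. bayes: xtmlogit y x1 x2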

Let's see it work

Estimation

Consider a subset of data from the National Longitudinal Survey of Young Women, who were between 14 and 24 years old in 1968 and living in the South. We will model the log of wages as a function of individuals' education, grade; their work experience, ttl_exp, which enters the model quadratically; and whether they live outside a standard metropolitan statistical area, not_smsa. There are multiple observations per individual, identified by id.

We fit a Bayesian panel-data linear model to account for individual effects. We also wish to compute Bayesian predictions of log wages and compare individuals' (panel) effects.

. webuse nlswork6
(Subsample of 1968 National Longitudinal Survey of Young Women)

To save time, we request a Markov chain Monte Carlo (MCMC) sample size of 1,000 instead of the default 10,000. We also specify a random-number seed for reproducibility.

. bayes, mcmcsize(1000) rseed(17): xtreg ln_wage grade c.ttl_exp##c.ttl_exp 
     i.not_smsa
note: Gibbs sampling is used for regression coefficients and variance
      components.

Burn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done
Simulation 1000 .........1000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  ln_wage ~ normal(xb_ln_wage,{sigma2})

Priors:
  {ln_wage:grade} ~ normal(0,10000)                                        (1)
  {ln_wage:ttl_exp} ~ normal(0,10000)                                      (1)
  {ln_wage:c.ttl_exp#c.ttl_exp} ~ normal(0,10000)                          (1)
  {ln_wage:1.not_smsa} ~ normal(0,10000)                                   (1)
  {ln_wage:_cons} ~ normal(0,10000)                                        (1)
  {U[id]} ~ normal(0,{var_U})                                              (1)
  {sigma2} ~ igamma(0.01,0.01)

Hyperprior:
  {var_U} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_ln_wage.

Bayesian RE normal regression                    MCMC iterations  =      3,500
Metropolis–Hastings and Gibbs sampling           Burn-in          =      2,500
                                                 MCMC sample size =      1,000
Group variable: id                               Number of groups =        831
                                                 Obs per group:
                                                              min =          1
                                                              avg =        1.4
                                                              max =          5
                                                 Number of obs    =      1,174
                                                 Acceptance rate  =       .796
                                                 Efficiency:  min =     .02411
                                                              avg =     .07145
Log marginal-likelihood                                       max =      .1676
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
ln_wage      |
       grade |  .0696248   .0052371    .00082    .069638   .0591421     .07999
     ttl_exp |  .0404646   .0070371   .000544   .0404901    .026454    .053855
             |
   c.ttl_exp#|
   c.ttl_exp | -.0005053   .0003978   .000039  -.0005132  -.0012609   .0003009
             |
  1.not_smsa | -.1502314   .0251494   .002755   -.150213   -.200979  -.1014476
       _cons |  .5646936   .0658804     .0099   .5680976   .4377888   .6957415
-------------+----------------------------------------------------------------
       var_U |  .0801363   .0073138   .001258   .0795163   .0674855   .0959143
      sigma2 |  .0673308   .0046724   .000952   .0671238   .0590742   .0771401
------------------------------------------------------------------------------

The posterior mean of the grade coefficient is positive, about 0.07, so each additional year of schooling is associated with roughly a 7% increase in wages. Theory suggests that wages increase with experience but that the increase tapers off over time. This implies that the coefficient for ttl_exp should be positive and the coefficient for c.ttl_exp#c.ttl_exp should be negative, which is what we observe. Finally, living outside big urban centers has a negative effect on wages.

Bayesian modeling requires that you specify priors for all model parameters. As with any other bayes command, bayes: xtreg provides default priors for convenience. You should review the default prior specifications and, if they are not appropriate for your analysis, specify your own.

Convergence of MCMC

bayes: xtreg uses MCMC to obtain results. Its convergence needs to be verified before further analysis. You can do this graphically, or you can compute the Gelman–Rubin convergence statistic using multiple chains.
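For instance, here is a sketch of the multiple-chains approach (not run here): refit the model with three chains, and then compute the Gelman–Rubin statistic.

. bayes, mcmcsize(1000) rseed(17) nchains(3): xtreg ln_wage grade
     c.ttl_exp##c.ttl_exp i.not_smsa
. bayesstats grubin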

Let's assess MCMC convergence visually for, say, the grade coefficient.

. bayesgraph diagnostics {ln_wage:grade}

There is no apparent trend in the trace plot. Autocorrelation decreases with time. We do not have a reason to suspect nonconvergence for grade, but convergence needs to be verified for all model parameters.
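Rather than inspecting one parameter at a time, you can produce diagnostic plots for all model parameters at once:

. bayesgraph diagnostics _all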

Let's save our MCMC and estimation results for later use.

. bayes, saving(bxtregsim)
note: file bxtregsim.dta saved.

. estimates store bxtreg

Custom priors

With Bayesian models, you may want to incorporate your own priors. These priors often come from historical data. For instance, based on previous studies, it may be reasonable to assume that the coefficient for grade ranges between 0 and 25. So we may consider a uniform on (0,25) prior for it. We can use bayes's option prior() to specify custom priors.

. bayes, mcmcsize(1000) rseed(17) prior({ln_wage:grade}, uniform(0,25)):
    xtreg ln_wage grade c.ttl_exp##c.ttl_exp i.not_smsa
note: Gibbs sampling is used for variance components.


Burn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done
Simulation 1000 .........1000 done


Model summary
------------------------------------------------------------------------------
Likelihood:
  ln_wage ~ normal(xb_ln_wage,{sigma2})

Priors:
  {ln_wage:ttl_exp c.ttl_exp#c.ttl_exp 1.not_smsa _cons} ~ normal(0,10000) (1)
  {ln_wage:grade} ~ uniform(0,25)                                          (1)
  {U[id]} ~ normal(0,{var_U})                                              (1)
  {sigma2} ~ igamma(0.01,0.01)

Hyperprior:
  {var_U} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_ln_wage.

Bayesian RE normal regression                    MCMC iterations  =      3,500
Metropolis–Hastings and Gibbs sampling           Burn-in          =      2,500
                                                 MCMC sample size =      1,000
Group variable: id                               Number of groups =        831
                                                 Obs per group:
                                                              min =          1
                                                              avg =        1.4
                                                              max =          5
                                                 Number of obs    =      1,174
                                                 Acceptance rate  =      .6181
                                                 Efficiency:  min =    .009861
                                                              avg =     .01259
Log marginal-likelihood                                       max =     .01751
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
ln_wage      |
       grade |  .0691558   .0063225   .002013   .0692614   .0567288   .0819581
     ttl_exp |  .0389882   .0080916   .002355   .0390646   .0228502   .0530886
             |
   c.ttl_exp#|
   c.ttl_exp | -.0004548   .0004804   .000153  -.0004778  -.0013733   .0005973
             |
  1.not_smsa | -.1452854   .0241483   .006649  -.1450175  -.1934294  -.0947368
       _cons |  .5751218   .0727991   .022234   .5795395   .4176046   .7018132
-------------+----------------------------------------------------------------
       var_U |  .0785815   .0082846    .00198   .0784331   .0615956   .0956798
      sigma2 |  .0688682   .0059815   .001537   .0684258   .0583475   .0823034
------------------------------------------------------------------------------

Our custom prior did not change the results much.

Posterior distributions of panel or group effects

We may be interested in making inference about individuals' (panel) effects. Bayesian modeling provides a natural way of doing this. Unlike classical random-effects panel-data models, Bayesian panel-data models estimate random (panel) effects together with all other model parameters. As such, each random effect is represented by an entire MCMC sample from its posterior distribution. This sample can be used for inference such as comparison of panel or group effects. (Think of comparing performances of different companies or hospitals.)

Let's return to our original model, which used default uninformative priors.

. estimates restore bxtreg
(results bxtreg are active now)

Let's plot the posterior distributions for the first nine individual effects.

. bayesgraph histogram {U[1/9]}, byparm normal

The above individual effects represent shifts, or offsets, from the average log wage. There is clear variation among individuals' wages. For instance, we can see that the wage of individual 8 is higher than that of individual 5. So there are still differences between individual wages that are not accounted for by the model predictors.
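We could also quantify such a comparison. For instance, here is a sketch that summarizes the posterior distribution of the difference between the two effects (diff is just a label we chose):

. bayesstats summary (diff: {U[8]} - {U[5]})    // diff is our label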

Bayesian predictions

Within the Bayesian framework, you can compute predictions and their uncertainties without making any asymptotic assumptions. This is possible because the obtained predictions are samples from the "exact" posterior predictive distribution of the new data given the observed data. The posterior predictive distribution is not assumed to be asymptotically normal.

Let's compute posterior means of predicted log wages together with their 95% credible intervals.

. bayespredict pmean, mean

Computing predictions ...

. bayespredict cri_l cri_u, cri

Computing predictions ...

Let's list the results for the first 10 observations.

. list ln_wage pmean cri_l cri_u in 1/10

     +---------------------------------------------+
     |  ln_wage      pmean      cri_l        cri_u |
     |---------------------------------------------|
  1. | 1.543923    1.55197   .8831815     2.184215 |
  2. | 1.815738   1.645933   1.006488     2.284888 |
  3. | 1.870532   1.718343   1.112855     2.360129 |
  4. | 2.340405    2.16819   1.523068     2.747879 |
  5. | 1.545327   1.629953   1.092535     2.221188 |
     |---------------------------------------------|
  6. | 1.594335   1.687503   1.089151     2.268714 |
  7. | 1.848619   1.849651   1.266758     2.408907 |
  8. | 1.757637    1.80307   1.140366     2.467546 |
  9. | 2.346746   2.224904   1.529079     2.833333 |
 10. | 3.539868   2.826903   2.270285     3.511156 |
     +---------------------------------------------+

The predicted posterior means are close to the observed values except perhaps for observation 10. In the next section, we show how to check whether the model fits well more formally. But to do so, we must first save all MCMC predictions.

. bayespredict {_ysim1}, saving(bxtregpred)

Computing predictions ...

file bxtregpred.dta saved.
file bxtregpred.ster saved.

bxtregpred.dta contains an MCMC sample of values for each observation. The posterior means (pmean) we predicted earlier are, for each observation, the mean over the MCMC sample.
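You can verify this directly. For example, this sketch summarizes the prediction sample for the first observation, whose posterior mean should match pmean in observation 1:

. bayesstats summary {_ysim1[1]} using bxtregpred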

The MCMC predictions datasets are typically large, so you may consider saving them only when necessary; see [BAYES] bayespredict for details.
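Instead of saving the full prediction sample, you can also ask bayespredict to save only the functions of simulated outcomes you need. For instance, here is a sketch that saves only the minimum and maximum statistics (bxtregpred2 is a hypothetical filename):

. bayespredict (pmin: @min({_ysim1})) (pmax: @max({_ysim1})),
     saving(bxtregpred2)    // bxtregpred2 is a new filename

You could then refer to {pmin} and {pmax} directly in bayesstats ppvalues with using bxtregpred2.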

Posterior predictive checks—model fit

Another advantage of Bayesian analysis of panel-data models is formal posterior predictive checks of model fit, which are not available with classical frequentist analysis.

We can use posterior samples for the predicted outcome, the MCMC prediction sample, to perform model goodness-of-fit checks. For example, let's compare the minimum and maximum statistics from the MCMC prediction samples with those observed in the data. These statistics describe the tails of the data distribution. You can use any other statistics in place of (or in addition to) the maximum and minimum.

bayesstats ppvalues performs posterior predictive checks by computing the so-called posterior predictive p-values (PPPs). PPPs describe how often the statistics from the MCMC prediction sample were as extreme as or more extreme than those in the observed sample; see [BAYES] bayesstats ppvalues.

. bayesstats ppvalues (pmin:@min({_ysim1})) (pmax:@max({_ysim1})) using bxtregpred

Posterior predictive summary   MCMC sample size =     1,000

------------------------------------------------------------------
           T |      Mean   Std. dev.   E(T_obs)   P(T>=T_obs)
-------------+----------------------------------------------------
        pmin |  .0299837   .1652143   .0176546           .565
        pmax |  3.515637   .2291669    3.85025           .085
------------------------------------------------------------------
Note: P(T>=T_obs) close to 0 or 1 indicates lack of fit.

The PPP for the minimum, .565, is not close to 0 or 1, so the model fits well with respect to this statistic. But the PPP for the maximum, .085, is close to 0 and thus suggests a poor fit for this statistic.

If we examine the data, we will discover a cluster of observations with values between 3.5 and 4 that look like outliers. We may need to revisit our model and address the issue of outliers.
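For example, one quick way to inspect those observations is to list them (a sketch):

. list id ln_wage if ln_wage > 3.5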

Clean up

After the analysis, we can remove the files created by bayes: xtreg and bayespredict that are no longer needed.

. erase bxtregsim.dta
. erase bxtregpred.dta
. erase bxtregpred.ster

Tell me more

Learn more in the Stata Bayesian Analysis Reference Manual.

Learn more about Bayesian econometrics features.

Learn more about new Bayesian multilevel modeling features.

See Bayesian regression models using the bayes prefix.

See Bayesian estimation.