Home  /  Stata News  /  Vol 38 No 4  /  In the spotlight: In a world of uncertainty, BMA is a good friend to have

## In the spotlight: In a world of uncertainty, BMA is a good friend to have

Let's use Bayesian model averaging (BMA) to investigate how mental health indicators influence time spent playing video games, using an excerpt from a Kaggle dataset posted by Aditi Awasthi.

Usually, we choose one model and estimate parameters and other quantities of interest from the observed data conditional on this chosen model. For our example, we would like to fit a linear regression of hours spent playing video games on 41 predictors, including items from the Generalized Anxiety Disorder Inventory (GAD), the Satisfaction With Life Scale (SWL), the Social Phobia Inventory (SPIN), a narcissism measure, age, gender, employment status, and highest-obtained education degree.

. regress hoursplay gad1-age i.(gender employment degree)

      Source |       SS           df       MS      Number of obs   =     8,980
-------------+----------------------------------   F(41, 8938)     =      4.56
       Model |  67655.8921        41  1650.14371   Prob > F        =    0.0000
    Residual |  3235534.77     8,938  361.997625   R-squared       =    0.0205
-------------+----------------------------------   Adj R-squared   =    0.0160
       Total |  3303190.66     8,979   367.87957   Root MSE        =    19.026

-------------------------------------------------------------------------------
    hoursplay | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
         gad1 |  -.2484525   .3176057    -0.78   0.434    -.8710325    .3741276
         gad2 |  -.6537547   .3615262    -1.81   0.071    -1.362429     .0549195
         gad3 |   .0722776   .3137106     0.23   0.818    -.5426672     .6872224
         gad4 |   .4940691   .2959787     1.67   0.095    -.0861171     1.074255
         gad5 |   .0878802    .270851     0.32   0.746    -.4430499     .6188103
         gad6 |   .7852404   .2516806     3.12   0.002     .2918887     1.278592
         gad7 |   .2805871   .2930849     0.96   0.338    -.2939265     .8551006
         swl1 |   .0066118   .1825303     0.04   0.971    -.3511895      .364413
         swl2 |  -.4580919    .161555    -2.84   0.005    -.7747769     -.141407
         swl3 |   .0443038    .181529     0.24   0.807    -.3115347     .4001423
         swl4 |  -.0539451   .1581935    -0.34   0.733    -.3640406     .2561504
         swl5 |   .0128688   .1295983     0.10   0.921    -.2411736     .2669112
        spin1 |  -.2453695   .2566992    -0.96   0.339    -.7485588     .2578198
        spin2 |   .0493649   .2074694     0.24   0.812    -.3573226     .4560525
        spin3 |   .0831727   .2622274     0.32   0.751    -.4308532     .5971986
        spin4 |  -.2718726   .2324322    -1.17   0.242    -.7274931     .1837478
        spin5 |  -.2370667   .2308252    -1.03   0.304     -.689537     .2154037
        spin6 |  -.1637211   .2398618    -0.68   0.495    -.6339052      .306463
        spin7 |   .1243836   .1986672     0.63   0.531    -.2650498      .513817
        spin8 |   .4553621   .1998924     2.28   0.023     .0635271     .8471971
        spin9 |  -.1461319   .2036353    -0.72   0.473    -.5453039     .2530401
       spin10 |   .0705562   .2569985     0.27   0.784    -.4332197     .5743321
       spin11 |   .2240838   .1717869     1.30   0.192     -.112658     .5608256
       spin12 |   .2116736   .2405859     0.88   0.379      -.25993     .6832772
       spin13 |   .1469812   .2474796     0.59   0.553    -.3381357      .632098
       spin14 |    .098584   .2486039     0.40   0.692    -.3887366     .5859046
       spin15 |  -.2587051   .2133122    -1.21   0.225     -.676846     .1594357
       spin16 |   .0126282   .2652423     0.05   0.962    -.5073075     .5325638
       spin17 |   .0864269   .2182376     0.40   0.692    -.3413688     .5142226
   narcissism |   .4333946   .1921391     2.26   0.024     .0567578     .8100314
          age |  -.2757585   .0797467    -3.46   0.001    -.4320804    -.1194366
              |
       gender |
         Male |  -1.780342   .9040933    -1.97   0.049    -3.552572    -.0081118
        Other |   3.058195   3.270204     0.94   0.350    -3.352155     9.468544
              |
   employment |
           NA |   3.149185   4.010776     0.79   0.432    -4.712856     11.01123
 Student at.. |    -1.2569   .5909958    -2.13   0.033    -2.415387    -.0984127
 Student at.. |  -.4458813   .8405001    -0.53   0.596    -2.093454     1.201692
 Unemployed.. |   4.132545   .8034745     5.14   0.000     2.557551      5.70754
              |
       degree |
 High scho..) |   .9275215   .5741915     1.62   0.106    -.1980256     2.053069
 Master (o..) |   .6352425   1.124561     0.56   0.572    -1.569156     2.839641
         None |   1.078527   .8941615     1.21   0.228    -.6742349     2.831289
 Ph.D., Ps..) |  -.7402326   2.282285    -0.32   0.746    -5.214034     3.733569
              |
        _cons |   35.53717   2.573734    13.81   0.000     30.49206     40.58228
-------------------------------------------------------------------------------

Hmm. Do we need all 41 of these predictors? Probably not, but we’re not sure exactly which ones to include. In other words, which model should we select? In the case of linear regression, methods like stepwise regression, Wald tests, or even Bayes factors can be used to select one from among candidate models, but why should we have to choose?

With BMA, we account for model uncertainty in our analysis by averaging results over all plausible models based on the observed data. We do this by considering a pool of candidate models and estimating each model’s posterior probability according to Bayes’s theorem. In the context of linear regression with p predictors, there are 2^p candidate models; with our 41 predictors, that is more than 2 trillion. We then estimate parameters by averaging parameter estimates across models, weighting each model by its posterior probability.
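To make the weighting concrete, here is a minimal Python sketch of the averaging step with made-up numbers. It illustrates the arithmetic only, not bmaregress's actual MC3 algorithm, and the marginal likelihoods and coefficients are purely hypothetical:

```python
# Toy sketch of Bayes's theorem applied to models: P(M | y) is proportional
# to P(y | M) * P(M). All numbers are made up for illustration.

marg_lik = [2.0e-5, 1.2e-5, 0.3e-5]   # hypothetical marginal likelihoods P(y | M)
prior = [1 / 3, 1 / 3, 1 / 3]         # uniform prior over three candidate models

unnorm = [l * p for l, p in zip(marg_lik, prior)]
pmp = [u / sum(unnorm) for u in unnorm]   # posterior model probabilities

# Coefficient estimate of one predictor under each model (0 where excluded):
beta = [0.85, 0.90, 0.0]

# BMA posterior mean: average across models, weighted by PMP.
# Models that exclude the predictor contribute 0, shrinking the mean.
bma_mean = sum(w * b for w, b in zip(pmp, beta))

print([round(w, 4) for w in pmp])   # [0.5714, 0.3429, 0.0857]
print(round(bma_mean, 4))           # 0.7943
```

Because the weights come from the data via the marginal likelihoods, models that explain the data better dominate the average.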

We can perform this analysis using Stata 18’s new bmaregress command. As with other estimation commands, we list our dependent variable followed by our list of independent variables.

. bmaregress hoursplay gad1-age i.(gender employment degree)

Burn-in ...
Simulation ...
Computing model probabilities ...

Bayesian model averaging                          No. of obs         =   8,980
Linear regression                                 No. of predictors  =      41
MC3 sampling                                                  Groups =      41
                                                              Always =       0
                                                  No. of models      =      52
                                                  For CPMP >= .9     =       6
Priors:                                           Mean model size    =   3.916
  Models: Beta-binomial(1, 1)                     Burn-in            =   2,500
  Cons.:  Noninformative                          MCMC sample size   =  10,000
  Coef.:  Zellner's g                             Acceptance rate    =  0.0261
      g:  Benchmark, g = 8,980                    Shrinkage, g/(1+g) =  0.9999
  sigma2: Noninformative                          Mean sigma2        = 362.484

                                                  Sampling correlation = 0.9980

--------------------------------------------------------------------------------
                            hoursplay |      Mean   Std. dev.   Group       PIP
--------------------------------------+-----------------------------------------
                           employment |
            Unemployed / between jobs |  5.060935    .7228275      37         1
                                  age | -.2956931    .0635898      31    .99628
                                 gad6 |  .8708031    .3563267       6    .90604
                                 swl2 | -.4278654    .2250447       9    .83169
                                spin8 |  .0416567    .1368465      20   .094544
                           employment |
      Student at college / university | -.0384546     .219673      35   .034229
                           narcissism |  .0076027    .0626469      30   .017336
Always                                |
                                _cons |  35.17864     1.94818       0         1
--------------------------------------------------------------------------------
Note: Coefficient posterior means and std. dev. estimated from 52 models.
Note: Default priors are used for models and parameter g.
Note: 34 predictors with PIP less than .01 not shown.
Note: Acceptance rate is low.

The top of the output tells us that 52 models were explored, and that the top 6 of them account for more than 90% of the cumulative posterior model probability (CPMP). The mean model size was 3.9, meaning that the explored models included about 4 predictors each, on average.
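Both summaries are simple functions of the posterior model probabilities (PMPs). A small Python sketch with hypothetical PMPs and model sizes (not the ones from this run) shows the arithmetic:

```python
# Sketch: cumulative PMP and mean model size from per-model summaries.
# The PMPs and sizes below are made up for illustration.

pmp = [0.55, 0.20, 0.12, 0.06, 0.04, 0.02, 0.01]   # sorted; sums to 1
sizes = [4, 3, 3, 5, 4, 2, 6]                      # predictors per model

# How many top models are needed to reach a cumulative PMP of 0.9?
cum, n_top = 0.0, 0
for w in pmp:
    cum += w
    n_top += 1
    if cum >= 0.9:
        break

# Mean model size: PMP-weighted average of the model sizes
mean_size = sum(w * s for w, s in zip(pmp, sizes))

print(n_top)       # 4
print(mean_size)   # 3.72
```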

The estimation table reports the posterior mean and standard deviation of the coefficient for each predictor. Averaging across the models explored here, we infer that participants who are unemployed or between jobs are expected to spend 5.06 more hours playing video games than those in the reference employment group.

We also see estimated posterior inclusion probabilities (PIPs). The indicator for unemployment has a PIP of 1, meaning it was included in all explored models. The rest of the predictors are listed in order of their PIPs.
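A predictor's PIP is simply the summed posterior probability of the visited models that include it. A Python sketch with hypothetical models (the PMPs are loosely based on the top-ranked models reported later, with the remaining probability mass omitted, and "unemployed" stands in for the employment indicator):

```python
# PIP sketch: each visited model is a set of included predictors plus its PMP.
# PMPs are hypothetical; mass on the remaining models is omitted.
models = [
    ({"gad6", "swl2", "age", "unemployed"}, 0.63),
    ({"gad6", "age", "unemployed"}, 0.13),
    ({"swl2", "age", "unemployed"}, 0.07),
    ({"gad6", "swl2", "age", "unemployed", "spin8"}, 0.04),
]

predictors = {"gad6", "swl2", "age", "unemployed", "spin8"}

# PIP of x = total PMP of the models that contain x
pip = {x: sum(w for terms, w in models if x in terms) for x in predictors}

for name, p in sorted(pip.items(), key=lambda kv: -kv[1]):
    print(name, round(p, 2))
```

A PIP near 1 means the data overwhelmingly favor models that include the predictor; a PIP near 0 means it is rarely worth including.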

. bmastats models

Computing model probabilities ...

Model summary                              Number of models:
                                                 Visited = 52
                                                Reported =  5

-----------------------------------------------------
         Analytical PMP   Frequency PMP   Model size
-----------------------------------------------------
Rank 1        .6256           .5683            4
Rank 2        .1292           .1339            3
Rank 3        .0663           .0916            3
Rank 4       .04341           .0425            5
Rank 5      .02787            .0257            4
-----------------------------------------------------
Note: Using analytical PMP for model ranking.

Variable-inclusion summary

------------------------------------------------------------
                               Rank   Rank   Rank   Rank   Rank
                                  1      2      3      4      5
------------------------------------------------------------
gad6                              x      x             x      x
swl2                              x             x      x
age                               x      x      x      x      x
employment
  Unemployed / between jobs       x      x      x      x      x
spin8                                                  x      x
------------------------------------------------------------
Legend: x - estimated

The highest-ranked model has a posterior model probability of 0.63 and includes four predictors. We can see which four in the variable-inclusion summary: gad6, swl2, age, and the indicator for being unemployed or between jobs. The next highest-ranked model has a posterior model probability of 0.13 and includes three predictors: gad6, age, and unemployment.

We can visualize variable inclusions across all models using the postestimation command bmagraph varmap.

. bmagraph varmap

Computing model probabilities ...

In addition to being pretty, this variable-inclusion map provides an intuitive look at BMA. Each row is a predictor, and each column is a model: blue signifies a positive regression coefficient, red a negative coefficient, and gray that the predictor was not included in that model.

The first block extends to a cumulative posterior probability of 0.63, indicating our top-ranked model. Again, we can see that it includes four predictors. Unemployment and item 6 of the GAD inventory, which asks about frequency of “Becoming easily annoyed or irritable”, increase the expected time spent playing video games, while age and item 2 of the SWL scale, which rates agreement with “The conditions of my life are excellent”, decrease expected time. In other words, for young, unemployed people who are not satisfied with the conditions of their life and get easily annoyed, video games are a popular escape.

The next block represents the second-ranked model, and so on. In general, we can see that these four predictors are included in most models and that the eighth item of SPIN, which asks about agreement with “I avoid going to parties”, is also sometimes included with a positive coefficient. This makes sense. Parties probably aren’t very fun if you are annoyed with everyone there. We can explore this relationship further using bmastats jointness.

. bmastats jointness gad6 spin8

Computing model probabilities ...

-------------------------------------
                            Jointness
-------------------------------------
Doppelhofer–Weeks           -1.064686
Ley–Steel type 1             .0810813
Ley–Steel type 2             .0882356
Yule's Q                    -.4871701
-------------------------------------
Notes: Using analytical PMPs. See thresholds.

These results suggest disjointness between the two predictors: the negative Doppelhofer–Weeks and Yule's Q values indicate that they tend to enter models separately rather than together, so if we already have one of them in our model, we really don’t need the other.
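For intuition, all four statistics are functions of the 2×2 table of inclusion probabilities for the pair: P(both in), P(only one in), P(neither in). In the Python sketch below, the two PIPs are taken from the bmaregress output, while the joint inclusion probability (`p_both`, about 0.075) is an illustrative value back-solved to roughly reproduce the reported statistics:

```python
import math

# Inclusion probabilities for the pair (gad6, spin8).
# PIPs are from the bmaregress output; p_both is an assumed illustrative
# value, back-solved from the reported jointness statistics.
p1 = 0.90604      # PIP of gad6
p2 = 0.094544     # PIP of spin8
p_both = 0.07504  # P(both included)  [assumption]

p_only1 = p1 - p_both
p_only2 = p2 - p_both
p_neither = 1 - p1 - p2 + p_both

# Doppelhofer-Weeks: log odds ratio of joint vs. separate inclusion
dw = math.log(p_both * p_neither / (p_only1 * p_only2))

# Ley-Steel type 1: P(both in) / P(at least one in)
ls1 = p_both / (p1 + p2 - p_both)

# Ley-Steel type 2: P(both in) / P(exactly one in)
ls2 = p_both / (p1 + p2 - 2 * p_both)

# Yule's Q, equivalently (e^dw - 1) / (e^dw + 1)
q = (math.exp(dw) - 1) / (math.exp(dw) + 1)

print(round(dw, 3), round(ls1, 4), round(ls2, 4), round(q, 4))
# -1.065 0.0811 0.0882 -0.4873
```

Negative Doppelhofer–Weeks and Yule's Q values mean the odds of the two predictors appearing together are lower than the odds of them appearing apart, which is exactly the disjointness pattern described above.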

Even with this short demonstration, you can already see how much more information we get about our data than if we had just fit one model. And, of course, because it accounts for model uncertainty, BMA can also produce better predictions than traditional single-model methods; we obtain them with bmapredict.

. bmapredict predicted_hours, mean
note: computing analytical posterior predictive means.
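Conceptually, the BMA posterior predictive mean is the PMP-weighted average of each model's prediction. bmapredict computes this analytically from the bmaregress results; the Python sketch below just illustrates the idea with made-up models:

```python
# Sketch: BMA point prediction = PMP-weighted average of per-model predictions.
# Two hypothetical linear models for one new observation; numbers are made up.

x_new = {"age": 20, "gad6": 2, "unemployed": 1}

models = [
    # (PMP, intercept, coefficients; predictors absent from the dict are
    #  excluded by that model)
    (0.7, 35.0, {"age": -0.30, "gad6": 0.9, "unemployed": 5.0}),
    (0.3, 34.0, {"age": -0.25, "unemployed": 4.5}),
]

def predict(intercept, coefs, x):
    """Linear prediction using only the predictors this model includes."""
    return intercept + sum(b * x[name] for name, b in coefs.items())

# Average the per-model predictions, weighted by posterior model probability
bma_prediction = sum(w * predict(a, coefs, x_new) for w, a, coefs in models)
print(round(bma_prediction, 2))  # 35.11
```

Because the weights reflect how well each model explains the data, predictions borrow strength from every plausible model instead of betting everything on one.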

Here we performed a simple BMA analysis to demonstrate the method and a few of Stata's tools. For better predictive power, we could consider adding additional covariates, interaction terms, nonlinear terms, or even splines to the model space that BMA explores.

For further information about using BMA for prediction, model selection, and inference, including discussion on prior selection, variable grouping, and many more examples, take a tour of the Stata Bayesian Model Averaging Reference Manual.

— Meghan Cain
Assistant Director, Educational Services