
In the spotlight: In a world of uncertainty, BMA is a good friend to have

Let's use Bayesian model averaging (BMA) to investigate how mental health indicators influence time spent playing video games using an excerpt from this Kaggle dataset posted by Aditi Awasthi.

Usually, we choose one model and estimate parameters and other quantities of interest from the observed data conditional on this chosen model. For our example, we would like to fit a linear regression of hours spent playing video games on 41 predictors, including items from the Generalized Anxiety Disorder Inventory (GAD), the Satisfaction With Life Scale (SWL), the Social Phobia Inventory (SPIN), a narcissism measure, age, gender, employment status, and highest-obtained education degree.

. regress hoursplay gad1-age i.(gender employment degree)

      Source |       SS           df       MS      Number of obs   =     8,980
-------------+----------------------------------   F(41, 8938)     =      4.56
       Model |  67655.8921        41  1650.14371   Prob > F        =    0.0000
    Residual |  3235534.77     8,938  361.997625   R-squared       =    0.0205
-------------+----------------------------------   Adj R-squared   =    0.0160
       Total |  3303190.66     8,979   367.87957   Root MSE        =    19.026

-------------------------------------------------------------------------------
    hoursplay | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
         gad1 |  -.2484525   .3176057    -0.78   0.434    -.8710325    .3741276
         gad2 |  -.6537547   .3615262    -1.81   0.071    -1.362429    .0549195
         gad3 |   .0722776   .3137106     0.23   0.818    -.5426672    .6872224
         gad4 |   .4940691   .2959787     1.67   0.095    -.0861171    1.074255
         gad5 |   .0878802    .270851     0.32   0.746    -.4430499    .6188103
         gad6 |   .7852404   .2516806     3.12   0.002     .2918887    1.278592
         gad7 |   .2805871   .2930849     0.96   0.338    -.2939265    .8551006
         swl1 |   .0066118   .1825303     0.04   0.971    -.3511895     .364413
         swl2 |  -.4580919    .161555    -2.84   0.005    -.7747769    -.141407
         swl3 |   .0443038    .181529     0.24   0.807    -.3115347    .4001423
         swl4 |  -.0539451   .1581935    -0.34   0.733    -.3640406    .2561504
         swl5 |   .0128688   .1295983     0.10   0.921    -.2411736    .2669112
        spin1 |  -.2453695   .2566992    -0.96   0.339    -.7485588    .2578198
        spin2 |   .0493649   .2074694     0.24   0.812    -.3573226    .4560525
        spin3 |   .0831727   .2622274     0.32   0.751    -.4308532    .5971986
        spin4 |  -.2718726   .2324322    -1.17   0.242    -.7274931    .1837478
        spin5 |  -.2370667   .2308252    -1.03   0.304     -.689537    .2154037
        spin6 |  -.1637211   .2398618    -0.68   0.495    -.6339052     .306463
        spin7 |   .1243836   .1986672     0.63   0.531    -.2650498     .513817
        spin8 |   .4553621   .1998924     2.28   0.023     .0635271    .8471971
        spin9 |  -.1461319   .2036353    -0.72   0.473    -.5453039    .2530401
       spin10 |   .0705562   .2569985     0.27   0.784    -.4332197    .5743321
       spin11 |   .2240838   .1717869     1.30   0.192     -.112658    .5608256
       spin12 |   .2116736   .2405859     0.88   0.379      -.25993    .6832772
       spin13 |   .1469812   .2474796     0.59   0.553    -.3381357     .632098
       spin14 |    .098584   .2486039     0.40   0.692    -.3887366    .5859046
       spin15 |  -.2587051   .2133122    -1.21   0.225     -.676846    .1594357
       spin16 |   .0126282   .2652423     0.05   0.962    -.5073075    .5325638
       spin17 |   .0864269   .2182376     0.40   0.692    -.3413688    .5142226
   narcissism |   .4333946   .1921391     2.26   0.024     .0567578    .8100314
          age |  -.2757585   .0797467    -3.46   0.001    -.4320804   -.1194366
              |
       gender |
        Male  |  -1.780342   .9040933    -1.97   0.049    -3.552572   -.0081118
       Other  |   3.058195   3.270204     0.94   0.350    -3.352155    9.468544
              |
   employment |
          NA  |   3.149185   4.010776     0.79   0.432    -4.712856    11.01123
Student at..  |    -1.2569   .5909958    -2.13   0.033    -2.415387   -.0984127
Student at..  |  -.4458813   .8405001    -0.53   0.596    -2.093454    1.201692
Unemployed..  |   4.132545   .8034745     5.14   0.000     2.557551     5.70754
              |
       degree |
High scho..)  |   .9275215   .5741915     1.62   0.106    -.1980256    2.053069
Master (o..)  |   .6352425   1.124561     0.56   0.572    -1.569156    2.839641
        None  |   1.078527   .8941615     1.21   0.228    -.6742349    2.831289
Ph.D., Ps..)  |  -.7402326   2.282285    -0.32   0.746    -5.214034    3.733569
              |
        _cons |   35.53717   2.573734    13.81   0.000     30.49206    40.58228
-------------------------------------------------------------------------------

Hmm. Do we need all 41 of these predictors? Probably not, but we’re not sure exactly which ones to include. In other words, which model should we select? In the case of linear regression, methods like stepwise regression, Wald tests, or even Bayes factors can be used to select one from among candidate models, but why should we have to choose?

With BMA, we account for model uncertainty in our analysis by averaging results over all plausible models based on the observed data. We do this by considering a pool of candidate models and estimating each model’s posterior probability according to Bayes’s theorem. In the context of linear regression with p predictors, there are 2^p candidate models. We then estimate parameters by averaging the parameter estimates across models, weighted by the models’ posterior probabilities.
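
To make the averaging step concrete, here is the textbook BMA identity written in LaTeX notation (a schematic summary, not Stata's exact computational form):

P(M_j \mid y) = \frac{P(y \mid M_j)\, P(M_j)}{\sum_{k=1}^{2^p} P(y \mid M_k)\, P(M_k)},
\qquad
E(\beta \mid y) = \sum_{j=1}^{2^p} P(M_j \mid y)\, E(\beta \mid y, M_j),

where P(y | M_j) is the marginal likelihood of model M_j and P(M_j) is its prior probability.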

We can perform this analysis by using Stata 18’s new bmaregress command. As with other estimation commands, we specify our dependent variable followed by our independent variables.

. bmaregress hoursplay gad1-age i.(gender employment degree)

Burn-in ...
Simulation ...
Computing model probabilities ...

Bayesian model averaging                          No. of obs         =   8,980
Linear regression                                 No. of predictors  =      41
MC3 sampling                                                  Groups =      41
                                                              Always =       0
                                                  No. of models      =      52
                                                      For CPMP >= .9 =       6
Priors:                                           Mean model size    =   3.916
  Models: Beta-binomial(1, 1)                     Burn-in            =   2,500
   Cons.: Noninformative                          MCMC sample size   =  10,000
   Coef.: Zellner's g                             Acceptance rate    =  0.0261
       g: Benchmark, g = 8,980                    Shrinkage, g/(1+g) =  0.9999
  sigma2: Noninformative                          Mean sigma2        = 362.484

Sampling correlation = 0.9980

---------------------------------------------------------------------------
                      hoursplay |      Mean   Std. dev.    Group        PIP
--------------------------------+------------------------------------------
                     employment |
      Unemployed / between jobs |  5.060935    .7228275       37          1
                            age | -.2956931    .0635898       31     .99628
                           gad6 |  .8708031    .3563267        6     .90604
                           swl2 | -.4278654    .2250447        9     .83169
                          spin8 |  .0416567    .1368465       20    .094544
                     employment |
Student at college / university | -.0384546     .219673       35    .034229
                     narcissism |  .0076027    .0626469       30    .017336
                         Always |
                          _cons |  35.17864     1.94818        0          1
---------------------------------------------------------------------------
Note: Coefficient posterior means and std. dev. estimated from 52 models.
Note: Default priors are used for models and parameter g.
Note: 34 predictors with PIP less than .01 not shown.
Note: Acceptance rate is low.

The top of the output tells us that 52 models were explored, the top 6 of which account for more than 90% of the cumulative posterior model probability (CPMP). The mean model size was 3.9, meaning the explored models included about four predictors on average.
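
If you'd like more detail on model size than this single mean, the postestimation commands bmastats msize and bmagraph msize summarize and plot the prior and posterior model-size distributions; a minimal sketch, with output omitted:

. bmastats msize      // summarize prior and posterior model-size distributions
. bmagraph msize      // plot the two distributions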

The estimation table reports posterior means and standard deviations of the coefficients for each predictor. Averaging across the models explored here, we infer that unemployed participants are expected to spend 5.06 more hours playing video games than participants in the base employment category.

We also see estimated posterior inclusion probabilities (PIPs). The indicator for unemployment has a PIP of 1, meaning it was included in all explored models. The rest of the predictors are listed in order of their PIPs.
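
Two natural follow-ups, sketched below with output omitted: bmastats pip reports PIPs, including those suppressed in the table above, and bmacoefsample followed by bayesstats summary yields full posterior summaries, such as credible intervals, for the model-averaged coefficients (the rseed() value here is arbitrary):

. bmastats pip                 // posterior inclusion probabilities
. bmacoefsample, rseed(18)     // simulate a posterior sample of the coefficients
. bayesstats summary           // posterior means, std. dev., and credible intervals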

We can get more information about our top five models by using the postestimation command bmastats models.

. bmastats models

Computing model probabilities ...

Model summary                              Number of models:
                                                    Visited = 52
                                                   Reported =  5

-----------------------------------------------------
     |   Analytical PMP   Frequency PMP   Model size
-----+-----------------------------------------------
Rank |
   1 |            .6256            .5683            4
   2 |            .1292            .1339            3
   3 |            .0663            .0916            3
   4 |           .04341            .0425            5
   5 |           .02787            .0257            4
-----------------------------------------------------
Note: Using analytical PMP for model ranking.

Variable-inclusion summary
----------------------------------------------------------------
                          |  Rank   Rank   Rank   Rank   Rank
                          |     1      2      3      4      5
--------------------------+-------------------------------------
                     gad6 |     x      x             x      x
                     swl2 |     x             x      x
                      age |     x      x      x      x      x
               employment |
Unemployed / between jobs |     x      x      x      x      x
                    spin8 |                          x      x
----------------------------------------------------------------
Legend: x - estimated

The highest-ranked model has a posterior model probability of 0.63 and includes four predictors. The variable-inclusion summary shows which four: gad6, swl2, age, and the indicator for unemployment. The next highest-ranked model has a posterior model probability of 0.13 and includes three predictors: gad6, age, and unemployment.
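
To see how quickly posterior model probability accumulates across the ranked models, we could also plot the cumulative PMP with the postestimation command bmagraph pmp (graph omitted):

. bmagraph pmp      // cumulative posterior model probability by model rank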

We can visualize variable inclusions across all models using the postestimation command bmagraph varmap.

. bmagraph varmap

Computing model probabilities ...

[Variable-inclusion map produced by bmagraph varmap]

In addition to being pretty, this variable-inclusion map provides an intuitive look at BMA. Each row is a predictor, and each column is a model: blue signifies a positive regression coefficient, red signifies a negative coefficient, and gray signifies that the predictor was not included in that model.

The first block extends to a cumulative posterior probability of 0.63, indicating our top-ranked model. Again, we can see that it includes four predictors. Unemployment and item 6 of the GAD inventory, which asks about frequency of “Becoming easily annoyed or irritable”, increase the expected time spent playing video games, while age and item 2 of the SWL scale, which rates agreement with “The conditions of my life are excellent”, decrease expected time. In other words, for young, unemployed people who are not satisfied with the conditions of their life and get easily annoyed, video games are a popular escape.
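
A single posterior mean can hide the shape of a model-averaged posterior, which mixes a point mass at zero (from models that exclude the predictor) with a continuous density (from models that include it). To see that shape for a predictor such as age, the postestimation command bmagraph coefdensity plots it; a minimal sketch, assuming the predictor-name syntax used in the BMA manual (graph omitted):

. bmagraph coefdensity age      // model-averaged posterior density of the age coefficient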

The next block represents the second-ranked model, and so on. In general, we can see that these four predictors are included in most models and that the eighth item of SPIN, which asks about agreement with “I avoid going to parties”, is also sometimes included, with a positive coefficient. This makes sense. Parties probably aren’t very fun if you are annoyed with everyone there. We can explore this relationship further using bmastats jointness.

. bmastats jointness gad6 spin8

Computing model probabilities ...

Variables: gad6 spin8

-----------------------------------
                    |     Jointness
--------------------+--------------
  Doppelhofer–Weeks |     -1.064686
   Ley–Steel type 1 |      .0810813
   Ley–Steel type 2 |      .0882356
           Yule's Q |     -.4871701
-----------------------------------
Notes: Using analytical PMPs. See thresholds.

These results suggest disjointness, meaning that if we already have one of these predictors in our model, we really don’t need the other.

Even with this short demonstration, you can already see how much more information we get about our data than if we had just fit one model. And, of course, because it accounts for model uncertainty, BMA can also produce better predictions than traditional single-model methods; we compute them with bmapredict.

. bmapredict predicted_hours, mean
note: computing analytical posterior predictive means.

Here we performed a simple BMA analysis to demonstrate the method and a few of Stata's tools. For better predictive power, we could consider adding additional covariates, interaction terms, nonlinear terms, or even splines to the model space that BMA explores.
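
As a hedged sketch of what that might look like (both specifications below are purely illustrative and assume the dataset's variable ordering), an interaction can be added with factor-variable notation, and related predictors can be grouped with bmaregress's parenthesized grouping syntax so that they enter or leave a model together:

. bmaregress hoursplay gad1-age i.(gender employment degree) c.age#c.narcissism
. bmaregress hoursplay (gad1-gad7) swl1-age i.(gender employment degree)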

For further information about using BMA for prediction, model selection, and inference, including discussion on prior selection, variable grouping, and many more examples, take a tour of the Stata Bayesian Model Averaging Reference Manual.

— Meghan Cain
Assistant Director, Educational Services
