Marginal predictions, means, effects, and more


Highlights

Integrates out random effects (latent variables) after
    Multilevel models
    SEM (structural equation models)

Marginal (population-averaged) predictions
Marginal effects, marginal means, all other margins results
For survival outcomes, plots of survivor, hazard, and cumulative hazard functions

We are about to tell you that margins can make meaningful predictions in the presence of random effects, random coefficients, and latent variables. We are about to tell you that margins and Stata's predict integrate over the unobserved effects. This is exciting. Here's why.

Making meaningful predictions can be difficult even in the absence of random effects or random coefficients. For instance, consider a model as simple as logistic regression. Assume you have a special interest in variable x's effect on the outcome. You fit a model. Based on that, you can now say the effect of a 1-unit increase in x is to decrease the log odds by 0.22. Even those of us familiar with logistic regression would probably respond, "Yes, and that means what?"

"That's easy", you reply, exponentiating -0.22 in your head. "If x increases by 1, the odds fall to 80% of what they otherwise would be".

To which a reasonable person could reasonably reply, "And what would the odds otherwise be?"

"Well, that depends on the patient's (worker's, student's, etc.) other characteristics".

"That's rather obvious. Perhaps you could tell me the increase in terms of probabilities?"

"Well, that's rather difficult. You see, probabilities are a nonlinear function of the odds, so the baseline probability enters into it, and really, it's very similar to the previous problem, with an additional complication".

Irritation shows on the face of our reasonable person. "Surely you can tell me something interpretable. How about you tell me the number of lives that would be saved in the data if we increased x by one over what it otherwise would be. Surely you can do that. Perhaps with a 95% confidence interval?"

You realize that you can do that, but that's not a calculation you are going to make in your head. Fortunately for you, margins can help; it will make the calculation, with standard errors and confidence intervals. So you agree to make the calculation and return later.

"We have 2,500 people in our data", you report. "If we increased x by 1, we would expect to save 102 lives, say, 70 to 140".

You just got the reasonable person's attention. Of course, you would have lost it if you had said that the expected value was around 0.5, somewhere between 0.2 and 0.7. That's why we make such calculations.
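The arithmetic behind the dialog can be checked with Stata's display command. This is a minimal sketch: the baseline probabilities 0.5 and 0.9 are illustrative assumptions, not values from any fitted model.

```stata
* Odds ratio implied by a log-odds coefficient of -0.22
display %5.3f exp(-0.22)

* The same coefficient changes the probability by different amounts,
* depending on the (assumed) baseline probability
display %7.4f invlogit(logit(0.5) - 0.22) - 0.5
display %7.4f invlogit(logit(0.9) - 0.22) - 0.9
```

The first line gives 0.803, the "80%" of the dialog. The two differences are about -0.055 and -0.022, which is why the change in probability cannot be stated without knowing the baseline.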

Now, let's complicate our model. Let's assume that the model includes not just x but also an interaction of x with age, and that while x is estimated to have a whopping effect, x multiplied by age has a negative effect.

We do not need to repeat the dialog. Clearly, estimating the effect of x on lives saved is going to be more difficult, and more important. Reporting just the coefficients or the odds ratios does not even reveal whether increasing x results in lives saved or lives lost.

margins can do that calculation, too, and in fact, it is no more difficult for margins than the first calculation. The statement that you would save 70 to 140 lives is now even more impressive.

Now, let's complicate our model one last time. Let's add random effects. Let's assume that our outcomes come from patients who have doctors who work in hospitals and that we have introduced random effects for those doctors and hospitals. And perhaps we have introduced a random coefficient as well, either on x or on some other variable.

Now, calculating the prediction is truly difficult. We want to predict an average effect, an effect as if patients had been assigned randomly to hospitals and doctors, and we have a nonlinear model, and that means we cannot ignore the variance of the random effects. Instead we will need to dust off our calculus tools and, for each patient, integrate over all unobserved variables.
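In symbols, and simplifying to a single random intercept u rather than the doctor-within-hospital structure described above, the marginal probability for patient i can be written as follows. The notation (z_i for the covariate vector, beta for the fixed coefficients, sigma_u^2 for the random-effect variance) is ours and is meant only as a sketch.

```latex
\Pr(y_i = 1 \mid \mathbf{z}_i)
  = \int_{-\infty}^{\infty}
      \operatorname{logit}^{-1}\!\left(\mathbf{z}_i \boldsymbol{\beta} + u\right)
      \,\phi\!\left(u;\, 0,\, \sigma_u^2\right)\, du
```

Here \phi(u; 0, \sigma_u^2) is the normal density with mean 0 and variance \sigma_u^2. Because the inverse logit is nonlinear, this integral generally differs from \operatorname{logit}^{-1}(\mathbf{z}_i \boldsymbol{\beta}), the value obtained by setting u to zero, and the gap grows with \sigma_u^2.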

margins can do that, too.

And margins can do it not just for logistic regression and multilevel logistic regression but for

multilevel probit
multilevel complementary log-log
multilevel ordered logit
multilevel ordered probit
multilevel Poisson
multilevel negative binomial
multilevel interval regression
multilevel tobit
multilevel nonlinear regression
multilevel parametric survival models
structural equation models

Let's see it work

We have fictional hospital data on 2,500 patients treated in 20 hospitals by roughly 600 different doctors. We fit the following model,

. melogit outcome i.x c.age i.x#c.age other || hosp: || doctor:
(output omitted)

which is how you specify in Stata that you want a multilevel logistic regression to be fit containing (indicator variable) x, (continuous variable) age, x*age, and other and that you want random effects for each hospital and for each doctor within hospital.
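Stata's factor-variable operator ## expands to the main effects plus their interaction, so the same model can be written more compactly. The variable names are those of the fictional dataset used here.

```stata
* i.x##c.age is shorthand for i.x c.age i.x#c.age
melogit outcome i.x##c.age other || hosp: || doctor:
```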

The omitted output reports that

            Coefficient        z
x                -17.11   -10.69
age                0.45     5.92
x*age              2.21    10.16
other              1.23     8.81
constant          -4.79    -6.61

The coefficient on x is outlandish, but we remind you that that sometimes happens when you include a variable and its interaction with age. The sum of effects in this case is reasonable.
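To make "the sum of effects" concrete: with the reported coefficients, the effect of x on the log odds at a given age is -17.11 + 2.21*age. A quick sketch of where that sum changes sign follows. The units of age are not stated in this example, so the result is in whatever units age was recorded.

```stata
* Effect of x on the log odds at a given age: -17.11 + 2.21*age
* Age at which that effect changes sign
display %5.2f 17.11/2.21
```

This displays 7.74. Because the sign of the effect depends on age, the coefficients alone cannot say whether x helps or harms on average in these data.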

The omitted output also reported large estimated variances of the random effects, namely, 8.33 and 11.64. We could tell a story about how we expected those large variances in the case of this treatment, but because the story is fictional, we will not bother.

In fact, in these fictional data, we intentionally made the variances large just so they would have a larger effect on the predicted number of lives saved.
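A small simulation shows why a large variance matters on the probability scale. It is an illustration only: it uses a single normal random intercept with variance 8.33 and an arbitrary, assumed linear predictor of 1, and it is a stand-in for the idea of averaging over random effects, not a description of how margins computes its results.

```stata
* Illustration only: one random intercept with variance 8.33 and an
* assumed linear predictor of 1 on the log-odds scale
clear
set seed 12345
set obs 100000
generate double u = rnormal(0, sqrt(8.33))
generate double p = invlogit(1 + u)

* Population-averaged probability: mean over the simulated random effects
summarize p

* Probability when the random effect is exactly zero
display invlogit(1)
```

The mean of p comes out noticeably closer to 0.5 than invlogit(1), which is about 0.73. Ignoring the random-effect variance would therefore misstate the population-averaged probability.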

Based on the above results, does treatment x save lives or lose them? We type

. margins r.x

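Contrasts of predictive margins
Model VCE: OIM

Expression: Marginal predicted mean, predict()

------------------------------------------------
             |         df        chi2     P>chi2
-------------+----------------------------------
           x |          1       52.05     0.0000
------------------------------------------------

--------------------------------------------------------------
             |            Delta-method
             |   Contrast   std. err.     [95% conf. interval]
-------------+------------------------------------------------
           x |
    (1 vs 0) |  -.1745077   .0241884     -.2219161   -.1270992
--------------------------------------------------------------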

The output reports a difference in probabilities of -0.175 (fewer patients die with x=1 than with x=0), together with the corresponding 95% confidence interval.

We have 2,500 patients in these data, so multiplying the difference in probabilities by 2,500 and changing the sign gives the number of lives saved. We obtain 436 fewer deaths if all patients had x=1 rather than x=0, with a 95% confidence interval of [318, 555].
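The scaling can be reproduced by hand from the numbers in the margins output. This is a sketch of the arithmetic only.

```stata
* Scale the contrast and its confidence limits to the 2,500 patients,
* flipping the sign so that fewer deaths count as lives saved
display %4.0f -2500 * -.1745077
display %4.0f -2500 * -.1270992
display %4.0f -2500 * -.2219161
```

These lines display 436, 318, and 555.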

The 436 figure takes into account the estimated coefficients for x and x*age, the values of age and other in our data, and the estimated random effects for doctor and hospital.

The confidence interval takes into account all the above plus the uncertainty because some of the ingredients were estimated rather than known.
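To see how much the integration matters, one could compare the marginal contrast with one that sets every random effect to zero. The sketch below assumes that the conditional(fixedonly) prediction option documented for predict after melogit is accepted inside the predict() option of margins. Check [ME] melogit postestimation for your version of Stata before relying on it.

```stata
* Default after melogit: predictions averaged over the random effects
margins r.x

* Assumed comparison: random effects set to zero instead of integrated out
margins r.x, predict(mu conditional(fixedonly))
```

With random-effect variances as large as 8.33 and 11.64, the two contrasts would be expected to differ.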

Tell me more

For an example of integrating out random effects to obtain marginal predictions and marginal survivor functions, see [ME] mestreg postestimation and visit multilevel survival models.
