Multilevel tobit models

Highlights

Left-censoring, right-censoring, or both
Censoring that varies by observation
Random effects
    Random intercepts
    Random coefficients (slopes)
Multilevel: two, three, or more levels
Make inferences about either the uncensored or the censored outcome
Support for complex survey data
Graph marginal means and marginal effects
Intraclass correlation
Support for Bayesian estimation

The metobit command fits multilevel and panel-data models for which the outcome is censored. Censored means that rather than the outcome \(y\) being observed precisely in all observations, it is known only that \(y \leq y_l\) (left-censoring) or \(y \geq y_u\) (right-censoring) in some of the observations. For instance, the amount of a pollutant may be left-censored because the measurement instrument has a lower limit of detection. The number of attendees at an event may be right-censored because the stadium has a limited number of seats.
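
In symbols, the censoring described above can be sketched as follows. The label \(y^{\text{obs}}\) for the recorded value is introduced here for illustration and does not appear elsewhere on this page; \(y\), \(y_l\), and \(y_u\) are as defined above.

\[
y^{\text{obs}} =
\begin{cases}
y_l & \text{if } y \leq y_l \quad \text{(left-censored)} \\
y   & \text{if } y_l < y < y_u \quad \text{(uncensored)} \\
y_u & \text{if } y \geq y_u \quad \text{(right-censored)}
\end{cases}
\]

With only right-censoring, as in the stadium example, this reduces to \(y^{\text{obs}} = \min(y, y_u)\).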

Multilevel means that the fitted model accounts for lack of independence within groups of observations, such as people who live near each other, students who attend the same school, or students who are tested repeatedly. metobit can also fit models with multiple levels of nesting. For example, you can fit models with data on students within school districts within cities and include random effects at each level.

Let's see it work

Tobit models, whether multilevel or one-level, support two types of inference: inference for the entire population as if it were not censored, and inference for the censored population.

We have been hired to analyze data on attendance at 500 soccer stadiums. The data are censored when the stadium is sold out. In such cases, it is likely that attendance would have been greater had there been more seats.

Clients who run stadiums and could add seats at a cost would be interested in an analysis of the uncensored population. Clients who rent and cannot add seats would be interested in an analysis of the censored population.

We can use metobit to answer questions for both types of clients. In fact, we will fit the model once and use different predictions to answer different questions.

The data we have include attend, stadium attendance in thousands for each game played during the season. We will model attendance as a function of

winp, the winning percentage of the local team;
inter, the probability that the local team makes it to an international postseason;
cost, the cost of a ticket for the game being played; and
weather, whether a storm—rain or snow—was forecast on game day.

We could fit this model with a linear multilevel estimator but for the fact that each stadium has a seating limit. That limit is recorded in the variable max.

Using metobit, we will type

. metobit attend winp inter cost i.weather || stadium: winp, ul(max)

Option ul(max) specifies the upper-censoring limit. Because max is a variable, the limit can differ from stadium to stadium.

To the right of the || is the level-2 ID variable, stadium. By listing winp after the colon, we request a random coefficient on winning percentage in addition to the random intercept for each stadium.
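
Written out, the model requested by this command looks roughly like the following. This is a sketch in our own notation, which is an assumption and not taken from the manual: \(i\) indexes games, \(j\) indexes stadiums, \(\text{attend}^*\) is attendance before the seating limit applies, and the random effects are taken to be independent because the command requests no covariance between them.

\[
\text{attend}^*_{ij} = \beta_0 + \beta_1\,\text{winp}_{ij} + \beta_2\,\text{inter}_{ij} + \beta_3\,\text{cost}_{ij} + \beta_4\,\text{weather}_{ij} + u_{0j} + u_{1j}\,\text{winp}_{ij} + \epsilon_{ij}
\]

\[
\text{attend}_{ij} = \min\left(\text{attend}^*_{ij},\ \text{max}_{ij}\right), \qquad
u_{0j} \sim N(0, \sigma^2_{u0}), \quad u_{1j} \sim N(0, \sigma^2_{u1}), \quad \epsilon_{ij} \sim N(0, \sigma^2_{e})
\]

Here \(u_{0j}\) is the stadium-specific random intercept and \(u_{1j}\) is the stadium-specific random coefficient on winp.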

We fit our desired model:

. metobit attend winp inter cost i.weather || stadium: winp, ul(max)

Fitting fixed-effects model:

Iteration 0:  Log likelihood = -21793.676
Iteration 1:  Log likelihood = -21321.165
Iteration 2:  Log likelihood = -21239.918
Iteration 3:  Log likelihood =  -21239.16
Iteration 4:  Log likelihood = -21239.159

Refining starting values:

Grid node 0:  Log likelihood = -19826.409

Fitting full model:

Iteration 0:  Log likelihood = -19826.409  (not concave)
Iteration 1:  Log likelihood = -18956.642  (not concave)
Iteration 2:  Log likelihood = -18440.049
Iteration 3:  Log likelihood = -17938.155
Iteration 4:  Log likelihood = -17860.781
Iteration 5:  Log likelihood =  -17822.36
Iteration 6:  Log likelihood = -17820.965
Iteration 7:  Log likelihood = -17820.958
Iteration 8:  Log likelihood = -17820.958

Mixed-effects tobit regression                  Number of obs     =      8,131
                                                   Uncensored     =      5,451
Limits: Lower = -inf                               Left-censored  =          0
        Upper = max                                Right-censored =      2,680

Group variable: stadium                         Number of groups  =        500
                                                Obs per group:
                                                              min =          9
                                                              avg =       16.3
                                                              max =         20

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(4)      =   11728.37
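Log likelihood = -17820.958                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
      attend | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        winp |   .4287463   .0708727     6.05   0.000     .2898384    .5676542
       inter |   .5575962   .0051627   108.00   0.000     .5474774    .5677149
        cost |  -.0053072   .0005233   -10.14   0.000    -.0063329   -.0042815
   1.weather |  -.2126963   .2977593    -0.71   0.475    -.7962937    .3709011
       _cons |   9.213013    .348091    26.47   0.000     8.530767    9.895259
-------------+----------------------------------------------------------------
stadium      |
   var(winp) |   1.335236   .1471157                      1.075903    1.657078
  var(_cons) |   35.74543   2.838831                      30.59284    41.76584
-------------+----------------------------------------------------------------
var(e.attend)|   22.48149   .4547381                      21.60765    23.39066
------------------------------------------------------------------------------
LR test vs. tobit model: chi2(2) = 6836.40                Prob > chi2 = 0.0000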

Note: LR test is conservative and provided only for reference.
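
The variance components reported above can be combined into an intraclass correlation for uncensored attendance. As a rough worked example, and assuming we condition on winp = 0 so that the random coefficient drops out, the share of unexplained variance attributable to stadiums is

\[
\rho = \frac{\text{var(\_cons)}}{\text{var(\_cons)} + \text{var(e.attend)}}
     = \frac{35.74543}{35.74543 + 22.48149}
     = \frac{35.74543}{58.22692}
     \approx 0.61
\]

For other values of winp, the stadium-level variance would also include the term \(\text{winp}^2 \times 1.335236\), so the correlation changes with winning percentage. This hand calculation gives no standard error; the estat icc postestimation command is the usual way to obtain intraclass correlations with confidence intervals.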

From the model, we can obtain estimates of average attendance, and there are several ways to define that average. What would uncensored average attendance be if max had not been in effect? What is the predicted average attendance given max? What would average attendance be if seating were increased by 1,000 in all stadiums with more than 90% average attendance? By 2,000? By 3,000?

In a world where max was not relevant, average attendance would have been about 23,510:

. margins

Predictive margins                                       Number of obs = 8,131
Model VCE: OIM

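Expression: Marginal linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   23.50981   .3189898    73.70   0.000      22.8846    24.13502
------------------------------------------------------------------------------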

In the real world, where the current value of max is binding, average attendance would be about 18,712:

. margins, predict(ystar(.,max))

Predictive margins                                       Number of obs = 8,131
Model VCE: OIM

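Expression: E(attend*|attend<max), predict(ystar(.,max))

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |    18.7123   .1790272   104.52   0.000     18.36141    19.06319
------------------------------------------------------------------------------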
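
The gap between the two margins reflects the seating limit. As a sketch of what a prediction such as ystar(.,max) represents for a single observation, suppose uncensored attendance \(y\) is normal with mean \(\mu\) and standard deviation \(s\). These symbols are introduced here for illustration; under the fitted model, \(s^2\) would combine the stadium-level variances with var(e.attend). The standard result for a normal variable capped at max is then

\[
E\left[\min(y, \text{max})\right] = \mu\,\Phi(z) - s\,\phi(z) + \text{max}\,\left[1 - \Phi(z)\right],
\qquad z = \frac{\text{max} - \mu}{s}
\]

where \(\Phi\) and \(\phi\) are the standard normal distribution and density functions. Because \(\min(y, \text{max}) \leq y\), this expectation can never exceed \(\mu\), which is why the censored margin is smaller than the uncensored one. The exact computation used by predict is documented in [ME] metobit postestimation.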

We could also use margins to answer what attendance would be if max was increased by 1,000 in stadiums with over 90% attendance.

. quietly generate new_max = max + 1000

. margins if attend_rate>.90, predict(ystar(.,new_max))

Predictive margins                                       Number of obs = 5,284
Model VCE: OIM

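Expression: E(attend*|attend<new_max), predict(ystar(.,new_max))

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   26.81172   .3243089    82.67   0.000     26.17608    27.44735
------------------------------------------------------------------------------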

Average attendance would be 26,812 in stadiums with an attendance rate greater than 90%. This seems like a large number, but the stadiums in our sample with more than 90% attendance are the larger stadiums whose teams have the highest winning percentages.
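
The 2,000 and 3,000 scenarios raised earlier could be run the same way. Below is a minimal sketch, assuming the metobit results are still in memory and that attend_rate is the attendance-rate variable used above. The loop and the macro name add are ours, and the increments follow the page's own construction of new_max, so they should be adjusted if max is stored in different units.

```stata
* Repeat the capacity scenario for several hypothetical seat increases
foreach add of numlist 1000 2000 3000 {
    * Rebuild the hypothetical seating limit for this scenario
    capture drop new_max
    quietly generate new_max = max + `add'
    display as text "Seating limit raised by `add'"
    * Expected attendance under the new limit, high-attendance stadiums only
    margins if attend_rate > .90, predict(ystar(., new_max))
}
```

Each pass refits nothing; it only recomputes predictions from the stored estimates under a different upper limit.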

Tell me more

You can also fit Bayesian multilevel tobit models using the bayes prefix.
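
As a sketch, the model fit above might be rerun in a Bayesian framework by adding the prefix to the same command. The seed value is arbitrary and chosen here only for reproducibility; default priors are assumed, and sampling with 500 stadiums may take considerably longer than maximum likelihood.

```stata
* Bayesian version of the same model; rseed(17) is an arbitrary seed
bayes, rseed(17): metobit attend winp inter cost i.weather || stadium: winp, ul(max)
```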

Learn more about Stata's multilevel mixed-effects models features.

Read more about multilevel tobit models in the Multilevel Mixed-Effects Reference Manual; see [ME] metobit.
