Panel-data multinomial logit model


Highlights

Panel-data modeling of unordered categorical outcomes
Random-effects estimator
    Independent
    Identity
    Shared
    Exchangeable
    Unstructured
Conditional fixed-effects (FE) estimator
    Permutation subsets lessen curse of dimensionality

Bayesian estimation
Robust, cluster–robust, and bootstrap standard errors
Support for complex survey data

The multinomial logit (MNL) model is a popular method for modeling categorical outcomes that have no natural ordering—outcomes such as occupation, political party, or restaurant choice.

In longitudinal/panel data, we observe a sequence of outcomes over time. Say that we observe restaurant choices made by individuals each week. Do you think that restaurant choices are independent from week to week? Probably not. Someone who likes Italian food is likely to choose an Italian restaurant multiple times. These choices are driven by underlying personal preferences and characteristics, some of which are not observed.

Stata's xtmlogit command fits random-effects and conditional fixed-effects MNL models for categorical outcomes observed over time.

To fit a random-effects multinomial logit model, we can type

. xtset subject
. xtmlogit restaurant age

and estimate the standard multinomial logit coefficients accounting for time-invariant subject-specific characteristics by including random effects specific to each outcome level.
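For readers who want the model written out, the following is the standard textbook form of a random-effects MNL, given as a sketch. The notation (J outcome levels, covariates x_it, coefficients beta_m, subject-level random effects u_mi) is introduced here for illustration and is not taken from the Stata documentation.

```latex
\Pr(y_{it} = m \mid x_{it}, u_i)
  = \frac{\exp(x_{it}\beta_m + u_{mi})}
         {\sum_{k=1}^{J} \exp(x_{it}\beta_k + u_{ki})},
\qquad m = 1, \dots, J,
\qquad \beta_b = 0,\; u_{bi} = 0 \text{ for the base outcome } b
```

Each non-base outcome level has its own random effect, which is constant over time for a given subject. This is what lets the model capture persistent preferences such as a taste for Italian food.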

With the command above, the random effects are assumed to be normally distributed and independent across outcome levels (restaurant choices), but several variance–covariance structures are supported, including a completely unrestricted covariance:

. xtmlogit restaurant age, covariance(unstructured)
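To make the difference between the default and the unstructured covariance concrete, consider a model with three outcome levels and therefore two random effects, one per non-base outcome. In notation assumed here for illustration, the covariance matrix of the random effects is:

```latex
\text{independent (default): }
\Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix},
\qquad
\text{unstructured: }
\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}
```

In this three-outcome case, the unstructured form adds a single parameter, the covariance between the two random effects.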

If you suspect that subject-specific effects might be correlated with age, you can use a conditional fixed-effects estimator to account for this:

. xtmlogit restaurant age, fe

Let's see it work

We wish to find out whether individuals are more likely to be out of the labor force if they have children under the age of five in their household. We will use a (fictitious) dataset of men and women who were asked about their employment status every two years.

. webuse estatus
(Fictitious employment status data)

Here is an excerpt of the dataset, showing the employment history for three individuals:

. list id year estatus hhchild age in 22/41, sepby(id) noobs


    
     id   year              estatus   hhchild   age 
    
      5   2002             Employed       Yes    38 
      5   2004             Employed        No    40 
      5   2006             Employed        No    42 
      5   2008             Employed        No    44 
      5   2010   Out of labor force        No    46 
      5   2012   Out of labor force        No    48 
      5   2014           Unemployed        No    50 
    
      6   2002           Unemployed       Yes    31 
      6   2004             Employed       Yes    33 
      6   2006   Out of labor force       Yes    35 
      6   2008           Unemployed       Yes    37 
      6   2010   Out of labor force       Yes    39 
      6   2012           Unemployed        No    41 
    
      7   2002   Out of labor force       Yes    33 
      7   2004             Employed       Yes    35 
      7   2006             Employed       Yes    37 
      7   2008   Out of labor force       Yes    39 
      7   2010             Employed        No    41 
      7   2012             Employed        No    43 
      7   2014             Employed        No    45

The outcome of interest is employment status (estatus), which has three levels: Employed, Unemployed (but seeking employment), and Out of labor force (not seeking employment). Our predictor of interest, hhchild, indicates whether they have children under the age of five in their household at the time of the interview.

Before we can fit our model, we need to specify our panel identifier variable, id, by using xtset.

. xtset id

Panel variable: id (unbalanced)

Now we can use xtmlogit to model the probability of each employment type by hhchild while controlling for the effects of age, annual household income (hhincome), and whether a significant other was also living in the household (hhsigno). We will start with a random-effects model (the default) and use the rrr option to get exponentiated coefficients that can be interpreted as relative-risk ratios.

. xtmlogit estatus i.hhchild age hhincome i.hhsigno, rrr

Fitting comparison model ...

Refining starting values:

Grid node 0:  Log likelihood = -4504.5591
Grid node 1:  Log likelihood = -4538.6352

Fitting full model:

Iteration 0:  Log likelihood = -4504.5591
Iteration 1:  Log likelihood =  -4495.871
Iteration 2:  Log likelihood = -4490.5098
Iteration 3:  Log likelihood = -4490.4197
Iteration 4:  Log likelihood = -4490.4196

Random-effects multinomial logistic regression       Number of obs    =  4,761
Group variable: id                                   Number of groups =    800

Random effects u_i ~ Gaussian                        Obs per group:
                                                                  min =      5
                                                                  avg =    6.0
                                                                  max =      7

Integration method: mvaghermite                      Integration pts. =      7

                                                     Wald chi2(8)     = 199.25
Log likelihood = -4490.4196                          Prob > chi2      = 0.0000



           estatus          RRR   Std. err.      z    P>|z|     [95% conf. interval]

Out_of_labor_force                                                                  
           hhchild                                                                  
              Yes      1.579937   .1513905     4.77   0.000     1.309414    1.906349
               age     .9947946   .0065832    -0.79   0.430      .981975    1.007781
          hhincome     .9954927   .0018251    -2.46   0.014     .9919221    .9990762
                                                                              
           hhsigno                                                                  
              Yes      1.642859   .1550291     5.26   0.000     1.365452    1.976625
             _cons     .4949307   .1392991    -2.50   0.012     .2850836     .859244

Unemployed                                                                          
           hhchild                                                                  
              Yes      .9607243   .1148148    -0.34   0.737     .7601038    1.214296
               age     1.004257    .008211     0.52   0.603     .9882918     1.02048
          hhincome     .9696874   .0025722   -11.60   0.000      .964659    .9747421
                                                                              
           hhsigno                                                                  
              Yes      1.099323   .1310654     0.79   0.427     .8702452    1.388701
             _cons     .8078165    .280628    -0.61   0.539     .4088963    1.595924

Employed              (base outcome)                                                

            var(u1)     .8573133   .1083915                      .6691459    1.098394
            var(u2)     .7378532   .1388652                      .5102376    1.067008

Note: Estimates are transformed only in the first 3 equations to relative-risk
      ratios.
Note: _cons estimates baseline relative risk (conditional on zero random effects).
LR test vs. multinomial logit: chi2(2) = 227.68           Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.
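As a check on how the RRR column relates to the underlying coefficients, the hhchild row of the Out_of_labor_force equation can be reproduced by hand. This is a worked sketch that relies on the usual conventions for exponentiated coefficients: the RRR is the exponentiated coefficient, its standard error comes from the delta method, and the confidence limits are the exponentiated limits of the coefficient. Intermediate values are rounded.

```latex
\begin{aligned}
\hat\beta &= \ln(1.579937) \approx 0.4574 \\
\widehat{\mathrm{se}}(\hat\beta) &= 0.1513905 / 1.579937 \approx 0.0958 \\
z &= 0.4574 / 0.0958 \approx 4.77 \\
\text{95\% CI} &= \exp(0.4574 \pm 1.96 \times 0.0958) \approx [1.309,\ 1.906]
\end{aligned}
```

These values agree with the reported z statistic and confidence interval for that row.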

The first two sections in the output show the relative-risk ratio estimates of our predictors with respect to the base category Employed. The last section shows the estimated variances of the random effects. By default, the random effects are uncorrelated, but their covariance structure can be changed using the covariance() option. For example, correlations between random effects can be estimated using covariance(unstructured), or each category can share a common random effect using covariance(shared).
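One way to judge whether the default independent structure is adequate is to fit the model under both structures and compare them with a likelihood-ratio test. The do-file below is a minimal sketch of that comparison. The stored-estimate names indep and unstr are arbitrary labels chosen here, and both fits may take some time to run.

```stata
* Do-file sketch: compare independent and unstructured random-effects covariances
webuse estatus, clear
xtset id

* Default structure: two variances, no covariance between the random effects
quietly xtmlogit estatus i.hhchild age hhincome i.hhsigno
estimates store indep

* Unstructured: also estimates the covariance between the two random effects
quietly xtmlogit estatus i.hhchild age hhincome i.hhsigno, covariance(unstructured)
estimates store unstr

* Likelihood-ratio test of the independent model against the unstructured model
lrtest indep unstr
```

A small p-value from lrtest would suggest that the random effects for the two non-base outcomes are correlated.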

Adjusting for age, household income, and having a significant other at home, the relative risk of being out of the labor force rather than employed is about 1.6 times as large for individuals with at least one child under the age of five in the household as for individuals with no children under five (95% CI [1.3, 1.9]). To understand these effects in terms of probabilities, we can use the margins command.

. margins hhchild

Predictive margins                                       Number of obs = 4,761
Model VCE: OIM

1._predict: Pr(estatus==Out_of_labor_force), predict(pr outcome(1))
2._predict: Pr(estatus==Unemployed), predict(pr outcome(2))
3._predict: Pr(estatus==Employed), predict(pr outcome(3))



                          Delta-method                                        
                   Margin   std. err.      z    P>|z|     [95% conf. interval]

_predict#hhchild                                                                  
           1#No      .3025675   .0131546    23.00   0.000      .276785      .32835
          1#Yes      .3912476   .0120405    32.49   0.000     .3676486    .4148466
           2#No      .1628713   .0101131    16.11   0.000     .1430501    .1826925
          2#Yes      .1398537   .0079462    17.60   0.000     .1242794    .1554279
           3#No      .5345612   .0136994    39.02   0.000     .5077108    .5614116
          3#Yes      .4688987   .0116594    40.22   0.000     .4460468    .4917507

For an individual without children under five in the household, the expected probability of being out of the labor force (labeled 1#No) is 0.30, the expected probability of being unemployed (2#No) is 0.16, and the expected probability of being employed (3#No) is 0.53. Having a child under five in the household raises the expected probability of being out of the labor force by about 9 percentage points, from 0.30 to 0.39. We can see how these probabilities change with household income by using an additional margins command and visualizing the results with marginsplot.

. quietly margins hhchild, at(hhincome=(20(20)100))

. marginsplot, by(_predict, label("Out of labor force" "Unemployed" "Employed")) ///
>      byopts(rows(1) title("Marginal probabilities of employment status"))      ///
>      legend(order(4 "Child under 5 at home" 3 "No child under 5 at home"))

To get separate graphs for each outcome, we used the by(_predict) option in marginsplot. The rest of the options add titles and labels.

Comparing the lines within each employment category, we see that having a child at home does not have much impact on the probability of being unemployed but does influence the decision to work or to be out of the labor force.
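The differences in predicted probabilities between the two hhchild groups can also be requested directly, with standard errors, rather than read off the table of margins. The do-file below is a sketch that assumes margins accepts the dydx() option after xtmlogit in the same way it does after other estimation commands. For a factor variable such as hhchild, dydx() reports the average discrete change from the base level (No) to Yes.

```stata
* Do-file sketch: average change in each outcome probability due to hhchild
webuse estatus, clear
xtset id
quietly xtmlogit estatus i.hhchild age hhincome i.hhsigno

* One discrete change (Yes versus No) is reported per outcome level
margins, dydx(hhchild)
```

The estimate for the first outcome should match the difference between the 1#Yes and 1#No margins reported earlier, 0.391 - 0.303, or about 0.089.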

In the model we just fit, we used random effects to account for unobserved characteristics of the individuals in our dataset. Random-effects models require that the random effects be uncorrelated with the predictors, and the random-effects MNL model is no exception. A widely used alternative is the fixed-effects estimator. To fit our model with conditional fixed effects, we simply add the fe option.

. xtmlogit estatus i.hhchild age hhincome i.hhsigno, fe rrr
note: 80 groups (451 obs) omitted because of no variation in the outcome variable
      over time.

Computing initial values ...

Setting up 26,168 permutations:
....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%

Fitting full model:

Iteration 0:  Log likelihood = -2154.4175
Iteration 1:  Log likelihood = -2154.2058
Iteration 2:  Log likelihood = -2154.2057

Fixed-effects multinomial logistic regression        Number of obs    =  4,310
Group variable: id                                   Number of groups =    720

                                                     Obs per group:
                                                                  min =      5
                                                                  avg =    6.0
                                                                  max =      7

                                                     LR chi2(8)       =  67.42
Log likelihood = -2154.2057                          Prob > chi2      = 0.0000



           estatus          RRR   Std. err.      z    P>|z|     [95% conf. interval]

Out_of_labor_force                                                                  
           hhchild                                                                  
              Yes      1.784236   .2237128     4.62   0.000     1.395488     2.28128
               age     .9977834   .0146507    -0.15   0.880     .9694778    1.026915
          hhincome     .9895225   .0086923    -1.20   0.231     .9726318    1.006707
                                                                                    
           hhsigno                                                                  
              Yes      1.658753   .1654425     5.07   0.000     1.364217    2.016878

Unemployed                                                                          
           hhchild                                                                  
              Yes      1.181866   .1933766     1.02   0.307     .8576197    1.628702
               age     1.004991   .0194887     0.26   0.797      .967511    1.043924
          hhincome     .9717411   .0116616    -2.39   0.017     .9491514    .9948684
                                                                                    
           hhsigno                                                                  
              Yes       1.11936   .1454154     0.87   0.385     .8677426    1.443939

Employed              (base outcome)

The results are similar to those of the random-effects estimator and can be interpreted in the same way.
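The permutation count and the omitted groups in the output above both follow from how a conditional fixed-effects likelihood is built. The following is the standard conditional-likelihood construction, given as a sketch; Stata's exact implementation may differ, and the notation (n_im for the number of times subject i chose outcome m, D_i for the set of distinct reorderings of that subject's outcome sequence) is introduced here for illustration.

```latex
\Pr\bigl(y_{i1},\dots,y_{iT_i} \,\big|\, n_{i1},\dots,n_{iJ}\bigr)
  = \frac{\exp\bigl(\sum_{t=1}^{T_i} x_{it}\beta_{y_{it}}\bigr)}
         {\sum_{d \in D_i} \exp\bigl(\sum_{t=1}^{T_i} x_{it}\beta_{d_t}\bigr)},
\qquad
|D_i| = \frac{T_i!}{n_{i1}!\,\cdots\,n_{iJ}!}

% Worked counts for the three individuals listed earlier
\begin{aligned}
\text{id 5 (4 Employed, 2 Out of labor force, 1 Unemployed):}\quad & 7!/(4!\,2!\,1!) = 105 \\
\text{id 6 (1 Employed, 2 Out of labor force, 3 Unemployed):}\quad & 6!/(1!\,2!\,3!) = 60 \\
\text{id 7 (5 Employed, 2 Out of labor force):}\quad & 7!/(5!\,2!) = 21
\end{aligned}
```

Because the subject-specific effects enter the numerator and every term of the denominator in the same way, they cancel, so they may be correlated with the predictors. A subject whose outcome never changes has only one possible ordering and contributes nothing to the likelihood, which is why 80 groups were omitted. The 26,168 permutations reported by Stata are presumably the total of these counts across the 720 retained subjects; that reading is an assumption.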

Tell me more

Learn more in the Stata Longitudinal-Data/Panel-Data Reference Manual.

Fit Bayesian fixed-effects and random-effects MNL models using the bayes prefix.


