Stata 15 help for nlogit

[R] nlogit -- Nested logit regression


Nested logit regression

nlogit depvar [indepvars] [if] [in] [weight] [|| lev1_equation [|| lev2_equation ...]] || altvar: [byaltvarlist], case(varname) [nlogit_options]

where the syntax of lev#_equation is

altvar: [byaltvarlist] [, base(#|lbl) estconst]

Create variable based on specification of branches

nlogitgen newaltvar = altvar (branchlist) [, nolog]

where branchlist is

branch, branch [, branch ...]

and branch is

[label:] alternative [| alternative [| alternative ...] ]

Display tree structure

nlogittree altvarlist [if] [in] [weight] [, nlogittree_options]

nlogit_options              Description
-------------------------------------------------------------------------
Model
* case(varname)             use varname to identify cases
  base(#|lbl)               use the specified level or label of altvar as
                              the base alternative for the bottom level
  noconstant                suppress the constant terms for the
                              bottom-level alternatives
  nonnormalized             use the nonnormalized parameterization
  altwise                   use alternativewise deletion instead of
                              casewise deletion
  constraints(constraints)  apply specified linear constraints
  collinear                 keep collinear variables

SE/Robust
  vce(vcetype)              vcetype may be oim, robust, cluster clustvar,
                              bootstrap, or jackknife

Reporting
  level(#)                  set confidence level; default is level(95)
  notree                    suppress display of tree-structure output; see
                              also nolabel and nobranches
  nocnsreport               do not display constraints
  display_options           control columns and column formats, row
                              spacing, line width, display of omitted
                              variables and base and empty cells, and
                              factor-variable labeling

Maximization
  maximize_options          control the maximization process; seldom used
-------------------------------------------------------------------------
* case(varname) is required.

nlogittree_options          Description
-------------------------------------------------------------------------
Main
  choice(depvar)            use depvar as the choice indicator variable
  case(varname)             use varname to identify cases
  generate(newvar)          create newvar to identify invalid observations
  nolabel                   suppress the value labels in tree-structure
                              output
  nobranches                suppress drawing branches in the
                              tree-structure output
-------------------------------------------------------------------------

byaltvarlist may contain factor variables; see fvvarlist.
bootstrap, by, fp, jackknife, and statsby are allowed; see prefix.
Weights are not allowed with the bootstrap prefix.
fweights, iweights, and pweights are allowed with nlogit, and fweights are allowed with nlogittree; see weight. Weights for nlogit must be constant within case.
See [R] nlogit postestimation for features available after estimation.



Menu

nlogit

Statistics > Categorical outcomes > Nested logit regression

nlogitgen

Statistics > Categorical outcomes > Setup for nested logit regression

nlogittree

Statistics > Categorical outcomes > Display nested logit tree structure


Description

nlogit performs full information maximum-likelihood estimation for nested logit models. These models relax the assumption of independently distributed errors and the independence of irrelevant alternatives inherent in conditional and multinomial logit models by clustering similar alternatives into nests.

By default, nlogit uses a parameterization that is consistent with random utility maximization (RUM). Before version 10 of Stata, a nonnormalized version of the nested logit model was fit, which you can request by specifying the nonnormalized option.

You must use nlogitgen to generate a new categorical variable to specify the branches of the decision tree before calling nlogit.


Specification and options for lev#_equation

altvar is a variable identifying alternatives at this level of the hierarchy.

byaltvarlist specifies the variables to be used to compute the by-alternative regression coefficients for that level. For each variable specified in the variable list, there will be one regression coefficient for each alternative of that level of the hierarchy. If the variable is constant across each alternative (a case-specific variable), the regression coefficient associated with the base alternative is not identifiable. These regression coefficients are labeled as (base) in the regression table. If the variable varies among the alternatives, a regression coefficient is estimated for each alternative.

base(#|lbl) can be specified in each level equation where it identifies the base alternative to be used at that level. The default is the alternative that has the highest frequency.

If vce(bootstrap) or vce(jackknife) is specified, you must specify the base alternative for each level that has a byaltvarlist or for which constants will be estimated. Doing so ensures that the same model is fit with each call to nlogit.

estconst applies to all the level equations except the bottom-level equation. Specifying estconst requests that constants for each alternative (except the base alternative) be estimated. By default, no constant is estimated at these levels. Constants can be estimated in only one level of the tree hierarchy. If you specify estconst for one of the level equations, you must specify noconstant for the bottom-level equation.
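The rule above (estconst at an upper level paired with noconstant at the bottom level) can be sketched as a do-file fragment. This is a minimal illustration that reuses the restaurant example data and nest definitions from this help file; it is not output-verified here.

```stata
* Setup: example data and nests as used elsewhere in this help file
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, ///
    fancy: Christophers | MadCows)

* Estimate constants for the type-level alternatives (except the base);
* noconstant is then required in the bottom-level (restaurant) equation
nlogit chosen cost distance rating ///
    || type: income kids, base(family) estconst ///
    || restaurant:, noconstant case(family_id)
```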

Options for nlogit

        +-------+
    ----+ Model +------------------------------------------------------------

case(varname) specifies the variable that identifies each case. case() is required.

base(#|lbl) can be specified in each level equation where it identifies the base alternative to be used at that level. The default is the alternative that has the highest frequency.

If vce(bootstrap) or vce(jackknife) is specified, you must specify the base alternative for each level that has a byaltvarlist or for which constants will be estimated. Doing so ensures that the same model is fit with each call to nlogit.
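A minimal sketch of fixing the base alternative when resampling, assuming the restaurant example data. The number of replications and the seed are arbitrary illustrative values. Only the type level has a byaltvarlist here, and no constants are estimated, so base() is given for that level only.

```stata
* Setup: example data and nests as used elsewhere in this help file
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, ///
    fancy: Christophers | MadCows)

* base(family) pins the base alternative so that every bootstrap
* replication fits the same model; reps() and seed() are illustrative
nlogit chosen cost distance rating ///
    || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id) ///
    vce(bootstrap, reps(50) seed(12345))
```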

noconstant applies only to the equation defining the bottom level of the hierarchy. By default, constants are estimated for each alternative of altvar, less the base alternative. To suppress the constant terms for this level, specify noconstant. If you do not specify noconstant, you cannot specify estconst for the higher-level equations.

nonnormalized requests a nonnormalized parameterization of the model that does not scale the inclusive values by the degree of dissimilarity of the alternatives within each nest. Use this option to replicate results from older versions of Stata. The default is to use the RUM-consistent parameterization.
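As a sketch of the difference, consider a two-level tree. The notation is an assumption introduced here and does not come from this help file: V_j is the linear index of bottom-level alternative j, B_k is the set of alternatives in nest k, W_k is the linear index of the by-alternative variables for nest k, tau_k is the dissimilarity (or inclusive-value) parameter, and I_k is the inclusive value. Under these assumptions, the two parameterizations differ only in whether the within-nest indices are divided by tau_k:

```latex
% RUM-consistent parameterization (default): within-nest indices scaled by 1/tau_k
\Pr(j \mid k) = \frac{\exp(V_j/\tau_k)}{\sum_{m \in B_k} \exp(V_m/\tau_k)},
\qquad
I_k = \ln \sum_{m \in B_k} \exp(V_m/\tau_k)

% Nonnormalized parameterization: no scaling inside the nest
\Pr(j \mid k) = \frac{\exp(V_j)}{\sum_{m \in B_k} \exp(V_m)},
\qquad
I_k = \ln \sum_{m \in B_k} \exp(V_m)

% In both cases
\Pr(k) = \frac{\exp(W_k + \tau_k I_k)}{\sum_{l} \exp(W_l + \tau_l I_l)},
\qquad
\Pr(j) = \Pr(j \mid k)\,\Pr(k)
```

When every tau_k equals 1, both versions reduce to the same conditional logit probabilities.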

altwise specifies that alternativewise deletion be used when marking out observations because of missing values in your variables. The default is to use casewise deletion. This option does not apply to observations that are marked out by the if or in qualifier or the by prefix.

constraints(constraints); see [R] estimation options.

The inclusive-valued/dissimilarity parameters are parameterized as ml ancillary parameters. They are labeled as [alternative_tau]_cons, where alternative is one of the alternatives defining a branch in the tree. To constrain the inclusive-valued/dissimilarity parameter for alternative a1 to be, say, equal to alternative a2, you would use the following syntax:

. constraint 1 [a1_tau]_cons = [a2_tau]_cons

. nlogit ... , constraints(1)
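A concrete version of this syntax, sketched with the restaurant example data. The parameter names fast_tau and family_tau are assumptions that follow the [alternative_tau]_cons pattern described above. Listing e(b) after an unconstrained fit shows the names actually used by your version of Stata.

```stata
* Setup: example data and nests as used elsewhere in this help file
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, ///
    fancy: Christophers | MadCows)

* Unconstrained fit; inspect the column names of e(b) to confirm
* how the dissimilarity parameters are labeled
nlogit chosen cost distance rating ///
    || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id)
matrix list e(b)

* Constrain the fast and family dissimilarity parameters to be equal
* (names assumed to follow the pattern documented above)
constraint 1 [fast_tau]_cons = [family_tau]_cons
nlogit chosen cost distance rating ///
    || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id) constraints(1)
```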

collinear prevents collinear variables from being dropped. Use this option when you know that you have collinear variables and you are applying constraints() to handle the rank reduction. See [R] estimation options for details on using collinear with constraints().

nlogit will not allow you to specify an independent variable in more than one level equation. Specifying the collinear option will allow execution to proceed in this case, but it is your responsibility to ensure that the parameters are identified.

        +-----------+
    ----+ SE/Robust +--------------------------------------------------------

vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.

If vce(robust) or vce(cluster clustvar) is specified, the likelihood-ratio test for the independence of irrelevant alternatives (IIA) is not computed.
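A minimal sketch of requesting robust standard errors, assuming the restaurant example data. After this fit, ereturn list can be used to check which stored results are present.

```stata
* Setup: example data and nests as used elsewhere in this help file
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, ///
    fancy: Christophers | MadCows)

* Robust standard errors; the LR test for IIA is not computed in this case
nlogit chosen cost distance rating ///
    || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id) vce(robust)

* Inspect what was stored in e()
ereturn list
```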

        +-----------+
    ----+ Reporting +--------------------------------------------------------

level(#); see [R] estimation options.

notree specifies that the tree structure of the nested logit model not be displayed. If notree is not specified, see nolabel and nobranches for options that affect how the tree is displayed.

nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

        +--------------+
    ----+ Maximization +-----------------------------------------------------

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are seldom used. The technique(bhhh) option is not allowed.
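If the default settings have trouble converging, a few of the options listed above can be combined as in the following sketch. It assumes the restaurant example data, and the iteration limit is an arbitrary illustrative value.

```stata
* Setup: example data and nests as used elsewhere in this help file
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, ///
    fancy: Christophers | MadCows)

* difficult: use a different stepping algorithm in nonconcave regions
* iterate(100): stop after at most 100 iterations
* nolog: suppress the iteration log
nlogit chosen cost distance rating ///
    || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id) ///
    difficult iterate(100) nolog
```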

Specification and options for nlogitgen

newaltvar and altvar are variables identifying alternatives at each level of the hierarchy.

label defines a label to associate with the branch. If no label is given, a numeric value is used.

alternative specifies an alternative, of altvar specified in the syntax, to be included in the branch. It is either a numeric value or the label associated with that value. An example of nlogitgen is

. nlogitgen type = restaurant(fast: 1 | 2, family: CafeEccell | LosNortenos | WingsNmore, fancy: 6 | 7)

nolog suppresses the display of the iteration log.

Specification and options for nlogittree

        +------+
    ----+ Main +-------------------------------------------------------------

altvarlist is a list of alternative variables that define the tree hierarchy. The first variable must define bottom-level alternatives, and the order continues to the variable defining the top-level alternatives.

choice(depvar) defines the choice indicator variable and forces nlogittree to compute and display choice frequencies for each bottom-level alternative.

case(varname) specifies the variable that identifies each case. When both case() and choice() are specified, nlogittree executes diagnostics on the tree structure and identifies observations that will cause nlogit to terminate execution or drop observations.

generate(newvar) generates a new indicator variable, newvar, that is equal to 1 for invalid observations. This option requires that both choice() and case() are also specified.
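A minimal sketch of using generate() to flag problem observations before estimation, assuming the restaurant example data. The variable name problem is a hypothetical choice.

```stata
* Setup: example data and nests as used elsewhere in this help file
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, ///
    fancy: Christophers | MadCows)

* Run the tree diagnostics and mark invalid observations in a new variable
nlogittree restaurant type, choice(chosen) case(family_id) generate(problem)

* Review any flagged observations before calling nlogit
tabulate problem
list family_id restaurant chosen if problem == 1
```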

nolabel forces nlogittree to suppress value labels in tree-structure output.

nobranches forces nlogittree to suppress drawing branches in the tree-structure output.

Remark on degenerate branches

Degenerate nests occur when there is only one alternative in a branch of the tree hierarchy. The associated dissimilarity parameter of the RUM model is not defined. The inclusive-valued parameter of the nonnormalized model will be identifiable if there are alternative-specific variables specified in equation 1 of the model specification (the indepvars in the model syntax). Numerically, you can sidestep the issue of nonidentifiable or undefined parameters by setting constraints on them. For the RUM model, constrain the dissimilarity parameter to 1. See the constraints() option above for details on setting constraints on the dissimilarity parameters.
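The following is an untested sketch of that workaround. The tree is hypothetical and built only for illustration: it regroups the restaurant example data so that the fancy branch contains a single alternative. The parameter name fancy_tau is assumed to follow the [alternative_tau]_cons pattern described under constraints().

```stata
webuse restaurant, clear

* Hypothetical tree with a degenerate branch: fancy holds one alternative
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore | MadCows, ///
    fancy: Christophers)

* Fix the undefined dissimilarity parameter of the degenerate nest at 1
* (name assumed; check the labels in e(b) for your version of Stata)
constraint 1 [fancy_tau]_cons = 1

nlogit chosen cost distance rating ///
    || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id) constraints(1)
```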


Examples

Setup

. webuse restaurant

Generate a new categorical variable named type that identifies the first-level set of alternatives based on the variable named restaurant

. nlogitgen type = restaurant(fast: Freebirds | MamasPizza, family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)

Examine the tree structure

. nlogittree restaurant type, choice(chosen) case(family_id)

Perform nested logit regression

. nlogit chosen cost distance rating || type: income kids, base(family) || restaurant:, noconst case(family_id)

Stored results

nlogit stores the following in e():

Scalars
  e(N)                 number of observations
  e(N_case)            number of cases
  e(k_eq)              number of equations in e(b)
  e(k_eq_model)        number of equations in overall model test
  e(k_alt)             number of alternatives for bottom level
  e(k_altj)            number of alternatives for jth level
  e(k_indvars)         number of independent variables
  e(k_ind2vars)        number of by-alternative variables for bottom level
  e(k_ind2varsj)       number of by-alternative variables for jth level
  e(df_m)              model degrees of freedom
  e(df_c)              clogit model degrees of freedom
  e(ll)                log likelihood
  e(ll_c)              clogit model log likelihood
  e(N_clust)           number of clusters
  e(chi2)              chi-squared
  e(chi2_c)            likelihood-ratio test for IIA
  e(p)                 p-value for model Wald test
  e(p_c)               p-value for IIA test
  e(i_base)            base index for bottom level
  e(i_basej)           base index for jth level
  e(levels)            number of levels
  e(alt_min)           minimum number of alternatives
  e(alt_avg)           average number of alternatives
  e(alt_max)           maximum number of alternatives
  e(const)             constant indicator for bottom level
  e(constj)            constant indicator for jth level
  e(rum)               1 if RUM model, 0 otherwise
  e(rank)              rank of e(V)
  e(ic)                number of iterations
  e(rc)                return code
  e(converged)         1 if converged, 0 otherwise

Macros
  e(cmd)               nlogit
  e(cmdline)           command as typed
  e(depvar)            name of dependent variable
  e(indvars)           name of independent variables
  e(ind2vars)          by-alternative variables for bottom level
  e(ind2varsj)         by-alternative variables for jth level
  e(case)              variable defining cases
  e(altvar)            alternative variable for bottom level
  e(altvarj)           alternative variable for jth level
  e(alteqs)            equation names for bottom level
  e(alteqsj)           equation names for jth level
  e(alti)              ith alternative for bottom level
  e(altj_i)            ith alternative for jth level
  e(wtype)             weight type
  e(wexp)              weight expression
  e(title)             title in estimation output
  e(clustvar)          name of cluster variable
  e(chi2type)          Wald; type of model chi-squared test
  e(vce)               vcetype specified in vce()
  e(vcetype)           title used to label Std. Err.
  e(opt)               type of optimization
  e(which)             max or min; whether optimizer is to perform
                         maximization or minimization
  e(ml_method)         type of ml method
  e(user)              name of likelihood-evaluator program
  e(technique)         maximization technique
  e(datasignature)     the checksum
  e(datasignaturevars) variables used in calculation of checksum
  e(properties)        b V
  e(estat_cmd)         program used to implement estat
  e(predict)           program used to implement predict
  e(marginsnotok)      predictions disallowed by margins
  e(asbalanced)        factor variables fvset as asbalanced
  e(asobserved)        factor variables fvset as asobserved

Matrices
  e(b)                 coefficient vector
  e(Cns)               constraints matrix
  e(k_altern)          number of alternatives at each level
  e(k_branchj)         number of branches at each alternative of jth level
  e(stats)             alternative statistics for bottom level
  e(statsj)            alternative statistics for jth level
  e(altidxj)           alternative indices for jth level
  e(alt_ind2vars)      indicators for bottom level estimated by-alternative
                         variable -- e(k_alt) x e(k_ind2vars)
  e(alt_ind2varsj)     indicators for jth level estimated by-alternative
                         variable -- e(k_altj) x e(k_ind2varsj)
  e(ilog)              iteration log (up to 20 iterations)
  e(gradient)          gradient vector
  e(V)                 variance-covariance matrix of the estimators
  e(V_modelbased)      model-based variance

Functions
  e(sample)            marks estimation sample
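A short sketch of reading some of these stored results after estimation, assuming the restaurant example data. Only results listed above are referenced.

```stata
* Setup and fit, using the example data from this help file
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, ///
    fancy: Christophers | MadCows)
nlogit chosen cost distance rating ///
    || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id)

* Scalars: sample size, log likelihood, and the LR test for IIA
display "cases          = " e(N_case)
display "log likelihood = " e(ll)
display "IIA LR chi2    = " e(chi2_c) "   p = " e(p_c)

* Macro and matrix: command line as typed and the coefficient vector
display "`e(cmdline)'"
matrix list e(b)

* Function: count the observations in the estimation sample
count if e(sample)
```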

© Copyright 1996–2018 StataCorp LLC