[R] nlogit -- Nested logit regression
Syntax
Nested logit regression
nlogit depvar [indepvars] [if] [in] [weight] [|| lev1_equation [||
lev2_equation ...]] || altvar: [byaltvarlist], case(varname)
[nlogit_options]
where the syntax of lev#_equation is
altvar: [byaltvarlist] [, base(#|lbl) estconst]
Create variable based on specification of branches
nlogitgen newaltvar = altvar (branchlist) [, nolog]
where branchlist is
branch, branch [, branch ...]
and branch is
[label:] alternative [| alternative [| alternative ...] ]
Display tree structure
nlogittree altvarlist [if] [in] [weight] [, nlogittree_options]
nlogit_options              Description
-------------------------------------------------------------------------
Model
* case(varname)             use varname to identify cases
  base(#|lbl)               use the specified level or label of altvar as
                              the base alternative for the bottom level
  noconstant                suppress the constant terms for the
                              bottom-level alternatives
  nonnormalized             use the nonnormalized parameterization
  altwise                   use alternativewise deletion instead of
                              casewise deletion
  constraints(constraints)  apply specified linear constraints
  collinear                 keep collinear variables
SE/Robust
  vce(vcetype)              vcetype may be oim, robust, cluster clustvar,
                              bootstrap, or jackknife
Reporting
  level(#)                  set confidence level; default is level(95)
  notree                    suppress display of tree-structure output;
                              see also nolabel and nobranches
  nocnsreport               do not display constraints
  display_options           control columns and column formats, row
                              spacing, line width, display of omitted
                              variables and base and empty cells, and
                              factor-variable labeling
Maximization
  maximize_options          control the maximization process; seldom used
-------------------------------------------------------------------------
* case(varname) is required.
nlogittree_options          Description
-------------------------------------------------------------------------
Main
  choice(depvar)            use depvar as the choice indicator variable
  case(varname)             use varname to identify cases
  generate(newvar)          create newvar to identify invalid
                              observations
  nolabel                   suppress the value labels in tree-structure
                              output
  nobranches                suppress drawing branches in the
                              tree-structure output
-------------------------------------------------------------------------
byaltvarlist may contain factor variables; see fvvarlist.
bootstrap, by, fp, jackknife, and statsby are allowed; see prefix.
Weights are not allowed with the bootstrap prefix.
fweights, iweights, and pweights are allowed with nlogit, and fweights
are allowed with nlogittree; see weight. Weights for nlogit must be
constant within case.
See [R] nlogit postestimation for features available after estimation.
Menu
nlogit
Statistics > Categorical outcomes > Nested logit regression
nlogitgen
Statistics > Categorical outcomes > Setup for nested logit regression
nlogittree
Statistics > Categorical outcomes > Display nested logit tree
structure
Description
nlogit performs full information maximum-likelihood estimation for nested
logit models. These models relax the assumption of independently
distributed errors and the independence of irrelevant alternatives
inherent in conditional and multinomial logit models by clustering
similar alternatives into nests.
By default, nlogit uses a parameterization that is consistent with random
utility maximization (RUM). Before version 10 of Stata, a nonnormalized
version of the nested logit model was fit, which you can request by
specifying the nonnormalized option.
You must use nlogitgen to generate a new categorical variable to specify
the branches of the decision tree before calling nlogit.
Options
Specification and options for lev#_equation
altvar is a variable identifying alternatives at this level of the
hierarchy.
byaltvarlist specifies the variables to be used to compute the
by-alternative regression coefficients for that level. For each
variable specified in the variable list, there will be one regression
coefficient for each alternative of that level of the hierarchy. If
the variable is constant across the alternatives within a case (a
case-specific variable), the regression coefficient associated with
the base alternative is not identifiable. These regression
coefficients are labeled as (base) in the regression table. If the
variable varies
among the alternatives, a regression coefficient is estimated for
each alternative.
base(#|lbl) can be specified in each level equation, where it identifies
the base alternative to be used at that level. The default is the
alternative that has the highest frequency.
If vce(bootstrap) or vce(jackknife) is specified, you must specify
the base alternative for each level that has a byaltvarlist or for
which constants will be estimated. Doing so ensures that the same
model is fit with each call to nlogit.
estconst applies to all the level equations except the bottom-level
equation. Specifying estconst requests that constants for each
alternative (except the base alternative) be estimated. By default,
no constant is estimated at these levels. Constants can be estimated
in only one level of the tree hierarchy. If you specify estconst for
one of the level equations, you must specify noconstant for the
bottom-level equation.
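As a sketch of estconst, the following command moves the constants from
the bottom level to the type level. It assumes the restaurant data and
the type variable created in the Examples section; whether this
particular specification converges is not guaranteed. The command is
wrapped here for display and should be typed as one line.
    . * Assumes webuse restaurant and nlogitgen type = ... as in Examples
    . nlogit chosen cost distance rating || type: income kids,
        base(family) estconst || restaurant:, noconstant case(family_id)
Because estconst is given for the type equation, noconstant is specified
for the bottom-level (restaurant) equation, as required.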
Options for nlogit
    +-------+
----+ Model +------------------------------------------------------------
case(varname) specifies the variable that identifies each case. case()
is required.
base(#|lbl) can be specified in each level equation, where it identifies
the base alternative to be used at that level. The default is the
alternative that has the highest frequency.
If vce(bootstrap) or vce(jackknife) is specified, you must specify
the base alternative for each level that has a byaltvarlist or for
which constants will be estimated. Doing so ensures that the same
model is fit with each call to nlogit.
noconstant applies only to the equation defining the bottom level of the
hierarchy. By default, constants are estimated for each alternative
of altvar, less the base alternative. To suppress the constant terms
for this level, specify noconstant. If you do not specify
noconstant, you cannot specify estconst for the higher-level
equations.
nonnormalized requests a nonnormalized parameterization of the model that
does not scale the inclusive values by the degree of dissimilarity of
the alternatives within each nest. Use this option to replicate
results from older versions of Stata. The default is to use the
RUM-consistent parameterization.
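A minimal sketch of requesting the nonnormalized parameterization, again
assuming the restaurant data and the type variable from the Examples
section. The command is the one shown in Examples with nonnormalized
added; it is wrapped here for display and should be typed as one line.
    . * Assumes webuse restaurant and nlogitgen type = ... as in Examples
    . nlogit chosen cost distance rating || type: income kids,
        base(family) || restaurant:, noconstant nonnormalized
        case(family_id)
    . display e(rum)
According to Stored results, e(rum) is 1 for the RUM model and 0
otherwise, so displaying it is a quick check of which parameterization
was fit.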
altwise specifies that alternativewise deletion be used when marking out
observations because of missing values in your variables. The
default is to use casewise deletion. This option does not apply to
observations that are marked out by the if or in qualifier or the by
prefix.
constraints(constraints); see [R] estimation options.
The inclusive-valued/dissimilarity parameters are parameterized as ml
ancillary parameters. They are labeled as [alternative_tau]_cons,
where alternative is one of the alternatives defining a branch in the
tree. To constrain the inclusive-valued/dissimilarity parameter for
alternative a1 to be, say, equal to alternative a2, you would use the
following syntax:
. constraint 1 [a1_tau]_cons = [a2_tau]_cons
. nlogit ... , constraints(1)
collinear prevents collinear variables from being dropped. Use this
option when you know that you have collinear variables and you are
applying constraints() to handle the rank reduction. See [R]
estimation options for details on using collinear with constraints().
nlogit will not allow you to specify an independent variable in more
than one level equation. Specifying the collinear option will allow
execution to proceed in this case, but it is your responsibility to
ensure that the parameters are identified.
    +-----------+
----+ SE/Robust +--------------------------------------------------------
vce(vcetype) specifies the type of standard error reported, which
includes types that are derived from asymptotic theory (oim), that
are robust to some kinds of misspecification (robust), that allow for
intragroup correlation (cluster clustvar), and that use bootstrap or
jackknife methods (bootstrap, jackknife); see [R] vce_option.
If vce(robust) or vce(cluster clustvar) is specified, the
likelihood-ratio test for the independence of irrelevant alternatives
(IIA) is not computed.
    +-----------+
----+ Reporting +--------------------------------------------------------
level(#); see [R] estimation options.
notree specifies that the tree structure of the nested logit model not be
displayed. When notree is not specified, see nolabel and nobranches for
options that modify the tree display.
nocnsreport; see [R] estimation options.
display_options: noci, nopvalues, noomitted, vsquish, noemptycells,
baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style),
cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R]
estimation options.
    +--------------+
----+ Maximization +-----------------------------------------------------
maximize_options: difficult, technique(algorithm_spec), iterate(#),
[no]log, trace, gradient, showstep, hessian, showtolerance,
tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and
from(init_specs); see [R] maximize. These options are seldom used.
The technique(bhhh) option is not allowed.
Specification and options for nlogitgen
newaltvar and altvar are variables identifying alternatives at each level
of the hierarchy.
label defines a label to associate with the branch. If no label is
given, a numeric value is used.
alternative specifies an alternative, of altvar specified in the syntax,
to be included in the branch. It is either a numeric value or the
label associated with that value. An example of nlogitgen is
. nlogitgen type = restaurant(fast: 1 | 2,
family: CafeEccell | LosNortenos | WingsNmore, fancy: 6 | 7)
nolog suppresses the display of the iteration log.
Specification and options for nlogittree
    +------+
----+ Main +-------------------------------------------------------------
altvarlist is a list of alternative variables that define the tree
hierarchy. The first variable must define bottom-level alternatives,
and the order continues to the variable defining the top-level
alternatives.
choice(depvar) defines the choice indicator variable and forces
nlogittree to compute and display choice frequencies for each
bottom-level alternative.
case(varname) specifies the variable that identifies each case. When
both case() and choice() are specified, nlogittree executes
diagnostics on the tree structure and identifies observations that
will cause nlogit to terminate execution or drop observations.
generate(newvar) generates a new indicator variable, newvar, that is
equal to 1 for invalid observations. This option requires that both
choice() and case() are also specified.
nolabel forces nlogittree to suppress value labels in tree-structure
output.
nobranches forces nlogittree to suppress drawing branches in the
tree-structure output.
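As an illustration of the diagnostic options, the following sketch flags
invalid observations before estimation. It assumes the restaurant data
and the type variable from the Examples section; the variable name
problem is hypothetical and can be any new variable name.
    . * problem is a hypothetical name for the new indicator variable
    . nlogittree restaurant type, choice(chosen) case(family_id)
        generate(problem)
    . count if problem == 1
    . list family_id restaurant chosen if problem == 1
The nlogittree command is wrapped here for display and should be typed as
one line. If count reports 0, no observations were marked as invalid.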
Remark on degenerate branches
Degenerate nests occur when there is only one alternative in a branch of
the tree hierarchy. The associated dissimilarity parameter of the RUM
model is not defined. The inclusive-valued parameter of the
nonnormalized model will be identifiable if there are
alternative-specific variables specified in equation 1 of the model
specification (the indepvars in the model syntax). Numerically you can
skirt the issue of nonidentifiable/undefined parameters by setting
constraints on them. For the RUM model, constrain the dissimilarity
parameter to 1. See the constraints() option under Options for nlogit for
details on setting constraints on the dissimilarity parameters.
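A sketch of such a constraint, using the same placeholder notation as
the constraints() option: a1 stands for the alternative that defines the
degenerate branch, and the ellipsis stands for the rest of your model
specification. Neither is a literal name.
    . * a1 is a placeholder for the alternative defining the degenerate branch
    . constraint 1 [a1_tau]_cons = 1
    . nlogit ... , constraints(1)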
Examples
Setup
. webuse restaurant
Generate a new categorical variable named type that identifies the
first-level set of alternatives based on the variable named restaurant
. nlogitgen type = restaurant(fast: Freebirds | MamasPizza, family:
CafeEccell | LosNortenos | WingsNmore, fancy: Christophers |
MadCows)
Examine the tree structure
. nlogittree restaurant type, choice(chosen) case(family_id)
Perform nested logit regression
. nlogit chosen cost distance rating || type: income kids,
base(family) || restaurant:, noconst case(family_id)
Stored results
nlogit stores the following in e():
Scalars
  e(N)                  number of observations
  e(N_case)             number of cases
  e(k_eq)               number of equations in e(b)
  e(k_eq_model)         number of equations in overall model test
  e(k_alt)              number of alternatives for bottom level
  e(k_altj)             number of alternatives for jth level
  e(k_indvars)          number of independent variables
  e(k_ind2vars)         number of by-alternative variables for bottom
                          level
  e(k_ind2varsj)        number of by-alternative variables for jth
                          level
  e(df_m)               model degrees of freedom
  e(df_c)               clogit model degrees of freedom
  e(ll)                 log likelihood
  e(ll_c)               clogit model log likelihood
  e(N_clust)            number of clusters
  e(chi2)               chi-squared
  e(chi2_c)             likelihood-ratio test for IIA
  e(p)                  p-value for model Wald test
  e(p_c)                p-value for IIA test
  e(i_base)             base index for bottom level
  e(i_basej)            base index for jth level
  e(levels)             number of levels
  e(alt_min)            minimum number of alternatives
  e(alt_avg)            average number of alternatives
  e(alt_max)            maximum number of alternatives
  e(const)              constant indicator for bottom level
  e(constj)             constant indicator for jth level
  e(rum)                1 if RUM model, 0 otherwise
  e(rank)               rank of e(V)
  e(ic)                 number of iterations
  e(rc)                 return code
  e(converged)          1 if converged, 0 otherwise
Macros
  e(cmd)                nlogit
  e(cmdline)            command as typed
  e(depvar)             name of dependent variable
  e(indvars)            name of independent variables
  e(ind2vars)           by-alternative variables for bottom level
  e(ind2varsj)          by-alternative variables for jth level
  e(case)               variable defining cases
  e(altvar)             alternative variable for bottom level
  e(altvarj)            alternative variable for jth level
  e(alteqs)             equation names for bottom level
  e(alteqsj)            equation names for jth level
  e(alti)               ith alternative for bottom level
  e(altj_i)             ith alternative for jth level
  e(wtype)              weight type
  e(wexp)               weight expression
  e(title)              title in estimation output
  e(clustvar)           name of cluster variable
  e(chi2type)           Wald, type of model chi-squared test
  e(vce)                vcetype specified in vce()
  e(vcetype)            title used to label Std. Err.
  e(opt)                type of optimization
  e(which)              max or min; whether optimizer is to perform
                          maximization or minimization
  e(ml_method)          type of ml method
  e(user)               name of likelihood-evaluator program
  e(technique)          maximization technique
  e(datasignature)      the checksum
  e(datasignaturevars)  variables used in calculation of checksum
  e(properties)         b V
  e(estat_cmd)          program used to implement estat
  e(predict)            program used to implement predict
  e(marginsnotok)       predictions disallowed by margins
  e(asbalanced)         factor variables fvset as asbalanced
  e(asobserved)         factor variables fvset as asobserved
Matrices
  e(b)                  coefficient vector
  e(Cns)                constraints matrix
  e(k_altern)           number of alternatives at each level
  e(k_branchj)          number of branches at each alternative of jth
                          level
  e(stats)              alternative statistics for bottom level
  e(statsj)             alternative statistics for jth level
  e(altidxj)            alternative indices for jth level
  e(alt_ind2vars)       indicators for bottom level estimated
                          by-alternative variable -- e(k_alt) x
                          e(k_ind2vars)
  e(alt_ind2varsj)      indicators for jth level estimated
                          by-alternative variable -- e(k_altj) x
                          e(k_ind2varsj)
  e(ilog)               iteration log (up to 20 iterations)
  e(gradient)           gradient vector
  e(V)                  variance-covariance matrix of the estimators
  e(V_modelbased)       model-based variance
Functions
  e(sample)             marks estimation sample
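Inspecting stored results
The following sketch shows one way to look at these results after
fitting a model with nlogit, for example the model in the Examples
section. No output is shown because the values depend on the model fit.
    . * Run after any nlogit estimation command
    . ereturn list
    . display e(N_case)
    . display e(ll)
    . display e(chi2_c)
    . display e(p_c)
    . matrix list e(b)
    . count if e(sample)
ereturn list shows everything stored in e(); the display lines pick out
the number of cases, the log likelihood, and the likelihood-ratio test
for IIA with its p-value. As noted under vce(), the IIA test is not
computed with vce(robust) or vce(cluster clustvar), so e(chi2_c) and
e(p_c) may be missing in that case.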