**[R] nlogit** -- Nested logit regression

__Syntax__

Nested logit regression

**nlogit** *depvar* [*indepvars*] [*if*] [*in*] [*weight*] [**||** *lev1_equation* [**||**
*lev2_equation* ...]] **||** *altvar***:** [*byaltvarlist*]**,** **case(***varname***)**
[*nlogit_options*]

where the syntax of *lev#_equation* is

*altvar***:** [*byaltvarlist*] [**,** **base(***#*|*lbl***)** __estc__**onst**]

Create variable based on specification of branches

**nlogitgen** *newaltvar* **=** *altvar* **(***branchlist***)** [**,** __nolo__**g**]

where *branchlist* is

*branch***,** *branch* [**,** *branch ...*]

and *branch* is

[*label***:**] *alternative* [**|** *alternative* [**|** *alternative ...*] ]

Display tree structure

**nlogittree** *altvarlist* [*if*] [*in*] [*weight*] [**,** *nlogittree_options*]

*nlogit_options* Description
-------------------------------------------------------------------------
Model
* **case(***varname***)** use *varname* to identify cases
**base(***#*|*lbl***)** use the specified level or label of *altvar* as the base alternative for the bottom level
__nocons__**tant** suppress the constant terms for the bottom-level alternatives
__nonn__**ormalized** use the nonnormalized parameterization
**altwise** use alternativewise deletion instead of casewise deletion
__const__**raints(***constraints***)** apply specified linear constraints
__col__**linear** keep collinear variables

SE/Robust
**vce(***vcetype***)** *vcetype* may be **oim**, __r__**obust**, __cl__**uster** *clustvar*, __boot__**strap**, or __jack__**knife**

Reporting
__l__**evel(***#***)** set confidence level; default is **level(95)**
__notr__**ee** suppress display of tree-structure output; see also **nolabel** and **nobranches**
__nocnsr__**eport** do not display constraints
*display_options* control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling

Maximization
*maximize_options* control the maximization process; seldom used
-------------------------------------------------------------------------
* **case(***varname***)** is required.

*nlogittree_options* Description
-------------------------------------------------------------------------
Main
__cho__**ice(***depvar***)** use *depvar* as the choice indicator variable
**case(***varname***)** use *varname* to identify cases
__gen__**erate(***newvar***)** create *newvar* to identify invalid observations
__nolab__**el** suppress the value labels in tree-structure output
__nobranch__**es** suppress drawing branches in the tree-structure output
-------------------------------------------------------------------------

*byaltvarlist* may contain factor variables; see fvvarlist.
**bootstrap**, **by**, **fp**, **jackknife**, and **statsby** are allowed; see prefix.
Weights are not allowed with the **bootstrap** prefix.
**fweight**s, **iweight**s, and **pweight**s are allowed with **nlogit**, and **fweight**s
are allowed with **nlogittree**; see weight. Weights for **nlogit** must be
constant within case.
See **[R] nlogit postestimation** for features available after estimation.
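
Because weights for **nlogit** must be constant within case, it can help to check this before fitting. The following is a minimal sketch, assuming the restaurant example data used later in this entry; the weight variable **wt** is hypothetical and is created here only for illustration.

```stata
webuse restaurant, clear

* wt is a made-up, case-level weight (not part of the restaurant data)
generate wt = 1 + mod(family_id, 3)

* Within each case, sort by wt and confirm the smallest and largest
* values coincide; assert stops with an error if any case fails
bysort family_id (wt): assert wt[1] == wt[_N]
```

If the assertion fails, the offending cases can be located by replacing **assert** with a **list** of the observations for which the two values differ.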

__Menu__

__nlogit__

**Statistics > Categorical outcomes > Nested logit regression**

__nlogitgen__

**Statistics > Categorical outcomes > Setup for nested logit regression**

__nlogittree__

**Statistics > Categorical outcomes > Display nested logit tree structure**

__Description__

**nlogit** performs full information maximum-likelihood estimation for nested
logit models. These models relax the assumption of independently
distributed errors and the independence of irrelevant alternatives
inherent in conditional and multinomial logit models by clustering
similar alternatives into nests.

By default, **nlogit** uses a parameterization that is consistent with random
utility maximization (RUM). Before version 10 of Stata, a nonnormalized
version of the nested logit model was fit, which you can request by
specifying the **nonnormalized** option.
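
The difference between the two parameterizations can be sketched for a two-level tree. The notation below is assumed for illustration and is not used elsewhere in this entry: alternative $j$ belongs to nest $B_k$, $\eta_j$ is the bottom-level linear predictor, $z_k\alpha$ is the first-level linear predictor, and $\tau_k$ is the dissimilarity (inclusive-value) parameter of nest $k$.

```latex
% RUM-consistent parameterization (the default)
\begin{align*}
\Pr(j) &= \Pr(j \mid B_k)\,\Pr(B_k), \\
\Pr(j \mid B_k) &= \frac{\exp(\eta_j/\tau_k)}{\sum_{m \in B_k} \exp(\eta_m/\tau_k)}, \qquad
I_k = \ln \sum_{m \in B_k} \exp(\eta_m/\tau_k), \\
\Pr(B_k) &= \frac{\exp(z_k\alpha + \tau_k I_k)}{\sum_{l} \exp(z_l\alpha + \tau_l I_l)}.
\end{align*}

% Nonnormalized parameterization: the bottom-level predictors are not scaled
\begin{align*}
\Pr(j \mid B_k) &= \frac{\exp(\eta_j)}{\sum_{m \in B_k} \exp(\eta_m)}, \qquad
I_k = \ln \sum_{m \in B_k} \exp(\eta_m),
\end{align*}
% with Pr(B_k) as above.
```

In both forms, setting every $\tau_k$ to 1 collapses the model to a conditional logit over all alternatives, which is the restriction examined by the likelihood-ratio test for IIA.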

You must use **nlogitgen** to generate a new categorical variable to specify
the branches of the decision tree before calling **nlogit**.

__Options__

__Specification and options for lev#_equation__

*altvar* is a variable identifying alternatives at this level of the
hierarchy.

*byaltvarlist* specifies the variables to be used to compute the
by-alternative regression coefficients for that level. For each
variable specified in the variable list, there will be one regression
coefficient for each alternative of that level of the hierarchy. If
the variable is constant across each alternative (a case-specific
variable), the regression coefficient associated with the base
alternative is not identifiable. These regression coefficients are
labeled as (base) in the regression table. If the variable varies
among the alternatives, a regression coefficient is estimated for
each alternative.

**base(***#*|*lbl***)** can be specified in each level equation, where it identifies
the base alternative to be used at that level. The default is the
alternative that has the highest frequency.

If **vce(bootstrap)** or **vce(jackknife)** is specified, you must specify
the base alternative for each level that has a *byaltvarlist* or for
which constants will be estimated. Doing so ensures that the same model
is fit with each call to **nlogit**.

**estconst** applies to all the level equations except the bottom-level
equation. Specifying **estconst** requests that constants for each
alternative (except the base alternative) be estimated. By default,
no constant is estimated at these levels. Constants can be estimated
in only one level of the tree hierarchy. If you specify **estconst** for
one of the level equations, you must specify **noconstant** for the
bottom-level equation.
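
The pairing of **estconst** with **noconstant** might look as follows. This is a sketch only, assuming the restaurant example data and the same tree used in the Examples section; whether this particular specification is sensible for those data has not been checked here.

```stata
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)

* estconst requests constants at the type level (all but the base, family);
* noconstant is therefore required for the bottom-level equation
nlogit chosen cost distance rating || type: income kids, base(family) estconst ///
    || restaurant:, noconstant case(family_id)
```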

__Options for nlogit__

+-------+
----+ Model +------------------------------------------------------------

**case(***varname***)** specifies the variable that identifies each case. **case()**
is required.

**base(***#*|*lbl***)** can be specified in each level equation, where it identifies
the base alternative to be used at that level. The default is the
alternative that has the highest frequency.

If **vce(bootstrap)** or **vce(jackknife)** is specified, you must specify
the base alternative for each level that has a *byaltvarlist* or for
which constants will be estimated. Doing so ensures that the same model
is fit with each call to **nlogit**.

**noconstant** applies only to the equation defining the bottom level of the
hierarchy. By default, constants are estimated for each alternative
of *altvar*, less the base alternative. To suppress the constant terms
for this level, specify **noconstant**. If you do not specify
**noconstant**, you cannot specify **estconst** for the higher-level
equations.

**nonnormalized** requests a nonnormalized parameterization of the model that
does not scale the inclusive values by the degree of dissimilarity of
the alternatives within each nest. Use this option to replicate
results from older versions of Stata. The default is to use the
RUM-consistent parameterization.
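
To compare the two parameterizations on the same data, one possible sketch is shown below. It assumes the restaurant example data and the tree from the Examples section, and uses the stored result **e(rum)** only to confirm which parameterization was fit.

```stata
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)

* Default: RUM-consistent parameterization
nlogit chosen cost distance rating || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id)
display "e(rum) after default fit: " e(rum)

* Nonnormalized parameterization, as fit by older versions of Stata
nlogit chosen cost distance rating || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id) nonnormalized
display "e(rum) after nonnormalized fit: " e(rum)
```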

**altwise** specifies that alternativewise deletion be used when marking out
observations because of missing values in your variables. The
default is to use casewise deletion. This option does not apply to
observations that are marked out by the **if** or **in** qualifier or the **by**
prefix.

**constraints(***constraints***)**; see **[R] estimation options**.

The inclusive-valued/dissimilarity parameters are parameterized as **ml**
ancillary parameters. They are labeled as [*alternative*_tau]_cons,
where *alternative* is one of the alternatives defining a branch in the
tree. For example, to constrain the inclusive-valued/dissimilarity
parameter for alternative **a1** to equal that for alternative **a2**, you
would type

**. constraint 1 [a1_tau]_cons = [a2_tau]_cons**

**. nlogit ... , constraints(1)**
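
Applied to the restaurant example data, the same idea might be written as below. This is a sketch under the assumption that the branch labels **fast** and **family** produce parameters named as described above; because naming can differ across versions of Stata, the coefficient vector is listed first so that the names can be confirmed.

```stata
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)

* Fit the unconstrained model and inspect the parameter names in e(b)
nlogit chosen cost distance rating || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id)
matrix list e(b)

* Constrain the fast and family dissimilarity parameters to be equal
* (adjust the names if matrix list shows a different labeling)
constraint 1 [fast_tau]_cons = [family_tau]_cons
nlogit chosen cost distance rating || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id) constraints(1)
```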

**collinear** prevents collinear variables from being dropped. Use this
option when you know that you have collinear variables and you are
applying **constraints()** to handle the rank reduction. See **[R]**
**estimation options** for details on using **collinear** with **constraints()**.

**nlogit** will not allow you to specify an independent variable in more
than one level equation. Specifying the **collinear** option will allow
execution to proceed in this case, but it is your responsibility to
ensure that the parameters are identified.

+-----------+
----+ SE/Robust +--------------------------------------------------------

**vce(***vcetype***)** specifies the type of standard error reported, which
includes types that are derived from asymptotic theory (**oim**), that
are robust to some kinds of misspecification (**robust**), that allow for
intragroup correlation (**cluster** *clustvar*), and that use bootstrap or
jackknife methods (**bootstrap**, **jackknife**); see **[R]** *vce_option*.

If **vce(robust)** or **vce(cluster** *clustvar***)** is specified, the
likelihood-ratio test for the independence of irrelevant alternatives
(IIA) is not computed.
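
A brief sketch of requesting robust standard errors follows, assuming the restaurant example data and the tree from the Examples section. The final line simply displays the stored IIA test statistic so that its absence under **vce(robust)** can be seen directly.

```stata
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)

* Robust standard errors; the LR test for IIA is not reported in this case
nlogit chosen cost distance rating || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id) vce(robust)

* Show what, if anything, is stored for the IIA test
display e(chi2_c)
```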

+-----------+
----+ Reporting +--------------------------------------------------------

**level(***#***)**; see **[R] estimation options**.

**notree** specifies that the tree structure of the nested logit model not be
displayed. When **notree** is not specified, see also **nolabel** and
**nobranches**, which modify how the tree is displayed.

**nocnsreport**; see **[R] estimation options**.

*display_options*: **noci**, __nopv__**alues**, __noomit__**ted**, **vsquish**, __noempty__**cells**,
__base__**levels**, __allbase__**levels**, __nofvlab__**el**, **fvwrap(***#***)**, **fvwrapon(***style***)**,
**cformat(***%fmt***)**, **pformat(***%fmt***)**, **sformat(***%fmt***)**, and **nolstretch**; see **[R]**
**estimation options**.

+--------------+
----+ Maximization +-----------------------------------------------------

*maximize_options*: __dif__**ficult**, __tech__**nique(***algorithm_spec***)**, __iter__**ate(***#***)**,
[__no__]__lo__**g**, __tr__**ace**, __grad__**ient**, **showstep**, __hess__**ian**, __showtol__**erance**,
__tol__**erance(***#***)**, __ltol__**erance(***#***)**, __nrtol__**erance(***#***)**, __nonrtol__**erance**, and
**from(***init_specs***)**; see **[R] maximize**. These options are seldom used.
The **technique(bhhh)** option is not allowed.

__Specification and options for nlogitgen__

*newaltvar* and *altvar* are variables identifying alternatives at each level
of the hierarchy.

*label* defines a label to associate with the branch. If no label is
given, a numeric value is used.

*alternative* specifies an alternative, of *altvar* specified in the syntax,
to be included in the branch. It is either a numeric value or the
label associated with that value. An example of **nlogitgen** is

**. nlogitgen type = restaurant(fast: 1 | 2,**
**family: CafeEccell | LosNortenos | WingsNmore, fancy: 6 | 7)**

**nolog** suppresses the display of the iteration log.
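
After running **nlogitgen**, a cross-tabulation is one simple way to confirm that each alternative landed in the intended branch. The sketch below assumes the restaurant example data and refers to alternatives by their value labels.

```stata
webuse restaurant, clear

* Build the first-level variable from the value labels of restaurant
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)

* Each restaurant should appear under exactly one type
tabulate restaurant type
```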

__Specification and options for nlogittree__

+------+
----+ Main +-------------------------------------------------------------

*altvarlist* is a list of alternative variables that define the tree
hierarchy. The first variable must define bottom-level alternatives,
and the order continues to the variable defining the top-level
alternatives.

**choice(***depvar***)** defines the choice indicator variable and forces
**nlogittree** to compute and display choice frequencies for each
bottom-level alternative.

**case(***varname***)** specifies the variable that identifies each case. When
both **case()** and **choice()** are specified, **nlogittree** executes
diagnostics on the tree structure and identifies observations that
will cause **nlogit** to terminate execution or drop observations.

**generate(***newvar***)** generates a new indicator variable, *newvar*, that is
equal to 1 for invalid observations. This option requires that both
**choice()** and **case()** are also specified.

**nolabel** forces **nlogittree** to suppress value labels in tree-structure
output.

**nobranches** forces **nlogittree** to suppress drawing branches in the
tree-structure output.
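
The diagnostic options of **nlogittree** can be combined as in the following sketch, which assumes the restaurant example data; **problem** is an illustrative name for the new indicator variable.

```stata
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)

* choice() and case() together trigger the diagnostics;
* generate() records invalid observations in the new variable problem
nlogittree restaurant type, choice(chosen) case(family_id) generate(problem)

* Inspect any observations that were flagged as invalid
list family_id restaurant chosen if problem == 1
```

If nothing is listed, no observations were flagged for these data and this tree.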

__Remark on degenerate branches__

Degenerate nests occur when there is only one alternative in a branch of
the tree hierarchy. The associated dissimilarity parameter of the RUM
model is not defined. The inclusive-valued parameter of the
nonnormalized model will be identifiable if there are
alternative-specific variables specified in equation 1 of the model
specification (the *indepvars* in the model syntax). Numerically, you can
skirt the issue of nonidentifiable or undefined parameters by setting
constraints on them. For the RUM model, constrain the dissimilarity
parameter to 1. See the description of **constraints()** under Options for
nlogit for details on setting constraints on the dissimilarity
parameters.
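
As an illustration only, the sketch below builds a hypothetical tree in which one branch is degenerate and then constrains its dissimilarity parameter to 1. The tree, the branch label **solo**, and the variable name **type2** are assumptions made for this example; the restaurant data are not normally analyzed this way, and the parameter name should be confirmed against your own output.

```stata
webuse restaurant, clear

* Hypothetical tree: MadCows is the only alternative in the branch solo
nlogitgen type2 = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore | Christophers, solo: MadCows)

* Fix the undefined dissimilarity parameter of the degenerate branch at 1
constraint 1 [solo_tau]_cons = 1

nlogit chosen cost distance rating || type2: income kids, base(family) ///
    || restaurant:, noconstant case(family_id) constraints(1)
```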

__Examples__

Setup
**. webuse restaurant**

Generate a new categorical variable named **type** that identifies the
first-level set of alternatives based on the variable named **restaurant**
**. nlogitgen type = restaurant(fast: Freebirds | MamasPizza, family:**
**CafeEccell | LosNortenos | WingsNmore, fancy: Christophers |**
**MadCows)**

Examine the tree structure
**. nlogittree restaurant type, choice(chosen) case(family_id)**

Perform nested logit regression
**. nlogit chosen cost distance rating || type: income kids,**
**base(family) || restaurant:, noconst case(family_id)**

__Stored results__

**nlogit** stores the following in **e()**:

Scalars
**e(N)** number of observations
**e(N_case)** number of cases
**e(k_eq)** number of equations in **e(b)**
**e(k_eq_model)** number of equations in overall model test
**e(k_alt)** number of alternatives for bottom level
**e(k_alt***j***)** number of alternatives for *j*th level
**e(k_indvars)** number of independent variables
**e(k_ind2vars)** number of by-alternative variables for bottom level
**e(k_ind2vars***j***)** number of by-alternative variables for *j*th level
**e(df_m)** model degrees of freedom
**e(df_c)** **clogit** model degrees of freedom
**e(ll)** log likelihood
**e(ll_c)** **clogit** model log likelihood
**e(N_clust)** number of clusters
**e(chi2)** chi-squared
**e(chi2_c)** likelihood-ratio test for IIA
**e(p)** p-value for model Wald test
**e(p_c)** p-value for IIA test
**e(i_base)** base index for bottom level
**e(i_base***j***)** base index for *j*th level
**e(levels)** number of levels
**e(alt_min)** minimum number of alternatives
**e(alt_avg)** average number of alternatives
**e(alt_max)** maximum number of alternatives
**e(const)** constant indicator for bottom level
**e(const***j***)** constant indicator for *j*th level
**e(rum)** **1** if RUM model, **0** otherwise
**e(rank)** rank of **e(V)**
**e(ic)** number of iterations
**e(rc)** return code
**e(converged)** **1** if converged, **0** otherwise

Macros
**e(cmd)** **nlogit**
**e(cmdline)** command as typed
**e(depvar)** name of dependent variable
**e(indvars)** names of independent variables
**e(ind2vars)** by-alternative variables for bottom level
**e(ind2vars***j***)** by-alternative variables for *j*th level
**e(case)** variable defining cases
**e(altvar)** alternative variable for bottom level
**e(altvar***j***)** alternative variable for *j*th level
**e(alteqs)** equation names for bottom level
**e(alteqs***j***)** equation names for *j*th level
**e(alt***i***)** *i*th alternative for bottom level
**e(alt***j***_***i***)** *i*th alternative for *j*th level
**e(wtype)** weight type
**e(wexp)** weight expression
**e(title)** title in estimation output
**e(clustvar)** name of cluster variable
**e(chi2type)** **Wald**, type of model chi-squared test
**e(vce)** *vcetype* specified in **vce()**
**e(vcetype)** title used to label Std. Err.
**e(opt)** type of optimization
**e(which)** **max** or **min**; whether optimizer is to perform maximization or minimization
**e(ml_method)** type of **ml** method
**e(user)** name of likelihood-evaluator program
**e(technique)** maximization technique
**e(datasignature)** the checksum
**e(datasignaturevars)** variables used in calculation of checksum
**e(properties)** **b V**
**e(estat_cmd)** program used to implement **estat**
**e(predict)** program used to implement **predict**
**e(marginsnotok)** predictions disallowed by **margins**
**e(asbalanced)** factor variables **fvset** as **asbalanced**
**e(asobserved)** factor variables **fvset** as **asobserved**

Matrices
**e(b)** coefficient vector
**e(Cns)** constraints matrix
**e(k_altern)** number of alternatives at each level
**e(k_branch***j***)** number of branches at each alternative of *j*th level
**e(stats)** alternative statistics for bottom level
**e(stats***j***)** alternative statistics for *j*th level
**e(altidx***j***)** alternative indices for *j*th level
**e(alt_ind2vars)** indicators for bottom level estimated by-alternative variable -- **e(k_alt)** x **e(k_ind2vars)**
**e(alt_ind2vars***j***)** indicators for *j*th level estimated by-alternative variable -- **e(k_alt***j***)** x **e(k_ind2vars***j***)**
**e(ilog)** iteration log (up to 20 iterations)
**e(gradient)** gradient vector
**e(V)** variance-covariance matrix of the estimators
**e(V_modelbased)** model-based variance

Functions
**e(sample)** marks estimation sample
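
The stored results can be inspected after estimation as in the sketch below, which assumes the restaurant example data and the model from the Examples section. Only results listed above are referenced.

```stata
webuse restaurant, clear
nlogitgen type = restaurant(fast: Freebirds | MamasPizza, ///
    family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)
nlogit chosen cost distance rating || type: income kids, base(family) ///
    || restaurant:, noconstant case(family_id)

* List everything nlogit left behind in e()
ereturn list

* Pull out individual scalars
display "cases:           " e(N_case)
display "log likelihood:  " e(ll)
display "LR test for IIA: " e(chi2_c) "  (p = " e(p_c) ")"

* Restrict a later command to the estimation sample
count if e(sample)
```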