**[ME] meglm** -- Multilevel mixed-effects generalized linear model

__Syntax__

**meglm** *depvar* *fe_equation* [**||** *re_equation*] [**||** *re_equation* ...] [**,**
*options*]

where the syntax of *fe_equation* is

[*indepvars*] [*if*] [*in*] [*weight*] [**,** *fe_options*]

and the syntax of *re_equation* is one of the following:

for random coefficients and intercepts

*levelvar***:** [*varlist*] [**,** *re_options*]

for random effects among the values of a factor variable

*levelvar***:** **R.***varname*

*levelvar* is a variable identifying the group structure for the random
effects at that level or is **_all** representing one group comprising all
observations.

*fe_options* Description
-------------------------------------------------------------------------
Model
__nocons__**tant** suppress constant term from the fixed-effects
equation
__exp__**osure(***varname_e***)** include ln(*varname_e*) in model with
coefficient constrained to 1
__off__**set(***varname_o***)** include *varname_o* in model with coefficient
constrained to 1
**asis** retain perfect predictor variables
-------------------------------------------------------------------------

*re_options* Description
-------------------------------------------------------------------------
Model
__cov__**ariance(***vartype***)** variance-covariance structure of the random
effects
__nocons__**tant** suppress constant term from the random-effects
equation
__fw__**eight(***varname***)** frequency weights at higher levels
__iw__**eight(***varname***)** importance weights at higher levels
__pw__**eight(***varname***)** sampling weights at higher levels
-------------------------------------------------------------------------

*options* Description
-------------------------------------------------------------------------
Model
__f__**amily(***family***)** distribution of *depvar*; default is
**family(gaussian)**
__l__**ink(***link***)** link function; default varies per family
__const__**raints(***constraints***)** apply specified linear constraints
__col__**linear** keep collinear variables

SE/Robust
**vce(***vcetype***)** *vcetype* may be **oim**, __r__**obust**, or __cl__**uster**
*clustvar*

Reporting
__le__**vel(***#***)** set confidence level; default is **level(95)**
**eform** report exponentiated fixed-effects
coefficients
**irr** report fixed-effects coefficients as
incidence-rate ratios
**or** report fixed-effects coefficients as odds
ratios
__nocnsr__**eport** do not display constraints
__notab__**le** suppress coefficient table
__nohead__**er** suppress output header
__nogr__**oup** suppress table summarizing groups
*display_options* control columns and column formats, row
spacing, line width, display of omitted
variables and base and empty cells, and
factor-variable labeling

Integration
__intm__**ethod(***intmethod***)** integration method
__intp__**oints(***#***)** set the number of integration (quadrature)
points for all levels; default is
**intpoints(7)**

Maximization
*maximize_options* control the maximization process; seldom used

__startv__**alues(***svmethod***)** method for obtaining starting values
__startg__**rid**[**(***gridspec***)**] perform a grid search to improve starting
values
__noest__**imate** do not fit the model; show starting values
instead
**dnumerical** use numerical derivative techniques
__coefl__**egend** display legend instead of statistics
-------------------------------------------------------------------------

*vartype* Description
-------------------------------------------------------------------------
__ind__**ependent** one unique variance parameter per random
effect, all covariances 0; the default
unless the **R.** notation is used
__exc__**hangeable** equal variances for random effects, and one
common pairwise covariance
__id__**entity** equal variances for random effects, all
covariances 0; the default if the **R.**
notation is used
__un__**structured** all variances and covariances to be distinctly
estimated
__fix__**ed(***matname***)** user-selected variances and covariances
constrained to specified values; the
remaining variances and covariances
unrestricted
__pat__**tern(***matname***)** user-selected variances and covariances
constrained to be equal; the remaining
variances and covariances unrestricted
-------------------------------------------------------------------------

*family* Description
-------------------------------------------------------------------------
__gau__**ssian** Gaussian (normal); the default
__be__**rnoulli** Bernoulli
__bi__**nomial** [*#*|*varname*] binomial; default number of binomial trials is
1
__gam__**ma** gamma
__nb__**inomial** [**mean**|__cons__**tant**] negative binomial; default dispersion is **mean**
__o__**rdinal** ordinal
__p__**oisson** Poisson
-------------------------------------------------------------------------

*link* Description
-------------------------------------------------------------------------
__iden__**tity** identity
**log** log
**logit** logit
__prob__**it** probit
__clog__**log** complementary log-log
-------------------------------------------------------------------------

*intmethod* Description
-------------------------------------------------------------------------
__mv__**aghermite** mean-variance adaptive Gauss-Hermite
quadrature; the default unless a crossed
random-effects model is fit
__mc__**aghermite** mode-curvature adaptive Gauss-Hermite
quadrature
__gh__**ermite** nonadaptive Gauss-Hermite quadrature
__lap__**lace** Laplacian approximation; the default for
crossed random-effects models
-------------------------------------------------------------------------

*indepvars* may contain factor variables; see fvvarlist.
*depvar*, *indepvars*, and *varlist* may contain time-series operators; see
tsvarlist.
**bayes**, **by**, and **svy** are allowed; see prefix. For more details, see
**[BAYES] bayes: meglm**.
**vce()** and weights are not allowed with the **svy** prefix.
**fweight**s, **iweight**s, and **pweight**s are allowed; see weight. Only one type
of weight may be specified. Weights are not supported under the
Laplacian approximation or for crossed models.
**startvalues()**, **startgrid**, **noestimate**, **dnumerical**, and **coeflegend** do not
appear in the dialog box.
See **[ME] meglm postestimation** for features available after estimation.

__Menu__

**Statistics > Multilevel mixed-effects models >** **Generalized linear model**
**(GLM)**

__Description__

**meglm** fits multilevel mixed-effects generalized linear models. **meglm**
allows a variety of distributions for the response conditional on
normally distributed random effects.

__Options__

+-------+
----+ Model +------------------------------------------------------------

**noconstant** suppresses the constant (intercept) term and may be specified
for the fixed-effects equation and for any of or all the
random-effects equations.

**exposure(***varname_e***)** specifies a variable that reflects the amount of
exposure over which the *depvar* events were observed for each
observation; ln(*varname_e*) is included in the fixed-effects portion
of the model with coefficient constrained to be 1.

**offset(***varname_o***)** specifies that *varname_o* be included in the
fixed-effects portion of the model with the coefficient constrained
to be 1.

**asis** forces retention of perfect predictor variables and their
associated, perfectly predicted observations and may produce
instabilities in maximization; see **[R] probit**.

**covariance(***vartype***)** specifies the structure of the covariance matrix for
the random effects and may be specified for each random-effects
equation. *vartype* is one of the following: **independent**,
**exchangeable**, **identity**, **unstructured**, **fixed(***matname***)**, or
**pattern(***matname***)**.

**covariance(independent)** covariance structure allows for a distinct
variance for each random effect within a random-effects equation
and assumes that all covariances are 0. The default is
**covariance(independent)** unless a crossed random-effects model is
fit, in which case the default is **covariance(identity)**.

**covariance(exchangeable)** structure specifies one common variance for
all random effects and one common pairwise covariance.

**covariance(identity)** is short for "multiple of the identity"; that
is, all variances are equal and all covariances are 0.

**covariance(unstructured)** allows for all variances and covariances to
be distinct. If an equation consists of p random-effects terms,
the unstructured covariance matrix will have p(p+1)/2 unique
parameters.

**covariance(fixed(***matname***))** and **covariance(pattern(***matname***))**
covariance structures provide a convenient way to impose
constraints on variances and covariances of random effects. Each
specification requires a *matname* that defines the restrictions
placed on variances and covariances. Only elements in the lower
triangle of *matname* are used, and row and column names of *matname*
are ignored. A missing value in *matname* means that a given
element is unrestricted. In a **fixed(***matname***)** covariance
structure, (co)variance (i,j) is constrained to equal the value
specified in the i,jth entry of *matname*. In a **pattern(***matname***)**
covariance structure, (co)variances (i,j) and (k,l) are
constrained to be equal if *matname*[i,j] = *matname*[k,l].

**fweight(***varname***)** specifies frequency weights at higher levels in a
multilevel model, whereas frequency weights at the first level (the
observation level) are specified in the usual manner, for example,
**[fw=***fwtvar1***]**. *varname* can be any valid Stata variable name, and you
can specify **fweight()** at levels two and higher of a multilevel model.
For example, in the two-level model

**. me**... *fixed_portion* **[fw = wt1]** **|| school:** ... **, fweight(wt2)**
...

the variable **wt1** would hold the first-level (the observation-level)
frequency weights, and **wt2** would hold the second-level (the
school-level) frequency weights.

**iweight(***varname***)** specifies importance weights at higher levels in a
multilevel model, whereas importance weights at the first level (the
observation level) are specified in the usual manner, for example,
**[iw=***iwtvar1***]**. *varname* can be any valid Stata variable name, and you
can specify **iweight()** at levels two and higher of a multilevel model.
For example, in the two-level model

**. me**... *fixed_portion* **[iw = wt1]** **|| school:** ... **, iweight(wt2)**
...

the variable **wt1** would hold the first-level (the observation-level)
importance weights, and **wt2** would hold the second-level (the
school-level) importance weights.

**pweight(***varname***)** specifies sampling weights at higher levels in a
multilevel model, whereas sampling weights at the first level (the
observation level) are specified in the usual manner, for example,
**[pw=***pwtvar1***]**. *varname* can be any valid Stata variable name, and you
can specify **pweight()** at levels two and higher of a multilevel model.
For example, in the two-level model

**. me**... *fixed_portion* **[pw = wt1]** **|| school:** ... **, pweight(wt2)**
...

variable **wt1** would hold the first-level (the observation-level)
sampling weights, and **wt2** would hold the second-level (the
school-level) sampling weights.

**family(***family***)** specifies the distribution of *depvar*; **family(gaussian)** is
the default.

**link(***link***)** specifies the link function; the default is the canonical link
for the **family()** specified except for the gamma and negative binomial
families.

If you specify both **family()** and **link()**, not all combinations make
sense. You may choose from the following combinations:

identity log logit probit cloglog
------------------------------------------------------------
Gaussian D x
Bernoulli D x x
binomial D x x
gamma D
negative binomial D
ordinal D x x
Poisson D
------------------------------------------------------------
D denotes the default.

**constraints(***constraints***)**, **collinear**; see **[R] estimation options**.

+-----------+
----+ SE/Robust +--------------------------------------------------------

**vce(***vcetype***)** specifies the type of standard error reported, which
includes types that are derived from asymptotic theory (**oim**), that
are robust to some kinds of misspecification (**robust**), and that allow
for intragroup correlation (**cluster** *clustvar*); see **[R] ***vce_option*.
If **vce(robust)** is specified, robust variances are clustered at the
highest level in the multilevel model.

+-----------+
----+ Reporting +--------------------------------------------------------

**level(***#***)**; see **[R] estimation options**.

**eform** reports exponentiated fixed-effects coefficients and corresponding
standard errors and confidence intervals. This option may be
specified either at estimation or upon replay.

**irr** reports estimated fixed-effects coefficients transformed to
incidence-rate ratios, that is, exp(b) rather than b. Standard
errors and confidence intervals are similarly transformed. This
option affects how results are displayed, not how they are estimated
or stored. **irr** may be specified either at estimation or upon replay.
This option is allowed for count models only.

**or** reports estimated fixed-effects coefficients transformed to odds
ratios, that is, exp(b) rather than b. Standard errors and
confidence intervals are similarly transformed. This option affects
how results are displayed, not how they are estimated. **or** may be
specified at estimation or upon replay. This option is allowed for
logistic models only.

**nocnsreport**; see **[R] estimation options**.

**notable** suppresses the estimation table, either at estimation or upon
replay.

**noheader** suppresses the output header, either at estimation or upon
replay.

**nogroup** suppresses the display of group summary information (number of
groups, average group size, minimum, and maximum) from the output
header.

*display_options*: **noci**, __nopv__**alues**, __noomit__**ted**, **vsquish**, __noempty__**cells**,
__base__**levels**, __allbase__**levels**, __nofvlab__**el**, **fvwrap(***#***)**, **fvwrapon(***style***)**,
**cformat(***%fmt***)**, **pformat(%***fmt***)**, **sformat(%***fmt***)**, and **nolstretch**; see **[R]**
**estimation options**.

+-------------+
----+ Integration +------------------------------------------------------

**intmethod(***intmethod***)** specifies the integration method to be used for the
random-effects model. **mvaghermite** performs mean-variance adaptive
Gauss-Hermite quadrature; **mcaghermite** performs mode-curvature
adaptive Gauss-Hermite quadrature; **ghermite** performs nonadaptive
Gauss-Hermite quadrature; and **laplace** performs the Laplacian
approximation, equivalent to mode-curvature adaptive Gaussian
quadrature with one integration point.

The default integration method is **mvaghermite** unless a crossed
random-effects model is fit, in which case the default integration
method is **laplace**. The Laplacian approximation has been known to
produce biased parameter estimates; however, the bias tends to be
more prominent in the estimates of the variance components rather
than in the estimates of the fixed effects.

For crossed random-effects models, estimation with more than one
quadrature point may be prohibitively intensive even for a small
number of levels. For this reason, the integration method defaults
to the Laplacian approximation. You may override this behavior by
specifying a different integration method.

**intpoints(***#***)** sets the number of integration points for quadrature. The
default is **intpoints(7)**, which means that seven quadrature points are
used for each level of random effects. This option is not allowed
with **intmethod(laplace)**.

The more integration points, the more accurate the approximation to
the log likelihood. However, computation time increases as a
function of the number of quadrature points raised to a power
equaling the dimension of the random-effects specification. In
crossed random-effects models and in models with many levels or many
random coefficients, this increase can be substantial.

+--------------+
----+ Maximization +-----------------------------------------------------

*maximize_options*: __dif__**ficult**, __tech__**nique(***algorithm_spec***)**, __iter__**ate(***#***)**,
[__no__]__lo__**g**, __tr__**ace**, __grad__**ient**, **showstep**, __hess__**ian**, __showtol__**erance**,
__tol__**erance(***#***)**, __ltol__**erance(***#***)**, __nrtol__**erance(***#***)**, __nonrtol__**erance**, and
**from(***init_specs***)**; see **[R] maximize**. Those that require special
mention for **meglm** are listed below.

**from()** accepts a properly labeled vector of initial values or a list
of coefficient names with values. A list of values is not allowed.

The following options are available with **meglm** but are not shown in the
dialog box:

**startvalues(***svmethod***)** specifies how starting values are to be computed.
Starting values specified in **from()** override the computed starting
values.

**startvalues(zero)** specifies that starting values be set to 0.

**startvalues(constantonly)** builds on **startvalues(zero)** by fitting a
constant-only model to obtain estimates of the intercept and
auxiliary parameters, and it substitutes 1 for the variances of
random effects.

**startvalues(fixedonly**[**,** __iter__**ate(***#***)**]**)** builds on
**startvalues(constantonly)** by fitting a full fixed-effects model to
obtain estimates of coefficients along with intercept and auxiliary
parameters, and it continues to use 1 for the variances of random
effects. This is the default behavior. **iterate(***#***)** limits the number
of iterations for fitting the fixed-effects model.

**startvalues(iv**[**,** __iter__**ate(***#***)**]**)** builds on **startvalues(fixedonly)** by
using instrumental-variable methods with generalized residuals to
obtain variances of random effects. **iterate(***#***)** limits the number of
iterations for fitting the instrumental-variable model.

**startvalues(**__iter__**ate(***#***))** limits the number of iterations for fitting
the default model (fixed effects).

**startgrid**[**(***gridspec***)**] performs a grid search on variance components of
random effects to improve starting values. No grid search is
performed by default unless the starting values are found to be not
feasible, in which case **meglm** runs **startgrid()** to perform a "minimal"
search involving q^3 likelihood evaluations, where q is the number of
random effects. Sometimes this resolves the problem. Usually,
however, there is no problem and **startgrid()** is not run by default.
There can be benefits from running **startgrid()** to get better starting
values even when starting values are feasible.

**startgrid()** is a brute-force approach that tries various values for
variances and covariances and chooses the ones that work best. You
may already be using a default form of **startgrid()** without knowing
it. If you see **meglm** displaying Grid node 1, Grid node 2, ...
following Grid node 0 in the iteration log, that is **meglm** doing a
default search because the original starting values were not
feasible. The default form tries 0.1, 1, and 10 for all variances of
all random effects.

**startgrid(***numlist***)** specifies values to try for variances of random
effects.

**startgrid(***covspec***)** specifies the particular variances and covariances
in which grid searches are to be performed. *covspec* is *name***[***level***]**
for variances and *name1***[***level***]****name2***[***level***]** for covariances. For
example, the variance of the random intercept at level **id** is
specified as **_cons[id]**, and the variance of the random slope on
variable **week** at the same level is specified as **week[id]**. The
residual variance for the linear mixed-effects model is specified as
**e.***depvar*, where *depvar* is the name of the dependent variable. The
covariance between the random slope and the random intercept above is
specified as **_cons[id]*week[id]**.

**startgrid(***numlist covspec***)** combines the two syntaxes. You may also
specify **startgrid()** multiple times so that you can search the
different ranges for different variances and covariances.

**noestimate** specifies that the model is not to be fit. Instead, starting
values are to be shown (as modified by the above options if
modifications were made), and they are to be shown using the
**coeflegend** style of output.

**dnumerical** specifies that during optimization, the gradient vector and
Hessian matrix be computed using numerical techniques instead of
analytical formulas. By default, analytical formulas for computing
the gradient and Hessian are used for all integration methods except
**intmethod(laplace)**.

**coeflegend**; see **[R] estimation options**.

__Remarks__

Remarks are presented under the following headings:

Remarks on distributional families
Remarks on using sampling weights

__Remarks on distributional families__

Some combinations of families and links are so common that we implemented
them as separate commands in terms of **meglm**.

Command **meglm** equivalent
------------------------------------------------------------
**melogit** **family(bernoulli) link(logit)**
**meprobit** **family(bernoulli) link(probit)**
**mecloglog** **family(bernoulli) link(cloglog)**
**meologit** **family(ordinal) link(logit)**
**meoprobit** **family(ordinal) link(probit)**
**mepoisson** **family(poisson) link(log)**
**menbreg** **family(nbinomial) link(log)**
------------------------------------------------------------

When no family-link combination is specified, **meglm** defaults to a
Gaussian family with an identity link. Thus **meglm** can be used to fit
linear mixed-effects models; however, for those models we recommend using
the more specialized **mixed**, which, in addition to **meglm** capabilities,
allows for modeling of the structure of the residual errors; see **[ME]**
**mixed** for details.

__Remarks on using sampling weights__

Sampling weights are treated differently in multilevel models than they
are in standard models such as OLS regression. In a multilevel model,
observation-level weights are not indicative of overall inclusion.
Instead, they indicate inclusion conditional on the corresponding cluster
being included at the next highest-level of sampling.

For example, if you include only observation-level weights in a two-level
model, sampling with equal probabilities at level two is assumed, and
this may or may not be what you intended. If the sampling at level two
is weighted, then including only level-one weights can lead to biased
results even if weighting at level two has been incorporated into the
level-one weight variable. For example, it is a common practice to
multiply conditional weights from multiple levels into one overall
weight. By contrast, weighted multilevel analysis requires the component
weights from each level of sampling.

Even if you specify sampling weights at all model levels, the scale of
sampling weights at lower levels can affect your estimated parameters in
a multilevel model. That is, not only do the relative sizes of the
weights at lower levels matter, the scale of these weights matters also.

In general, exercise caution when using sampling weights; see *Survey data*
in *Remarks and examples* of **[ME] meglm** for more information.

__Examples__

---------------------------------------------------------------------------
Setup
**. webuse pig**

Mixed-effects GLM with the default Gaussian family and identity link
**. meglm weight week || id:**

---------------------------------------------------------------------------
Setup
**. webuse towerlondon**

Three-level model with bernoulli family and default logit link
**. meglm dtlm difficulty i.group || family: || subject:,**
**family(bernoulli)**

Same as above but with a probit link
**. meglm dtlm difficulty i.group || family: || subject:,**
**family(bernoulli) link(probit)**

---------------------------------------------------------------------------
Setup
**. use http://www.stata-press.com/data/mlmus3/schiz**
**. generate impso = imps**
**. recode impso -9=. 1/2.4=1 2.5/4.4=2 4.5/5.4=3 5.5/7=4**

Two-level GLM with ordinal family and default logit link
**. meglm impso week treatment || id:, family(ordinal)**

Same as above but with a probit link
**. meglm impso week treatment || id:, family(ordinal) link(probit)**

Same as above but with a cloglog link
**. meglm impso week treatment || id:, family(ordinal) link(cloglog)**

---------------------------------------------------------------------------
Setup
**. webuse melanoma, clear**

Three-level random-intercept Poisson model
**. meglm deaths uv, exposure(expected) || nation: || region:,**
**family(poisson)**

Same as above but a negative binomial model
**. meglm deaths uv, exposure(expected) || nation: || region:,**
**family(nbinomial)**

---------------------------------------------------------------------------
Setup
**. use http://www.stata-press.com/data/mlmus3/drvisits**
**. keep if numvisit > 0**

Two-level random-intercept and random-coefficient gamma model
**. meglm numvisit reform age married loginc || id: reform,**
**family(gamma)**

---------------------------------------------------------------------------

__Video example__

Tour of multilevel GLMs

__Stored results__

**meglm** stores the following in **e()**:

Scalars
**e(N)** number of observations
**e(k)** number of parameters
**e(k_dv)** number of dependent variables
**e(k_eq)** number of equations in **e(b)**
**e(k_eq_model)** number of equations in overall model test
**e(k_cat)** number of categories (with ordinal outcomes)
**e(k_f)** number of fixed-effects parameters
**e(k_r)** number of random-effects parameters
**e(k_rs)** number of variances
**e(k_rc)** number of covariances
**e(df_m)** model degrees of freedom
**e(ll)** log likelihood
**e(chi2)** chi-squared
**e(p)** p-value for model test
**e(ll_c)** log likelihood, comparison model
**e(chi2_c)** chi-squared, comparison test
**e(df_c)** degrees of freedom, comparison test
**e(p_c)** p-value for comparison test
**e(N_clust)** number of clusters
**e(rank)** rank of **e(V)**
**e(ic)** number of iterations
**e(rc)** return code
**e(converged)** **1** if converged, **0** otherwise

Macros
**e(cmd)** **gsem**
**e(cmd2)** **meglm**
**e(cmdline)** command as typed
**e(depvar)** name of dependent variable
**e(wtype)** weight type
**e(wexp)** weight expression (first-level weights)
**e(fweight***k***)** **fweight** variable for *k*th highest level, if
specified
**e(iweight***k***)** **iweight** variable for *k*th highest level, if
specified
**e(pweight***k***)** **pweight** variable for *k*th highest level, if
specified
**e(covariates)** list of covariates
**e(ivars)** grouping variables
**e(model)** name of marginal model
**e(title)** title in estimation output
**e(link)** link
**e(family)** family
**e(clustvar)** name of cluster variable
**e(offset)** offset
**e(binomial)** binomial number of trials (with binomial
models)
**e(dispersion)** **mean** or **constant** (with negative binomial
models)
**e(intmethod)** integration method
**e(n_quad)** number of integration points
**e(chi2type)** **Wald**; type of model chi-squared
**e(vce)** *vcetype* specified in **vce()**
**e(vcetype)** title used to label Std. Err.
**e(opt)** type of optimization
**e(which)** **max** or **min**; whether optimizer is to perform
maximization or minimization
**e(ml_method)** type of **ml** method
**e(user)** name of likelihood-evaluator program
**e(technique)** maximization technique
**e(datasignature)** the checksum
**e(datasignaturevars)** variables used in calculation of checksum
**e(properties)** **b V**
**e(estat_cmd)** program used to implement **estat**
**e(predict)** program used to implement **predict**
**e(marginsnotok)** predictions disallowed by **margins**
**e(marginswtype)** weight type for **margins**
**e(marginswexp)** weight expression for **margins**
**e(asbalanced)** factor variables **fvset** as **asbalanced**
**e(asobserved)** factor variables **fvset** as **asobserved**

Matrices
**e(b)** coefficient vector
**e(Cns)** constraints matrix
**e(cat)** category values (with ordinal outcomes)
**e(ilog)** iteration log (up to 20 iterations)
**e(gradient)** gradient vector
**e(N_g)** group counts
**e(g_min)** group-size minimums
**e(g_avg)** group-size averages
**e(g_max)** group-size maximums
**e(V)** variance-covariance matrix of the estimators
**e(V_modelbased)** model-based variance

Functions
**e(sample)** marks estimation sample