Stata 15 help for meglm

[ME] meglm -- Multilevel mixed-effects generalized linear model

Syntax

meglm depvar fe_equation [|| re_equation] [|| re_equation ...] [, options]

where the syntax of fe_equation is

[indepvars] [if] [in] [weight] [, fe_options]

and the syntax of re_equation is one of the following:

for random coefficients and intercepts

levelvar: [varlist] [, re_options]

for random effects among the values of a factor variable

levelvar: R.varname

levelvar is a variable identifying the group structure for the random effects at that level or is _all representing one group comprising all observations.

fe_options Description ------------------------------------------------------------------------- Model noconstant suppress constant term from the fixed-effects equation exposure(varname_e) include ln(varname_e) in model with coefficient constrained to 1 offset(varname_o) include varname_o in model with coefficient constrained to 1 asis retain perfect predictor variables -------------------------------------------------------------------------

re_options Description ------------------------------------------------------------------------- Model covariance(vartype) variance-covariance structure of the random effects noconstant suppress constant term from the random-effects equation fweight(varname) frequency weights at higher levels iweight(varname) importance weights at higher levels pweight(varname) sampling weights at higher levels -------------------------------------------------------------------------

options Description ------------------------------------------------------------------------- Model family(family) distribution of depvar; default is family(gaussian) link(link) link function; default varies per family constraints(constraints) apply specified linear constraints collinear keep collinear variables

SE/Robust vce(vcetype) vcetype may be oim, robust, or cluster clustvar

Reporting level(#) set confidence level; default is level(95) eform report exponentiated fixed-effects coefficients irr report fixed-effects coefficients as incidence-rate ratios or report fixed-effects coefficients as odds ratios nocnsreport do not display constraints notable suppress coefficient table noheader suppress output header nogroup suppress table summarizing groups display_options control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling

Integration intmethod(intmethod) integration method intpoints(#) set the number of integration (quadrature) points for all levels; default is intpoints(7)

Maximization maximize_options control the maximization process; seldom used

startvalues(svmethod) method for obtaining starting values startgrid[(gridspec)] perform a grid search to improve starting values noestimate do not fit the model; show starting values instead dnumerical use numerical derivative techniques coeflegend display legend instead of statistics -------------------------------------------------------------------------

vartype Description ------------------------------------------------------------------------- independent one unique variance parameter per random effect, all covariances 0; the default unless the R. notation is used exchangeable equal variances for random effects, and one common pairwise covariance identity equal variances for random effects, all covariances 0; the default if the R. notation is used unstructured all variances and covariances to be distinctly estimated fixed(matname) user-selected variances and covariances constrained to specified values; the remaining variances and covariances unrestricted pattern(matname) user-selected variances and covariances constrained to be equal; the remaining variances and covariances unrestricted -------------------------------------------------------------------------

family Description ------------------------------------------------------------------------- gaussian Gaussian (normal); the default bernoulli Bernoulli binomial [#|varname] binomial; default number of binomial trials is 1 gamma gamma nbinomial [mean|constant] negative binomial; default dispersion is mean ordinal ordinal poisson Poisson -------------------------------------------------------------------------

link Description ------------------------------------------------------------------------- identity identity log log logit logit probit probit cloglog complementary log-log -------------------------------------------------------------------------

intmethod Description ------------------------------------------------------------------------- mvaghermite mean-variance adaptive Gauss-Hermite quadrature; the default unless a crossed random-effects model is fit mcaghermite mode-curvature adaptive Gauss-Hermite quadrature ghermite nonadaptive Gauss-Hermite quadrature laplace Laplacian approximation; the default for crossed random-effects models -------------------------------------------------------------------------

indepvars may contain factor variables; see fvvarlist. depvar, indepvars, and varlist may contain time-series operators; see tsvarlist. bayes, by, and svy are allowed; see prefix. For more details, see [BAYES] bayes: meglm. vce() and weights are not allowed with the svy prefix. fweights, iweights, and pweights are allowed; see weight. Only one type of weight may be specified. Weights are not supported under the Laplacian approximation or for crossed models. startvalues(), startgrid, noestimate, dnumerical, and coeflegend do not appear in the dialog box. See [ME] meglm postestimation for features available after estimation.

Menu

Statistics > Multilevel mixed-effects models > Generalized linear model (GLM)

Description

meglm fits multilevel mixed-effects generalized linear models. meglm allows a variety of distributions for the response conditional on normally distributed random effects.

Options

+-------+ ----+ Model +------------------------------------------------------------

noconstant suppresses the constant (intercept) term and may be specified for the fixed-effects equation and for any of or all the random-effects equations.

exposure(varname_e) specifies a variable that reflects the amount of exposure over which the depvar events were observed for each observation; ln(varname_e) is included in the fixed-effects portion of the model with coefficient constrained to be 1.

offset(varname_o) specifies that varname_o be included in the fixed-effects portion of the model with the coefficient constrained to be 1.

asis forces retention of perfect predictor variables and their associated, perfectly predicted observations and may produce instabilities in maximization; see [R] probit.

covariance(vartype) specifies the structure of the covariance matrix for the random effects and may be specified for each random-effects equation. vartype is one of the following: independent, exchangeable, identity, unstructured, fixed(matname), or pattern(matname).

covariance(independent) covariance structure allows for a distinct variance for each random effect within a random-effects equation and assumes that all covariances are 0. The default is covariance(independent) unless a crossed random-effects model is fit, in which case the default is covariance(identity).

covariance(exchangeable) structure specifies one common variance for all random effects and one common pairwise covariance.

covariance(identity) is short for "multiple of the identity"; that is, all variances are equal and all covariances are 0.

covariance(unstructured) allows for all variances and covariances to be distinct. If an equation consists of p random-effects terms, the unstructured covariance matrix will have p(p+1)/2 unique parameters.

covariance(fixed(matname)) and covariance(pattern(matname)) covariance structures provide a convenient way to impose constraints on variances and covariances of random effects. Each specification requires a matname that defines the restrictions placed on variances and covariances. Only elements in the lower triangle of matname are used, and row and column names of matname are ignored. A missing value in matname means that a given element is unrestricted. In a fixed(matname) covariance structure, (co)variance (i,j) is constrained to equal the value specified in the i,jth entry of matname. In a pattern(matname) covariance structure, (co)variances (i,j) and (k,l) are constrained to be equal if matname[i,j] = matname[k,l].

fweight(varname) specifies frequency weights at higher levels in a multilevel model, whereas frequency weights at the first level (the observation level) are specified in the usual manner, for example, [fw=fwtvar1]. varname can be any valid Stata variable name, and you can specify fweight() at levels two and higher of a multilevel model. For example, in the two-level model

. me... fixed_portion [fw = wt1] || school: ... , fweight(wt2) ...

the variable wt1 would hold the first-level (the observation-level) frequency weights, and wt2 would hold the second-level (the school-level) frequency weights.

iweight(varname) specifies importance weights at higher levels in a multilevel model, whereas importance weights at the first level (the observation level) are specified in the usual manner, for example, [iw=iwtvar1]. varname can be any valid Stata variable name, and you can specify iweight() at levels two and higher of a multilevel model. For example, in the two-level model

. me... fixed_portion [iw = wt1] || school: ... , iweight(wt2) ...

the variable wt1 would hold the first-level (the observation-level) importance weights, and wt2 would hold the second-level (the school-level) importance weights.

pweight(varname) specifies sampling weights at higher levels in a multilevel model, whereas sampling weights at the first level (the observation level) are specified in the usual manner, for example, [pw=pwtvar1]. varname can be any valid Stata variable name, and you can specify pweight() at levels two and higher of a multilevel model. For example, in the two-level model

. me... fixed_portion [pw = wt1] || school: ... , pweight(wt2) ...

variable wt1 would hold the first-level (the observation-level) sampling weights, and wt2 would hold the second-level (the school-level) sampling weights.

family(family) specifies the distribution of depvar; family(gaussian) is the default.

link(link) specifies the link function; the default is the canonical link for the family() specified except for the gamma and negative binomial families.

If you specify both family() and link(), not all combinations make sense. You may choose from the following combinations:

identity log logit probit cloglog ------------------------------------------------------------ Gaussian D x Bernoulli D x x binomial D x x gamma D negative binomial D ordinal D x x Poisson D ------------------------------------------------------------ D denotes the default.

constraints(constraints), collinear; see [R] estimation options.

+-----------+ ----+ SE/Robust +--------------------------------------------------------

vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), and that allow for intragroup correlation (cluster clustvar); see [R] vce_option. If vce(robust) is specified, robust variances are clustered at the highest level in the multilevel model.

+-----------+ ----+ Reporting +--------------------------------------------------------

level(#); see [R] estimation options.

eform reports exponentiated fixed-effects coefficients and corresponding standard errors and confidence intervals. This option may be specified either at estimation or upon replay.

irr reports estimated fixed-effects coefficients transformed to incidence-rate ratios, that is, exp(b) rather than b. Standard errors and confidence intervals are similarly transformed. This option affects how results are displayed, not how they are estimated or stored. irr may be specified either at estimation or upon replay. This option is allowed for count models only.

or reports estimated fixed-effects coefficients transformed to odds ratios, that is, exp(b) rather than b. Standard errors and confidence intervals are similarly transformed. This option affects how results are displayed, not how they are estimated. or may be specified at estimation or upon replay. This option is allowed for logistic models only.

nocnsreport; see [R] estimation options.

notable suppresses the estimation table, either at estimation or upon replay.

noheader suppresses the output header, either at estimation or upon replay.

nogroup suppresses the display of group summary information (number of groups, average group size, minimum, and maximum) from the output header.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

+-------------+ ----+ Integration +------------------------------------------------------

intmethod(intmethod) specifies the integration method to be used for the random-effects model. mvaghermite performs mean-variance adaptive Gauss-Hermite quadrature; mcaghermite performs mode-curvature adaptive Gauss-Hermite quadrature; ghermite performs nonadaptive Gauss-Hermite quadrature; and laplace performs the Laplacian approximation, equivalent to mode-curvature adaptive Gaussian quadrature with one integration point.

The default integration method is mvaghermite unless a crossed random-effects model is fit, in which case the default integration method is laplace. The Laplacian approximation has been known to produce biased parameter estimates; however, the bias tends to be more prominent in the estimates of the variance components rather than in the estimates of the fixed effects.

For crossed random-effects models, estimation with more than one quadrature point may be prohibitively intensive even for a small number of levels. For this reason, the integration method defaults to the Laplacian approximation. You may override this behavior by specifying a different integration method.

intpoints(#) sets the number of integration points for quadrature. The default is intpoints(7), which means that seven quadrature points are used for each level of random effects. This option is not allowed with intmethod(laplace).

The more integration points, the more accurate the approximation to the log likelihood. However, computation time increases as a function of the number of quadrature points raised to a power equaling the dimension of the random-effects specification. In crossed random-effects models and in models with many levels or many random coefficients, this increase can be substantial.

+--------------+ ----+ Maximization +-----------------------------------------------------

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. Those that require special mention for meglm are listed below.

from() accepts a properly labeled vector of initial values or a list of coefficient names with values. A list of values is not allowed.

The following options are available with meglm but are not shown in the dialog box:

startvalues(svmethod) specifies how starting values are to be computed. Starting values specified in from() override the computed starting values.

startvalues(zero) specifies that starting values be set to 0.

startvalues(constantonly) builds on startvalues(zero) by fitting a constant-only model to obtain estimates of the intercept and auxiliary parameters, and it substitutes 1 for the variances of random effects.

startvalues(fixedonly[, iterate(#)]) builds on startvalues(constantonly) by fitting a full fixed-effects model to obtain estimates of coefficients along with intercept and auxiliary parameters, and it continues to use 1 for the variances of random effects. This is the default behavior. iterate(#) limits the number of iterations for fitting the fixed-effects model.

startvalues(iv[, iterate(#)]) builds on startvalues(fixedonly) by using instrumental-variable methods with generalized residuals to obtain variances of random effects. iterate(#) limits the number of iterations for fitting the instrumental-variable model.

startvalues(iterate(#)) limits the number of iterations for fitting the default model (fixed effects).

startgrid[(gridspec)] performs a grid search on variance components of random effects to improve starting values. No grid search is performed by default unless the starting values are found to be not feasible, in which case meglm runs startgrid() to perform a "minimal" search involving q^3 likelihood evaluations, where q is the number of random effects. Sometimes this resolves the problem. Usually, however, there is no problem and startgrid() is not run by default. There can be benefits from running startgrid() to get better starting values even when starting values are feasible.

startgrid() is a brute-force approach that tries various values for variances and covariances and chooses the ones that work best. You may already be using a default form of startgrid() without knowing it. If you see meglm displaying Grid node 1, Grid node 2, ... following Grid node 0 in the iteration log, that is meglm doing a default search because the original starting values were not feasible. The default form tries 0.1, 1, and 10 for all variances of all random effects.

startgrid(numlist) specifies values to try for variances of random effects.

startgrid(covspec) specifies the particular variances and covariances in which grid searches are to be performed. covspec is name[level] for variances and name1[level]*name2[level] for covariances. For example, the variance of the random intercept at level id is specified as _cons[id], and the variance of the random slope on variable week at the same level is specified as week[id]. The residual variance for the linear mixed-effects model is specified as e.depvar, where depvar is the name of the dependent variable. The covariance between the random slope and the random intercept above is specified as _cons[id]*week[id].

startgrid(numlist covspec) combines the two syntaxes. You may also specify startgrid() multiple times so that you can search the different ranges for different variances and covariances.

noestimate specifies that the model is not to be fit. Instead, starting values are to be shown (as modified by the above options if modifications were made), and they are to be shown using the coeflegend style of output.

dnumerical specifies that during optimization, the gradient vector and Hessian matrix be computed using numerical techniques instead of analytical formulas. By default, analytical formulas for computing the gradient and Hessian are used for all integration methods except intmethod(laplace).

coeflegend; see [R] estimation options.

Remarks

Remarks are presented under the following headings:

Remarks on distributional families Remarks on using sampling weights

Remarks on distributional families

Some combinations of families and links are so common that we implemented them as separate commands in terms of meglm.

Command meglm equivalent ------------------------------------------------------------ melogit family(bernoulli) link(logit) meprobit family(bernoulli) link(probit) mecloglog family(bernoulli) link(cloglog) meologit family(ordinal) link(logit) meoprobit family(ordinal) link(probit) mepoisson family(poisson) link(log) menbreg family(nbinomial) link(log) ------------------------------------------------------------

When no family-link combination is specified, meglm defaults to a Gaussian family with an identity link. Thus meglm can be used to fit linear mixed-effects models; however, for those models we recommend using the more specialized mixed, which, in addition to meglm capabilities, allows for modeling of the structure of the residual errors; see [ME] mixed for details.

Remarks on using sampling weights

Sampling weights are treated differently in multilevel models than they are in standard models such as OLS regression. In a multilevel model, observation-level weights are not indicative of overall inclusion. Instead, they indicate inclusion conditional on the corresponding cluster being included at the next highest-level of sampling.

For example, if you include only observation-level weights in a two-level model, sampling with equal probabilities at level two is assumed, and this may or may not be what you intended. If the sampling at level two is weighted, then including only level-one weights can lead to biased results even if weighting at level two has been incorporated into the level-one weight variable. For example, it is a common practice to multiply conditional weights from multiple levels into one overall weight. By contrast, weighted multilevel analysis requires the component weights from each level of sampling.

Even if you specify sampling weights at all model levels, the scale of sampling weights at lower levels can affect your estimated parameters in a multilevel model. That is, not only do the relative sizes of the weights at lower levels matter, the scale of these weights matters also.

In general, exercise caution when using sampling weights; see Survey data in Remarks and examples of [ME] meglm for more information.

Examples

--------------------------------------------------------------------------- Setup . webuse pig

Mixed-effects GLM with the default Gaussian family and identity link . meglm weight week || id:

--------------------------------------------------------------------------- Setup . webuse towerlondon

Three-level model with bernoulli family and default logit link . meglm dtlm difficulty i.group || family: || subject:, family(bernoulli)

Same as above but with a probit link . meglm dtlm difficulty i.group || family: || subject:, family(bernoulli) link(probit)

--------------------------------------------------------------------------- Setup . use http://www.stata-press.com/data/mlmus3/schiz . generate impso = imps . recode impso -9=. 1/2.4=1 2.5/4.4=2 4.5/5.4=3 5.5/7=4

Two-level GLM with ordinal family and default logit link . meglm impso week treatment || id:, family(ordinal)

Same as above but with a probit link . meglm impso week treatment || id:, family(ordinal) link(probit)

Same as above but with a cloglog link . meglm impso week treatment || id:, family(ordinal) link(cloglog)

--------------------------------------------------------------------------- Setup . webuse melanoma, clear

Three-level random-intercept Poisson model . meglm deaths uv, exposure(expected) || nation: || region:, family(poisson)

Same as above but a negative binomial model . meglm deaths uv, exposure(expected) || nation: || region:, family(nbinomial)

--------------------------------------------------------------------------- Setup . use http://www.stata-press.com/data/mlmus3/drvisits . keep if numvisit > 0

Two-level random-intercept and random-coefficient gamma model . meglm numvisit reform age married loginc || id: reform, family(gamma)

---------------------------------------------------------------------------

Video example

Tour of multilevel GLMs

Stored results

meglm stores the following in e():

Scalars e(N) number of observations e(k) number of parameters e(k_dv) number of dependent variables e(k_eq) number of equations in e(b) e(k_eq_model) number of equations in overall model test e(k_cat) number of categories (with ordinal outcomes) e(k_f) number of fixed-effects parameters e(k_r) number of random-effects parameters e(k_rs) number of variances e(k_rc) number of covariances e(df_m) model degrees of freedom e(ll) log likelihood e(chi2) chi-squared e(p) p-value for model test e(ll_c) log likelihood, comparison model e(chi2_c) chi-squared, comparison test e(df_c) degrees of freedom, comparison test e(p_c) p-value for comparison test e(N_clust) number of clusters e(rank) rank of e(V) e(ic) number of iterations e(rc) return code e(converged) 1 if converged, 0 otherwise

Macros e(cmd) gsem e(cmd2) meglm e(cmdline) command as typed e(depvar) name of dependent variable e(wtype) weight type e(wexp) weight expression (first-level weights) e(fweightk) fweight variable for kth highest level, if specified e(iweightk) iweight variable for kth highest level, if specified e(pweightk) pweight variable for kth highest level, if specified e(covariates) list of covariates e(ivars) grouping variables e(model) name of marginal model e(title) title in estimation output e(link) link e(family) family e(clustvar) name of cluster variable e(offset) offset e(binomial) binomial number of trials (with binomial models) e(dispersion) mean or constant (with negative binomial models) e(intmethod) integration method e(n_quad) number of integration points e(chi2type) Wald; type of model chi-squared e(vce) vcetype specified in vce() e(vcetype) title used to label Std. Err. e(opt) type of optimization e(which) max or min; whether optimizer is to perform maximization or minimization e(ml_method) type of ml method e(user) name of likelihood-evaluator program e(technique) maximization technique e(datasignature) the checksum e(datasignaturevars) variables used in calculation of checksum e(properties) b V e(estat_cmd) program used to implement estat e(predict) program used to implement predict e(marginsnotok) predictions disallowed by margins e(marginswtype) weight type for margins e(marginswexp) weight expression for margins e(asbalanced) factor variables fvset as asbalanced e(asobserved) factor variables fvset as asobserved

Matrices e(b) coefficient vector e(Cns) constraints matrix e(cat) category values (with ordinal outcomes) e(ilog) iteration log (up to 20 iterations) e(gradient) gradient vector e(N_g) group counts e(g_min) group-size minimums e(g_avg) group-size averages e(g_max) group-size maximums e(V) variance-covariance matrix of the estimators e(V_modelbased) model-based variance

Functions e(sample) marks estimation sample


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index