Stata 15 help for glm

[R] glm -- Generalized linear models

Syntax

glm depvar [indepvars] [if] [in] [weight] [, options]

options                     Description
-------------------------------------------------------------------------
Model
  family(familyname)        distribution of depvar; default is
                              family(gaussian)
  link(linkname)            link function; default is canonical link for
                              family() specified

Model 2
  noconstant                suppress constant term
  exposure(varname)         include ln(varname) in model with coefficient
                              constrained to 1
  offset(varname)           include varname in model with coefficient
                              constrained to 1
  constraints(constraints)  apply specified linear constraints
  collinear                 keep collinear variables
  asis                      retain perfect predictor variables
  mu(varname)               use varname as the initial estimate for the
                              mean of depvar
  init(varname)             synonym for mu(varname)

SE/Robust
  vce(vcetype)              vcetype may be oim, robust, cluster clustvar,
                              eim, opg, bootstrap, jackknife, hac kernel,
                              jackknife1, or unbiased
  vfactor(#)                multiply variance matrix by scalar #
  disp(#)                   quasilikelihood multiplier
  scale(x2|dev|#)           set the scale parameter

Reporting
  level(#)                  set confidence level; default is level(95)
  eform                     report exponentiated coefficients
  nocnsreport               do not display constraints
  display_options           control columns and column formats, row
                              spacing, line width, display of omitted
                              variables and base and empty cells, and
                              factor-variable labeling

Maximization
  ml                        use maximum likelihood optimization; the
                              default
  irls                      use iterated, reweighted least-squares
                              optimization of the deviance
  maximize_options          control the maximization process; seldom used
  fisher(#)                 use the Fisher scoring Hessian or expected
                              information matrix (EIM)
  search                    search for good starting values

  noheader                  suppress header table from above coefficient
                              table
  notable                   suppress coefficient table
  nodisplay                 suppress the output; iteration log is still
                              displayed
  coeflegend                display legend instead of statistics
-------------------------------------------------------------------------

familyname                  Description
-------------------------------------------------------------------------
gaussian                    Gaussian (normal)
igaussian                   inverse Gaussian
binomial [varnameN|#N]      Bernoulli/binomial
poisson                     Poisson
nbinomial [#k|ml]           negative binomial
gamma                       gamma
-------------------------------------------------------------------------

linkname                    Description
-------------------------------------------------------------------------
identity                    identity
log                         log
logit                       logit
probit                      probit
cloglog                     clog-log
power #                     power
opower #                    odds power
nbinomial                   negative binomial
loglog                      log-log
logc                        log-complement
-------------------------------------------------------------------------

indepvars may contain factor variables; see fvvarlist.
depvar and indepvars may contain time-series operators; see tsvarlist.
bayes, bootstrap, by, fmm, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed; see prefix. For more details, see [BAYES] bayes: glm and [FMM] fmm: glm.
vce(bootstrap), vce(jackknife), and vce(jackknife1) are not allowed with the mi estimate prefix.
Weights are not allowed with the bootstrap prefix.
aweights are not allowed with the jackknife prefix.
vce(), vfactor(), disp(), scale(), irls, fisher(), noheader, notable, nodisplay, and weights are not allowed with the svy prefix.
fweights, aweights, iweights, and pweights are allowed; see weight.
noheader, notable, nodisplay, and coeflegend do not appear in the dialog box.
See [R] glm postestimation for features available after estimation.

Menu

Statistics > Generalized linear models > Generalized linear models (GLM)

Description

glm fits generalized linear models. It can fit models by using either IRLS (maximum quasilikelihood) or Newton-Raphson (maximum likelihood) optimization; Newton-Raphson is the default.

See [U] 26 Overview of Stata estimation commands for a description of all of Stata's estimation commands, several of which fit models that can also be fit using glm.

Options

    +-------+
----+ Model +------------------------------------------------------------

family(familyname) specifies the distribution of depvar; family(gaussian) is the default.

link(linkname) specifies the link function; the default is the canonical link for the family() specified (except for family(nbinomial)).

    +---------+
----+ Model 2 +----------------------------------------------------------

noconstant, exposure(varname), offset(varname), constraints(constraints), collinear; see [R] estimation options. constraints(constraints) and collinear are not allowed with irls.

asis forces retention of perfect predictor variables and their associated, perfectly predicted observations and may produce instabilities in maximization; see [R] probit. This option is only allowed with option family(binomial) with a denominator of 1.

mu(varname) specifies varname as the initial estimate for the mean of depvar. This option can be useful with models that experience convergence difficulties, such as family(binomial) models with power or odds-power links. init(varname) is a synonym.
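
As an illustration of exposure() and offset(), the following sketch fits the same Poisson rate model two ways on simulated data. The variable names (pop, x, y, lnpop) and the simulation settings are assumptions made for this example only; they do not come from a shipped dataset.

```stata
* Simulated data; all variable names here are hypothetical
clear
set seed 2018
set obs 500
generate double pop = 100 + floor(900*runiform())
generate double x = rnormal()
generate y = rpoisson(pop*exp(-3 + 0.4*x))

* exposure(pop) includes ln(pop) with its coefficient constrained to 1
glm y x, family(poisson) link(log) exposure(pop)
matrix b_exposure = e(b)

* offset() takes the variable as is, so the log is taken beforehand
generate double lnpop = ln(pop)
glm y x, family(poisson) link(log) offset(lnpop)
matrix b_offset = e(b)

* The two coefficient vectors should agree up to numerical tolerance
display mreldif(b_exposure, b_offset)
```

exposure() is usually the more convenient form when the exposure variable is stored on its natural scale; offset() fits when the log has already been taken.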

    +-----------+
----+ SE/Robust +--------------------------------------------------------

vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.

In addition to the standard vcetypes, glm allows the following alternatives:

vce(eim) specifies that the EIM estimate of variance be used.

vce(jackknife1) specifies that the one-step jackknife estimate of variance be used.

vce(hac kernel [#]) specifies that a heteroskedasticity- and autocorrelation-consistent (HAC) variance estimate be used. HAC refers to the general form for combining weighted matrices to form the variance estimate. There are three kernels built into glm. kernel is a user-written program or one of

nwest | gallant | anderson

# specifies the number of lags. If # is not specified, N - 2 is assumed. If you wish to specify vce(hac ... ), you must tsset your data before calling glm.

vce(unbiased) specifies that the unbiased sandwich estimate of variance be used.
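
A minimal sketch of the vce() alternatives described above, using a simulated time series. The variable names (t, x, y), the choice of 2 lags, and the data-generating process are assumptions for illustration only.

```stata
* Simulated time series; variable names t, x, and y are hypothetical
clear
set seed 12345
set obs 200
generate t = _n
generate double x = rnormal()
generate y = rpoisson(exp(0.5 + 0.3*x))

* vce(hac ...) requires the data to be tsset before calling glm
tsset t

* HAC variance estimate with the nwest kernel and 2 lags
glm y x, family(poisson) link(log) vce(hac nwest 2)

* Robust and EIM variance estimates for comparison
glm y x, family(poisson) link(log) vce(robust)
glm y x, family(poisson) link(log) vce(eim)
```

Only the reported standard errors should differ across the three fits; the coefficient estimates come from the same model.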

vfactor(#) specifies a scalar by which to multiply the resulting variance matrix. This option allows you to match output with other packages, which may apply degrees of freedom or other small-sample corrections to estimates of variance.

disp(#) multiplies the variance of depvar by # and divides the deviance by #. The resulting distributions are members of the quasilikelihood family.

scale(x2|dev|#) overrides the default scale parameter. This option is allowed only with Hessian (information matrix) variance estimates.

By default, scale(1) is assumed for the discrete distributions (binomial, Poisson, and negative binomial), and scale(x2) is assumed for the continuous distributions (Gaussian, gamma, and inverse Gaussian).

scale(x2) specifies that the scale parameter be set to the Pearson chi-squared (or generalized chi-squared) statistic divided by the residual degrees of freedom, which is recommended by McCullagh and Nelder (1989) as a good general choice for continuous distributions.

scale(dev) sets the scale parameter to the deviance divided by the residual degrees of freedom. This option provides an alternative to scale(x2) for continuous distributions and overdispersed or underdispersed discrete distributions.

scale(#) sets the scale parameter to #. For example, using scale(1) in family(gamma) models results in exponential-errors regression. Additional use of link(log) rather than the default link(power -1) for family(gamma) essentially reproduces Stata's streg, dist(exp) nohr command (see [ST] streg) if all the observations are uncensored.
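
The following sketch shows scale(x2) and scale(dev) applied to overdispersed counts. The data are simulated and the variable names (x, y) are hypothetical; the extra normal term in the mean is only one assumed way to induce overdispersion.

```stata
* Simulated overdispersed counts; variable names are hypothetical
clear
set seed 42
set obs 1000
generate double x = rnormal()
generate y = rpoisson(exp(1 + 0.3*x + 0.7*rnormal()))

* Default for family(poisson): scale(1)
glm y x, family(poisson) link(log)
display "Pearson dispersion: " e(dispers_p)

* Set the scale parameter to Pearson chi-squared / residual df
glm y x, family(poisson) link(log) scale(x2)

* Alternative: deviance / residual df
glm y x, family(poisson) link(log) scale(dev)
```

A Pearson dispersion well above 1 in the first fit is the usual signal that scale(x2) or scale(dev) is worth considering for a discrete family.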

    +-----------+
----+ Reporting +--------------------------------------------------------

level(#); see [R] estimation options.

eform displays the exponentiated coefficients and corresponding standard errors and confidence intervals. For family(binomial) link(logit) (that is, logistic regression), exponentiation results are odds ratios; for family(nbinomial) link(log) (that is, negative binomial regression) and for family(poisson) link(log) (that is, Poisson regression), exponentiated coefficients are incidence-rate ratios.

nocnsreport; see [R] estimation options.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

    +--------------+
----+ Maximization +-----------------------------------------------------

ml requests that optimization be carried out using Stata's ml commands and is the default.

irls requests iterated, reweighted least-squares (IRLS) optimization of the deviance instead of Newton-Raphson optimization of the log likelihood. If the irls option is not specified, the optimization is carried out using Stata's ml commands, in which case all options of ml maximize are also available.

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are seldom used.

Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).

fisher(#) specifies the number of Newton-Raphson steps that should use the Fisher scoring Hessian or EIM before switching to the observed information matrix (OIM). This option is useful only for Newton-Raphson optimization (and not when using irls).

search specifies that the command search for good starting values. This option is useful only for Newton-Raphson optimization (and not when using irls).
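
A brief sketch comparing the optimization options on the lbw data used in the Examples section. The reduced covariate list and the probit link are choices made for this illustration, and fisher(2) is an arbitrary step count.

```stata
webuse lbw, clear

* Default: optimization through Stata's ml commands
glm low age lwt smoke, family(binomial) link(probit)
matrix b_ml = e(b)

* IRLS optimization of the deviance
glm low age lwt smoke, family(binomial) link(probit) irls
matrix b_irls = e(b)

* Use the EIM for the first 2 Newton-Raphson steps, then switch to the OIM
glm low age lwt smoke, family(binomial) link(probit) fisher(2)

* Coefficients from ml and irls should agree up to convergence tolerance
display mreldif(b_ml, b_irls)
```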

The following options are available with glm but are not shown in the dialog box:

noheader suppresses the header information from the output. The coefficient table is still displayed.

notable suppresses the table of coefficients from the output. The header information is still displayed.

nodisplay suppresses the output. The iteration log is still displayed.

coeflegend; see [R] estimation options.

Remarks

Although glm can be used to fit linear regression (and, in fact, does so by default), this should be viewed as an instructional feature; regress produces such estimates more quickly, and many postestimation commands are available to explore the adequacy of the fit; see [R] regress and [R] regress postestimation.

In any case, you should specify the link function by using the link() option and specify the distributional family by using family(). The available link functions are

Link function             glm option
----------------------------------------
identity                  link(identity)
log                       link(log)
logit                     link(logit)
probit                    link(probit)
complementary log-log     link(cloglog)
odds power                link(opower #)
power                     link(power #)
negative binomial         link(nbinomial)
log-log                   link(loglog)
log-complement            link(logc)

The available distributional families are

Family                    glm option
----------------------------------------
Gaussian (normal)         family(gaussian)
inverse Gaussian          family(igaussian)
Bernoulli/binomial        family(binomial)
Poisson                   family(poisson)
negative binomial         family(nbinomial)
gamma                     family(gamma)

You do not have to specify both family() and link(); the default link() is the canonical link for the specified family() (except for nbinomial):

Family                  Default link
--------------------------------------
family(gaussian)        link(identity)
family(igaussian)       link(power -2)
family(binomial)        link(logit)
family(poisson)         link(log)
family(nbinomial)       link(log)
family(gamma)           link(power -1)
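
To illustrate the default-link rule, this sketch fits a family(binomial) model on the lbw data with and without an explicit link(logit). The shortened covariate list is an assumption made to keep the example small.

```stata
webuse lbw, clear

* link() omitted: family(binomial) falls back to its default link
glm low age lwt smoke, family(binomial)
display "`e(linkt)'"
matrix b_default = e(b)

* Same model with the link stated explicitly
glm low age lwt smoke, family(binomial) link(logit)
matrix b_explicit = e(b)

* The two fits should be identical
display mreldif(b_default, b_explicit)
```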

If you specify both family() and link(), not all combinations make sense. You may choose from the following combinations:

          | id  log  logit  probit  clog  pow  opower  nbinomial  loglog  logc
----------+-------------------------------------------------------------------
Gaussian  | x   x                         x
inv. Gau. | x   x                         x
binomial  | x   x    x      x       x     x    x                  x       x
Poisson   | x   x                         x
neg. bin. | x   x                         x            x
gamma     | x   x                         x

Examples

---------------------------------------------------------------------------
Setup
    . webuse lbw

Generalized linear model with Bernoulli family and default logit link
    . glm low age lwt i.race smoke ptl ht ui, family(binomial)

Replay results and report exponentiated coefficients
    . glm, eform

---------------------------------------------------------------------------
Setup
    . webuse ldose

Generalized linear model with binomial family and default logit link
    . glm r ldose, family(binomial n)

Generalized linear model with binomial family and clog-log link
    . glm r ldose, family(binomial n) link(cloglog)

---------------------------------------------------------------------------
Setup
    . webuse beetle

Generalized linear model with binomial family and clog-log link
    . glm r i.beetle ldose, family(binomial n) link(cloglog)

Replay results with 99% confidence intervals
    . glm, level(99)
---------------------------------------------------------------------------

Stored results

glm, ml stores the following in e():

Scalars
  e(N)               number of observations
  e(k)               number of parameters
  e(k_eq)            number of equations in e(b)
  e(k_eq_model)      number of equations in overall model test
  e(k_dv)            number of dependent variables
  e(df_m)            model degrees of freedom
  e(df)              residual degrees of freedom
  e(phi)             scale parameter
  e(aic)             model AIC
  e(bic)             model BIC
  e(ll)              log likelihood, if NR
  e(N_clust)         number of clusters
  e(chi2)            chi-squared
  e(p)               p-value for model test
  e(deviance)        deviance
  e(deviance_s)      scaled deviance
  e(deviance_p)      Pearson deviance
  e(deviance_ps)     scaled Pearson deviance
  e(dispers)         dispersion
  e(dispers_s)       scaled dispersion
  e(dispers_p)       Pearson dispersion
  e(dispers_ps)      scaled Pearson dispersion
  e(nbml)            1 if negative binomial parameter estimated via ML,
                       0 otherwise
  e(vf)              factor set by vfactor(), 1 if not set
  e(power)           power set by link(power #) or link(opower #)
  e(rank)            rank of e(V)
  e(ic)              number of iterations
  e(rc)              return code
  e(converged)       1 if converged, 0 otherwise

Macros
  e(cmd)             glm
  e(cmdline)         command as typed
  e(depvar)          name of dependent variable
  e(varfunc)         program to calculate variance function
  e(varfunct)        variance title
  e(varfuncf)        variance function
  e(link)            program to calculate link function
  e(linkt)           link title
  e(linkf)           link function
  e(m)               number of binomial trials
  e(wtype)           weight type
  e(wexp)            weight expression
  e(title)           title in estimation output
  e(clustvar)        name of cluster variable
  e(offset)          linear offset variable
  e(chi2type)        Wald; type of model chi-squared test
  e(cons)            noconstant, if specified
  e(hac_kernel)      HAC kernel
  e(hac_lag)         HAC lag
  e(vce)             vcetype specified in vce()
  e(vcetype)         title used to label Std. Err.
  e(opt)             ml or irls
  e(opt1)            optimization title, line 1
  e(opt2)            optimization title, line 2
  e(which)           max or min; whether optimizer is to perform
                       maximization or minimization
  e(ml_method)       type of ml method
  e(user)            name of likelihood-evaluator program
  e(technique)       maximization technique
  e(properties)      b V
  e(predict)         program used to implement predict
  e(marginsok)       predictions allowed by margins
  e(marginsnotok)    predictions disallowed by margins
  e(asbalanced)      factor variables fvset as asbalanced
  e(asobserved)      factor variables fvset as asobserved

Matrices
  e(b)               coefficient vector
  e(Cns)             constraints matrix
  e(ilog)            iteration log (up to 20 iterations)
  e(gradient)        gradient vector
  e(V)               variance-covariance matrix of the estimators
  e(V_modelbased)    model-based variance

Functions
  e(sample)          marks estimation sample

glm, irls stores the following in e():

Scalars
  e(N)               number of observations
  e(k)               number of parameters
  e(k_eq_model)      number of equations in overall model test
  e(df_m)            model degrees of freedom
  e(df)              residual degrees of freedom
  e(phi)             scale parameter
  e(disp)            dispersion parameter
  e(bic)             model BIC
  e(N_clust)         number of clusters
  e(deviance)        deviance
  e(deviance_s)      scaled deviance
  e(deviance_p)      Pearson deviance
  e(deviance_ps)     scaled Pearson deviance
  e(dispers)         dispersion
  e(dispers_s)       scaled dispersion
  e(dispers_p)       Pearson dispersion
  e(dispers_ps)      scaled Pearson dispersion
  e(nbml)            1 if negative binomial parameter estimated via ML,
                       0 otherwise
  e(vf)              factor set by vfactor(), 1 if not set
  e(power)           power set by link(power #) or link(opower #)
  e(rank)            rank of e(V)
  e(rc)              return code

Macros
  e(cmd)             glm
  e(cmdline)         command as typed
  e(depvar)          name of dependent variable
  e(varfunc)         program to calculate variance function
  e(varfunct)        variance title
  e(varfuncf)        variance function
  e(link)            program to calculate link function
  e(linkt)           link title
  e(linkf)           link function
  e(m)               number of binomial trials
  e(wtype)           weight type
  e(wexp)            weight expression
  e(clustvar)        name of cluster variable
  e(offset)          linear offset variable
  e(cons)            noconstant, if specified
  e(hac_kernel)      HAC kernel
  e(hac_lag)         HAC lag
  e(vce)             vcetype specified in vce()
  e(vcetype)         title used to label Std. Err.
  e(opt)             ml or irls
  e(opt1)            optimization title, line 1
  e(opt2)            optimization title, line 2
  e(properties)      b V
  e(predict)         program used to implement predict
  e(marginsok)       predictions allowed by margins
  e(marginsnotok)    predictions disallowed by margins
  e(asbalanced)      factor variables fvset as asbalanced
  e(asobserved)      factor variables fvset as asobserved

Matrices
  e(b)               coefficient vector
  e(V)               variance-covariance matrix of the estimators
  e(V_modelbased)    model-based variance

Functions
  e(sample)          marks estimation sample
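
As a short illustration of how the stored results listed above can be retrieved after estimation, the sketch below fits a model on the lbw data and displays a few scalars, macros, and matrices. The covariate list is an arbitrary choice for the example.

```stata
webuse lbw, clear
quietly glm low age lwt smoke, family(binomial)

* Scalars
display "N            = " e(N)
display "deviance     = " e(deviance)
display "Pearson      = " e(deviance_p)
display "residual df  = " e(df)

* Macros are retrieved with macro quotes
display "command      = `e(cmd)'"
display "link title   = `e(linkt)'"

* Matrices
matrix list e(b)
matrix list e(V)

* Everything stored in e()
ereturn list
```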

Reference

McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. London: Chapman & Hall/CRC.

