-------------------------------------------------------------------------------
Title
[R] glm -- Generalized linear models
Syntax
glm depvar [indepvars] [if] [in] [weight] [, options]
options                     description
-------------------------------------------------------------------------
Model
  family(familyname)        distribution of depvar; default is
                              family(gaussian)
  link(linkname)            link function; default is canonical link for
                              family() specified

Model 2
  noconstant                suppress constant term
  exposure(varname)         include ln(varname) in model with coefficient
                              constrained to 1
  offset(varname)           include varname in model with coefficient
                              constrained to 1
  constraints(constraints)  apply specified linear constraints
  collinear                 keep collinear variables
  mu(varname)               use varname as the initial estimate for the
                              mean of depvar
  init(varname)             synonym for mu(varname)

SE/Robust
  vce(vcetype)              vcetype may be oim, robust, cluster clustvar,
                              eim, opg, bootstrap, jackknife, hac kernel,
                              jackknife1, or unbiased
  vfactor(#)                multiply variance matrix by scalar #
  disp(#)                   quasilikelihood multiplier
  scale(x2|dev|#)           set the scale parameter

Reporting
  level(#)                  set confidence level; default is level(95)
  eform                     report exponentiated coefficients
  nocnsreport               do not display constraints
  display_options           control spacing and display of omitted
                              variables and base and empty cells

Maximization
  ml                        use maximum likelihood optimization; the
                              default
  irls                      use iterated, reweighted least-squares
                              optimization of the deviance
  maximize_options          control the maximization process; seldom used
  fisher(#)                 use the Fisher scoring Hessian or expected
                              information matrix (EIM)
  search                    search for good starting values

+ noheader                  suppress header table from above coefficient
                              table
+ notable                   suppress coefficient table
+ nodisplay                 suppress the output; iteration log is still
                              displayed
+ coeflegend                display coefficients' legend instead of
                              coefficient table
-------------------------------------------------------------------------
familyname                  description
-------------------------------------------------------------------------
  gaussian                  Gaussian (normal)
  igaussian                 inverse Gaussian
  binomial[varnameN|#N]     Bernoulli/binomial
  poisson                   Poisson
  nbinomial[#k|ml]          negative binomial
  gamma                     gamma
-------------------------------------------------------------------------

linkname                    description
-------------------------------------------------------------------------
  identity                  identity
  log                       log
  logit                     logit
  probit                    probit
  cloglog                   cloglog
  power #                   power
  opower #                  odds power
  nbinomial                 negative binomial
  loglog                    log-log
  logc                      log-complement
-------------------------------------------------------------------------
+ noheader, notable, nodisplay, and coeflegend do not appear in the
dialog box.
indepvars may contain factor variables; see fvvarlist.
depvar and indepvars may contain time-series operators; see tsvarlist.
bootstrap, by, fracpoly, jackknife, mfp, mi estimate, nestreg, rolling,
statsby, stepwise, and svy are allowed; see prefix.
vce(bootstrap), vce(jackknife), and vce(jackknife1) are not allowed with
the mi estimate prefix.
Weights are not allowed with the bootstrap prefix.
aweights are not allowed with the jackknife prefix.
vce(), vfactor(), disp(), scale(), irls, fisher(), noheader, notable,
nodisplay, and weights are not allowed with the svy prefix.
fweights, aweights, iweights, and pweights are allowed; see weight.
See [R] glm postestimation for features available after estimation.
Menu
Statistics > Generalized linear models > Generalized linear models (GLM)
Description
glm fits generalized linear models. It can fit models by using either
IRLS (maximum quasilikelihood) or Newton-Raphson (maximum likelihood)
optimization; Newton-Raphson is the default. Previous versions of glm
used only IRLS.
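
As a rough illustration of the two optimization methods, the commands
below fit the same logistic model first by ML (the default) and then by
IRLS, using the lbw data from the Examples section. The stored-estimate
names ml_fit and irls_fit are arbitrary labels chosen for this sketch.
With a canonical link such as logit, the two sets of coefficients should
agree up to convergence tolerance.

```stata
* Fit the same model with both optimizers and compare them side by side
webuse lbw, clear
glm low age lwt i.race smoke ptl ht ui, family(binomial)
estimates store ml_fit
glm low age lwt i.race smoke ptl ht ui, family(binomial) irls
estimates store irls_fit
estimates table ml_fit irls_fit, b(%9.4f) se(%9.4f)
```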
See logistic estimation commands for lists of related estimation
commands.
Options
+-------+
----+ Model +------------------------------------------------------------
family(familyname) specifies the distribution of depvar; family(gaussian)
is the default.
link(linkname) specifies the link function; the default is the canonical
link for the family() specified.
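
A minimal sketch of specifying both options, assuming the ldose data used
in the Examples section (r successes out of n trials): the binomial
family is combined with a probit link instead of the default logit.

```stata
* Binomial family with an explicitly chosen, noncanonical probit link
webuse ldose, clear
glm r ldose, family(binomial n) link(probit)
```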
+---------+
----+ Model 2 +----------------------------------------------------------
noconstant, exposure(varname), offset(varname), constraints(constraints),
collinear; see [R] estimation options. constraints(constraints) and
collinear are not allowed with irls.
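
The relationship between exposure() and offset() can be sketched as
follows. This example assumes that the dollhill3 example dataset (with
variables deaths, pyears, smokes, and agecat) is available through
webuse; it is not one of the datasets used elsewhere in this help file.
The variable lnpy is a hypothetical name created for the illustration.
Both commands should produce the same fit, because exposure(pyears)
enters ln(pyears) with its coefficient constrained to 1.

```stata
* Poisson rate model: exposure(pyears) versus offset(ln(pyears))
webuse dollhill3, clear
glm deaths smokes i.agecat, family(poisson) exposure(pyears) eform
scalar ll_exposure = e(ll)

generate double lnpy = ln(pyears)
glm deaths smokes i.agecat, family(poisson) offset(lnpy) eform
display "difference in log likelihood: " e(ll) - ll_exposure
```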
mu(varname) specifies varname as the initial estimate for the mean of
depvar. This option can be useful with models that experience
convergence difficulties, such as family(binomial) models with power
or odds-power links. init(varname) is a synonym.
+-----------+
----+ SE/Robust +--------------------------------------------------------
vce(vcetype) specifies the type of standard error reported, which
includes types that are derived from asymptotic theory, that are
robust to some kinds of misspecification, that allow for intragroup
correlation, and that use bootstrap or jackknife methods; see [R]
vce_option.
In addition to the standard vcetypes, glm allows the following
alternatives:
vce(eim) specifies that the EIM estimate of variance be used.
vce(jackknife1) specifies that the one-step jackknife estimate of
variance be used.
vce(hac kernel [#]) specifies that a heteroskedasticity- and
autocorrelation-consistent (HAC) variance estimate be used. HAC
refers to the general form for combining weighted matrices to
form the variance estimate. There are three kernels built into
glm. kernel is a user-written program or one of
nwest | gallant | anderson
# specifies the number of lags. If # is not specified, N - 2 is
assumed. If you wish to specify vce(hac ... ), you must tsset
your data before calling glm.
vce(unbiased) specifies that the unbiased sandwich estimate of
variance be used.
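
As a hedged sketch of how vce() changes the reported standard errors but
not the coefficients, the commands below fit one model with the default
variance estimate and again with vce(robust), using the lbw data from the
Examples section. The names m_default and m_robust are arbitrary labels.

```stata
* Same coefficients, different standard errors
webuse lbw, clear
glm low age lwt smoke, family(binomial)
estimates store m_default
glm low age lwt smoke, family(binomial) vce(robust)
estimates store m_robust
estimates table m_default m_robust, b(%9.4f) se(%9.4f)
```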
vfactor(#) specifies a scalar by which to multiply the resulting variance
matrix. This option allows you to match output with other packages,
which may apply degrees of freedom or other small-sample corrections
to estimates of variance.
disp(#) multiplies the variance of depvar by # and divides the deviance
by #. The resulting distributions are members of the quasilikelihood
family.
scale(x2|dev|#) overrides the default scale parameter. This option is
allowed only with Hessian (information matrix) variance estimates.
By default, scale(1) is assumed for the discrete distributions
(binomial, Poisson, and negative binomial), and scale(x2) is assumed
for the continuous distributions (Gaussian, gamma, and inverse
Gaussian).
scale(x2) specifies that the scale parameter be set to the Pearson
chi-squared (or generalized chi-squared) statistic divided by the
residual degrees of freedom, which is recommended by McCullagh and
Nelder (1989) as a good general choice for continuous distributions.
scale(dev) sets the scale parameter to the deviance divided by the
residual degrees of freedom. This option provides an alternative to
scale(x2) for continuous distributions and overdispersed or
underdispersed discrete distributions.
scale(#) sets the scale parameter to #. For example, using scale(1)
in family(gamma) models results in exponential-errors regression.
Additional use of link(log) rather than the default link(power -1)
for family(gamma) essentially reproduces Stata's streg, dist(exp)
nohr command (see [ST] streg) if all the observations are uncensored.
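
A small sketch of scale(x2) with a discrete family, assuming the ldose
data from the Examples section. The first fit uses the default scale(1)
for the binomial family and displays the Pearson dispersion saved in
e(dispers_p); the second fit rescales the variance estimate by that
statistic. The coefficients themselves are not expected to change.

```stata
* Default scale for a binomial model, then the Pearson-based scale
webuse ldose, clear
glm r ldose, family(binomial n)
display "Pearson dispersion: " e(dispers_p)
scalar b_default = _b[ldose]

glm r ldose, family(binomial n) scale(x2)
display "coefficient change: " _b[ldose] - b_default
```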
+-----------+
----+ Reporting +--------------------------------------------------------
level(#); see [R] estimation options.
eform displays the exponentiated coefficients and corresponding standard
errors and confidence intervals. For family(binomial) link(logit)
(i.e., logistic regression), exponentiation results in odds ratios;
for family(poisson) link(log) (i.e., Poisson regression),
exponentiated coefficients are rate ratios.
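
To illustrate the odds-ratio interpretation, the sketch below fits the
logistic model from the Examples section with eform and then fits the
same model with logistic, which reports odds ratios by default. The two
tables should show matching odds ratios.

```stata
* Odds ratios from glm, eform compared with logistic
webuse lbw, clear
glm low age lwt i.race smoke ptl ht ui, family(binomial) link(logit) eform
scalar b_glm = _b[smoke]
logistic low age lwt i.race smoke ptl ht ui
display "glm OR: " exp(b_glm) "   logistic OR: " exp(_b[smoke])
```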
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels,
allbaselevels; see [R] estimation options.
+--------------+
----+ Maximization +-----------------------------------------------------
ml requests that optimization be carried out using Stata's ml commands
and is the default.
irls requests iterated, reweighted least-squares (IRLS) optimization of
the deviance instead of Newton-Raphson optimization of the log
likelihood. If the irls option is not specified, the optimization is
carried out using Stata's ml commands, in which case all options of
ml maximize are also available.
maximize_options: difficult, technique(algorithm_spec), iterate(#),
[no]log, trace, gradient, showstep, hessian, showtolerance,
tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance,
from(init_specs); see [R] maximize. These options are seldom used.
Setting the optimization type to technique(bhhh) resets the default
vcetype to vce(opg).
fisher(#) specifies the number of Newton-Raphson steps that should use
the Fisher scoring Hessian or EIM before switching to the observed
information matrix (OIM). This option is useful only for
Newton-Raphson optimization (and not when using irls).
search specifies that the command search for good starting values. This
option is useful only for Newton-Raphson optimization (and not when
using irls).
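
The maximization options can be combined with the default ML
optimization as in the following sketch, which assumes the beetle data
from the Examples section. The particular values (two Fisher scoring
steps, at most 50 iterations) are arbitrary choices for illustration.

```stata
* Start with two Fisher scoring steps before switching to the OIM
webuse beetle, clear
glm r i.beetle ldose, family(binomial n) link(cloglog) fisher(2)

* Search for starting values, cap the iterations, and hide the log
glm r i.beetle ldose, family(binomial n) link(cloglog) search iterate(50) nolog
```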
The following options are available with glm but are not shown in the
dialog box:
noheader suppresses the header information from the output. The
coefficient table is still displayed.
notable suppresses the table of coefficients from the output. The header
information is still displayed.
nodisplay suppresses the output. The iteration log is still displayed.
coeflegend; see [R] estimation options.
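
A brief sketch of these display options, assuming the ldose data from the
Examples section: the first command drops the header, and the second
shows the coefficient legend, whose names (such as _b[ldose]) can then be
used in expressions.

```stata
* Suppress the header, then show the legend and use a coefficient by name
webuse ldose, clear
glm r ldose, family(binomial n) nolog noheader
glm r ldose, family(binomial n) nolog coeflegend
display "slope on ldose: " _b[ldose]
```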
Remarks
Although glm can be used to fit linear regression (and, in fact, does so
by default), this should be viewed as an instructional feature; regress
produces such estimates more quickly, and many postestimation commands
are available to explore the adequacy of the fit; see [R] regress and [R]
regress postestimation.
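
The equivalence can be checked with a short sketch, assuming the lbw data
from the Examples section and its birthweight variable bwt. The scalars
b_reg and b_glm are names chosen for this illustration. The point
estimates should match; the reported test statistics differ because glm
reports z statistics rather than t statistics.

```stata
* Linear regression two ways: regress, and glm with its defaults
webuse lbw, clear
regress bwt age lwt smoke
scalar b_reg = _b[lwt]
glm bwt age lwt smoke
scalar b_glm = _b[lwt]
display "regress: " b_reg "   glm: " b_glm
```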
In any case, you should specify the link function by using the link()
option and specify the distributional family by using family(). The
available link functions are
Link function           glm option
----------------------------------------
identity                link(identity)
log                     link(log)
logit                   link(logit)
probit                  link(probit)
complementary log-log   link(cloglog)
odds power              link(opower #)
power                   link(power #)
negative binomial       link(nbinomial)
log-log                 link(loglog)
log-complement          link(logc)
The available distributional families are
Family                  glm option
----------------------------------------
Gaussian (normal)       family(gaussian)
inverse Gaussian        family(igaussian)
Bernoulli/binomial      family(binomial)
Poisson                 family(poisson)
negative binomial       family(nbinomial)
gamma                   family(gamma)
You do not have to specify both family() and link(); the default link()
is the canonical link for the specified family() (except for nbinomial):
Family                  Default link
--------------------------------------
family(gaussian)        link(identity)
family(igaussian)       link(power -2)
family(binomial)        link(logit)
family(poisson)         link(log)
family(nbinomial)       link(log)
family(gamma)           link(power -1)
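
As a quick check of the default, the sketch below fits a binomial model
without link() and again with link(logit), assuming the ldose data from
the Examples section. The family and link titles are read from
e(varfunct) and e(linkt), and the two log likelihoods should coincide.
The scalar ll_default is a name chosen for this illustration.

```stata
* Omitting link() should be the same as naming the canonical link
webuse ldose, clear
glm r ldose, family(binomial n)
display "`e(varfunct)' family, `e(linkt)' link"
scalar ll_default = e(ll)

glm r ldose, family(binomial n) link(logit)
display "difference in log likelihood: " e(ll) - ll_default
```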
If you specify both family() and link(), not all combinations make sense.
You may choose from the following combinations:
          | id  log  logit  probit  clog  pow  opower  nbinomial  loglog  logc
----------+-------------------------------------------------------------------
Gaussian  | x    x                         x
inv. Gau. | x    x                         x
binomial  | x    x     x      x      x     x     x                  x      x
Poisson   | x    x                         x
neg. bin. | x    x                         x               x
gamma     | x    x                         x
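
One of the allowed noncanonical combinations, family(gamma) with
link(log), can be sketched as follows. The example assumes the lbw data
from the Examples section and models the strictly positive variable lwt;
the choice of covariates is arbitrary and for illustration only.

```stata
* Gamma family with a log link instead of the default link(power -1)
webuse lbw, clear
glm lwt age smoke, family(gamma) link(log) eform
```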
Examples
---------------------------------------------------------------------------
Setup
. webuse lbw
Generalized linear model with Bernoulli family and default logit link
. glm low age lwt i.race smoke ptl ht ui, family(binomial)
Replay results and report exponentiated coefficients
. glm, eform
---------------------------------------------------------------------------
Setup
. webuse ldose
Generalized linear model with binomial family and default logit link
. glm r ldose, family(binomial n)
Generalized linear model with binomial family and cloglog link
. glm r ldose, family(binomial n) link(cloglog)
---------------------------------------------------------------------------
Setup
. webuse beetle
Generalized linear model with binomial family and cloglog link
. glm r i.beetle ldose, family(binomial n) link(cloglog)
Replay results with 99% confidence intervals
. glm, level(99)
---------------------------------------------------------------------------
Saved results
glm, ml saves the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in model Wald test
e(k_dv) number of dependent variables
e(k_autoCns) number of base, empty, and omitted constraints
e(df_m) model degrees of freedom
e(df) residual degrees of freedom
e(phi) scale parameter
e(aic) model AIC
e(bic) model BIC
e(ll) log likelihood, if NR
e(N_clust) number of clusters
e(chi2) chi-squared
e(p) significance
e(deviance) deviance
e(deviance_s) scaled deviance
e(deviance_p) Pearson deviance
e(deviance_ps) scaled Pearson deviance
e(dispers) dispersion
e(dispers_s) scaled dispersion
e(dispers_p) Pearson dispersion
e(dispers_ps) scaled Pearson dispersion
e(nbml) 1 if negative binomial parameter estimated via ML,
0 otherwise
e(vf) factor set by vfactor(), 1 if not set
e(power) power set by power(), opower()
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) glm
e(cmdline) command as typed
e(depvar) name of dependent variable
e(varfunc) name of variance function used
e(varfunct) Gaussian, Inverse Gaussian, Binomial, Poisson, Neg.
Binomial, Bernoulli, Power, or Gamma
e(varfuncf) variance function
e(link) name of link function used
e(linkt) link title
e(linkf) link form
e(m) number of binomial trials
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) offset
e(chi2type) Wald or LR; type of model chi-squared test
e(cons) set if noconstant specified
e(hac_kernel) HAC kernel
e(hac_lag) HAC lag
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) ml or irls
e(opt1) optimization title, line 1
e(opt2) optimization title, line 2
e(which) max or min; whether optimizer is to perform
maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(singularHmethod) m-marquardt or hybrid; method used when Hessian is
singular
e(crittype) optimization criterion
e(properties) b V
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance-covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
glm, irls saves the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k_eq_model) number of equations in model Wald test
e(df_m) model degrees of freedom
e(df) residual degrees of freedom
e(phi) scale parameter
e(disp) dispersion parameter
e(bic) model BIC
e(N_clust) number of clusters
e(deviance) deviance
e(deviance_s) scaled deviance
e(deviance_p) Pearson deviance
e(deviance_ps) scaled Pearson deviance
e(dispers) dispersion
e(dispers_s) scaled dispersion
e(dispers_p) Pearson dispersion
e(dispers_ps) scaled Pearson dispersion
e(nbml) 1 if negative binomial parameter estimated via ML,
0 otherwise
e(vf) factor set by vfactor(), 1 if not set
e(power) power set by power(), opower()
e(rank) rank of e(V)
e(rc) return code
Macros
e(cmd) glm
e(cmdline) command as typed
e(depvar) name of dependent variable
e(varfunc) name of variance function used
e(varfunct) Gaussian, Inverse Gaussian, Binomial, Poisson, Neg.
Binomial, Bernoulli, Power, or Gamma
e(varfuncf) variance function
e(link) name of link function used
e(linkt) link title
e(linkf) link form
e(m) number of binomial trials
e(wtype) weight type
e(wexp) weight expression
e(clustvar) name of cluster variable
e(offset) offset
e(cons) set if noconstant specified
e(hac_kernel) HAC kernel
e(hac_lag) HAC lag
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) ml or irls
e(opt1) optimization title, line 1
e(opt2) optimization title, line 2
e(crittype) optimization criterion
e(properties) b V
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(V) variance-covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
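
The saved results can be inspected after estimation as in this short
sketch, which assumes the lbw data from the Examples section. ereturn
list shows everything in e(); individual scalars and matrices can then be
used in expressions.

```stata
* List everything saved in e(), then pick out a few results
webuse lbw, clear
glm low age lwt i.race smoke ptl ht ui, family(binomial) nolog
ereturn list
display "deviance = " e(deviance) "   AIC = " e(aic)
matrix list e(b)
```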
Reference
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd
ed. London: Chapman & Hall/CRC.
Also see
Manual: [R] glm
Help: [R] glm postestimation;
[R] cloglog, [R] logistic, [R] nbreg, [R] poisson, [R] regress,
[SVY] svy estimation, [XT] xtgee