Generalized Linear Models (STB-18: sg22) ------------------------- ^glmr^ depvar [varlist] [^if^ exp] [^in^ range] [weight] [^, nocons^tant ^f^amily^(b^inomial [varname|#] | ^gam^ma | ^gau^ssian | ^ig^aussian | ^nb^inomial | ^p^oisson^)^ ^l^ink^(pow^er # | ^i^dentity | ^log^ | ^l^ogit | ^p^robit | ^c^loglog | ^opo^wer # | ^nb^inomial^)^ ^s^cale^(x2^|^dev^|#^)^ [^ln^]^o^ffset^(^varname^) k(^#^) disp(^#^)^ ^frv^ars ^ef^orm ^le^vel^(^#^)^ ^it^erate^(^#^)^ ^lt^ol^(^#^)^ ^ini^t^(^mu^)^ ^nolo^g ^fweights^ and ^aweights^ are allowed. Description ----------- ^glmr^ is a revision of Hilbe's (1993a,b) ^glm^ command; see Royston (1994) for details of the differences between ^glm^ and ^glmr^. Description, continued ---------------------- ^glmr^ fits generalized linear models to depvar, with covariate(s) in varlist: g(E(depvar)) = Xb, depvar distributed according to user-specified family. The estimates produced are determined by the link function, g(), and the dis- tribution family (E() is the expectation operator). For instance, if depvar is assumed to have a Gaussian (normal) distribution and g() assumed to be the identity function, we have E(depvar) = Xb, depvar distributed Gaussian normal or linear regression. If g() is the natural log function and depvar assumed to be distributed Poisson, we have log(E(depvar) = Xb, depvar distributed Poisson or Poisson regression. Other combinations are possible. Description, continued ---------------------- While ^glmr^ can be used to estimate linear regression (and, in fact, does so by default), this should be viewed solely as an instructional feature; ^regress^ produces such estimates more quickly and (possibly) more accurately. In any case, you specify the link function using the ^link()^ option and the distributional family using ^family()^. The allowed link functions are: Link function ^glmr^ option comment -------------------------------------------------------------------- identity ^link(identity)^ equiv. to ^link(power 1)^ log ^link(log)^ equiv. to ^link(power 0)^ logit ^link(logit)^ equiv. to ^link(opower 0)^ probit ^link(probit)^ complementary log-log ^link(cloglog)^ odds-power # ^link(opower^ #^)^ e.g., ^link(opower 1)^ power # ^link(power^ #^)^ e.g., ^link(power -1)^ or ^link(power 2)^ negative binomial ^link(nbinomial)^ Description, continued ---------------------- The allowed distributional families are: Family ^glmr^ option comment --------------------------------------------------------------------- Gaussian normal ^family(gaussian)^ synonym: ^family(normal)^ Inverse Gaussian ^family(igaussian)^ Bernoulli/binomial ^family(binomial)^ see note Poisson ^family(poisson)^ Negative binomial ^family(nbinomial)^ Gamma ^family(gamma)^ Note: The binomial distribution can be specified as (1) ^family(binomial)^, (2) ^family(binomial^ #^)^, or (3) ^family(binomial^ varname^)^. In case 2, # is the constant value of the binomial denominator. Specifying ^family(binomial 1)^ is the same as specifying ^family(binomial)^; it indicates that depvar has the so-called Bernoulli distribution with values 0 and 1 only. In case (3), varname is the variable containing the binomial denominator, thus allowing the denominator to vary. Description, continued ---------------------- Not all combinations of ^family()^ and ^link()^ make sense; when specifying these two options, you may choose among the following combinations: Allowed link functions iden- pro- clog- pow- nbin- Family tity log logit bit log er opow. omial Default ----------------------------------------------------------------------------- Gaussian x x x ^identity^ binomial x x x x x x x ^logit^ Poisson x x x ^log^ gamma x x x ^power -1^ inverse Gaussian x x x ^power -2^ negative binomial x x x x ^nbinomial^ Notes: Default indicates the assumed link if ^link()^ is not specified, which in all cases is the so-called canonical link. The default link for the negative binomial, ^link(nbinomial)^, is ln(E(depvar)/(E(depvar)+k)). Description, continued ---------------------- Some ^family()^ and ^link()^ combinations result in models already estimated by Stata. These are: ^family()^ ^link()^ Other Stata estimation command ---------------------------------------------------------------- ^gaussian^ ^identity^ ^regress^ or ^fit^ ^binomial^ ^logit^ ^logit^ or ^logistic^ ^binomial^ ^probit^ ^probit^ ^poisson^ ^log^ ^poisson^ ^nbinomial^ ^log^ ^nbreg^ (see note 1) ^gamma^ ^log^ ^ereg^ (see note 2) Note 1: While ^glmr^ can be used to estimate negative binomial regression, use of Stata's maximum-likelihood ^nbreg^ command is probably preferred; see the ^k()^ option below. Note 2: This requires specifying ^scale(1)^ (see options below). ^glmr^ cannot be used to estimate exponential regressions on censored data. Options ------- ^family(^...^)^ specifies the distribution of depvar; ^family(gaussian)^ is the de- fault. See description above. ^link(^...^)^ specifies the link function; the default is dependent on the ^family()^ specified. See description above. ^noconstant^ specifies that the linear predictor has no intercept term, thus forcing it through the origin on the scale defined by the link function. (continued on next screen) Options, continued ------------------ ^scale(x2^|^dev^|#^)^ overrides the default scale parameter. By default, ^scale(1)^ is assumed for the discrete distributions (binomial, Poisson, negative binomial) and ^scale(x2)^ is assumed for the continuous distributions (Gaussian, gamma, inverse Gaussian). ^scale(x2)^ means the scale parameter is set equal to the Pearson chi-square (or generalized chi-square) statistic divided by the residual degrees of freedom, as recommended by McCullagh and Nelder (1989) as a good general choice for continuous distributions ^scale(dev)^ estimates the scale parameter as the deviance divided by the residual degrees of freedom; this provides an alternative to ^scale(x2)^ for continuous distributions and over- or under-dispersed discrete distri- butions. ^scale(^#^)^ sets the scale parameter to #. For example, using ^scale(1)^ in ^family(gamma)^ models results in exponential-errors regression. Additional use of ^link(log)^ rather than the ^family(gamma)^ default ^power(-1)^ essen- tially reproduces Stata's ^ereg^ command if all the observations are uncensored. Options, continued ------------------ [^ln^]^offset(^varname^)^ specifies that varname contains values for an offset that is to be added to the linear predictor. ^offset()^ specifies the values directly. ^lnoffset()^ specifies the exponentiated values (logs of varname are added to the linear predictor). ^lnoffset()^ is most useful with Poisson-like data where varname is records the person-years of exposure to some hazard) and depvar records the observed number of "deaths." ^k(^#^)^ may be specified only with ^family(nbinomial)^ models and, in such models, ^k(1)^ is the default. The value of ^k()^ enters the variance and deviance functions; typical values for ^k()^ in real data lie between .01 and 2. Negative binomial models are often used for data with an overdispersed Poisson distribution. To use ^glmr^ to estimate such models, one searches for a ^k()^ that results in the deviance-based dispersion being approximately 1. In the case of ^link(log)^, use of the ^nbreg^ command is preferable since ^nbreg^ will estimate the entire model including ^k()^ by maximum likelihood and report appropriate confidence intervals; see the comments by Rogers (1993). For links other than log (including the canonical link), you will need to use ^glmr^. Options, continued ------------------ ^disp(^#^)^ multiplies the variance of depvar by # and divides the deviance by #. The resulting distributions are members of the so-called quasi-likelihood family; see McCullagh and Nelder (1989) for a detailed description. The option may be appropriate for use with moderately overdispersed binomial or Poisson data to adjust the standard errors of the regression coef- ficients, which otherwise are too small. ^frvars^ adds three new variables to the data set: ^_mu^ the expected value of depvar according to the fitted model; ^_eta^ the estimated linear predictor; ^_dres^ the scaled deviance residuals. Note: this option is provided as a stop-gap measure until a more sophis- ticated revised ^gpredict^ command for use after ^glmr^ is implemented. (frvars continued on next screen) Options, continued ------------------ ^frvars^, continued. Deviance residuals ^_dres^ (which are scaled by dividing by the square root of the scale parameter (see ^scale()^ above) are recommended by McCullagh and Nelder (1989) and by others as having the best properties for examining goodness of fit of a GLM. For example, they are approximately normally distributed if the model is correct. They may be plotted against the fitted values (^_mu^ or ^_eta^) or against a covariate to inspect the model's fit. Several other types of residuals -- not yet implemented -- are believed to be better for certain specific purposes and these will be included in the forthcoming revised ^gpredict^ command. ^eform^ displays the exponentiated coefficients and corresponding standard errors and confidence intervals as described in [7] maximize. For ^family(binomial) link(logit)^ (i.e., logistic regression), exponentiation results in odds ratios; for ^family(poisson) link(log)^ (i.e., Poisson regression), exponentiated coefficients are rate ratios. ^level(^#^)^ specifies the percent coverage for confidence intervals of the coefficients; see [4] estimate. Options, continued ------------------ ^iterate(^#^)^ specifies the maximum number of iterations allowed in estimating the mode; ^iterate(50)^ is the default. You should rarely need to specify ^iterate()^. ^ltol(^#^)^ specifies the convergence criterion for the change in deviance between iterations; ^ltol(1e-8)^ is the default. While lower values would theore- tically yield more precise numerical results, in practice 1e-8 is believed to be small enough to approach machine precision. ^init(^mu^)^ allows you to use the variable mu as the initial estimate for the mean of yvar. This could be useful with models that produce convergence difficulties, for example, ^family(binomial)^ models with power or odds- power (^opower^) links. (Examples on next screen) Examples -- continuous dependent variable ----------------------------------------- Using ^auto.dta^ to illustrate models for a continuous dependent variable: Examples -- continuous dependent variable -- standard linear regression ----------------------------------------------------------------------- . ^glmr mpg weight^ This, in fact, is not recommended because ^regress^ and ^fit^ are faster. Examples -- continuous dependent variable -- log link ----------------------------------------------------- . ^glmr mpg weight, link(log)^ In linear regression of mpg on weight with a log link, the model is ln(E(mpg)) = b0 + b1*weight or, if you prefer: E(mpg) = exp(b0 + b1*weight) This is different from a linear regression of ln(mpg) on weight because, in the GLM case, mpg is assumed to be distributed normally; in the regression case, ln(mpg) is assumed to be distributed normally. Examples -- continuous dependent variable -- gamma-distributed errors --------------------------------------------------------------------- Estimating a model with the "canonical" reciprocal link: . ^glmr mpg weight, fam(gam)^ Equivalently, one could specify the link explicitly: . ^glmr mpg weight, fam(gam) link(power -1)^ Examples -- discrete dependent variable --------------------------------------- Using ^kyphsp.dta^ to illustrate models for a discrete dependent variable: Examples -- discrete dependent variable -- logistic regression -------------------------------------------------------------- Logistic regression, showing odds ratios; Bernoulli (binomial 0/1) response: . ^glmr kyph age start, fam(bin) eform^ Results will be equivalent to those reported by ^logit^ or ^logistic^. (logistic regression examples continued on next screen) Examples -- discrete dependent variable -- logistic regression (continued) -------------------------------------------------------------------------- We have data from an experiment. In each replication, 100 fruit flies were sprayed with a particular dosage of an insecticide and the number dead after one hour were counted: . ^glmr dead dose, family(binomial 100)^ Results will be equivalent to: . ^gen pop = 100^ . ^blogit dead pop dose^ Alternatively, assume in each replication the number of flies we begin with varies and is recorded in variable nflies: . ^glmr dead dose, family(binomial nflies)^ Results will be equivalent to . ^blogit dead nflies dose^ Examples -- discrete dependent variable -- probit ------------------------------------------------- Repeating the above examples using probit rather than logit, we simply add the ^link(probit)^ option: . ^glmr kyph age start, fam(bin) link(probit)^ . ^glmr dead dose, family(binomial 100) link(probit)^ . ^glmr dead dose, family(binomial nflies) link(probit)^ Results will be equivalent to those reported by ^probit^ (first command) or ^bprobit^ (second and third commands). Examples -- other ----------------- See Hilbe (1993a). Saved results ------------- ^glmr^ saves in the ^$S_^# macros: ^S_1^ number of observations ^S_2^ residual degrees of freedom ^S_3^ deviance ^S_4^ Pearson X2 References ---------- Hilbe, J. 1993a. sg16: Generalized linear models. ^Stata Technical Bulletin^ 11: 20-28. ------. 1993b. sg16.3: Quasi-likelihood modeling using an enhanced ^glm^ command. ^Stata Technical Bulletin^ 16: 6. McCullagh, P. and J. A. Nelder. 1989. ^Generalized Linear Models^. 2nd ed. New York: Chapman & Hall. References, continued --------------------- Rogers, W. H. 1993. sg16.4: Comparison of ^nbreg^ and ^glm^ for negative binomial. ^Stata Technical Bulletin^ 16: 7. Royston, P. 1994. sg22: Generalized linear models: revision of ^glm^. ^Stata Technical Bulletin^ 18. Also see -------- STB: sg22 (STB-18) On-line: ^help^ for ^fit^, ^glogit^, ^logit^, ^logistic^, ^nbreg^, ^regress^, ^weibull^