.-
help for ^glm^                                                  (manual:  ^[R] glm^)
.-

Generalized linear models
-------------------------

    ^glm^ depvar [varlist] [weight] [^if^ exp] [^in^ range]
            [^,^ ^f^amily^(^familyname^)^ ^l^ink^(^linkname^)^ ^nocons^tant
            ^s^cale^(x2^|^dev^|#^)^ [^ln^]^o^ffset^(^varname^)^ ^disp(^#^)^ ^ef^orm
            ^le^vel^(^#^)^ ^it^erate^(^#^)^ ^lt^ol^(^#^)^ ^ini^t^(^varname^)^ ^nolo^g ]

where familyname is one of

    ^gau^ssian | ^ig^aussian | ^b^inomial [varname|#] | ^p^oisson |
    ^nb^inomial [#] | ^gam^ma

and linkname is one of

    ^i^dentity | ^log^ | ^l^ogit | ^p^robit | ^c^loglog | ^opo^wer # | ^pow^er # |
    ^nb^inomial | ^h^t varname

^aweight^s, ^fweight^s, and ^iweight^s are allowed; see help @weights@.

^glm^ shares the features of all estimation commands; see help @est@.

^glm^ may be used with ^sw^ to perform stepwise estimation; see help @sw@.

The syntax of @predict@ after ^glm^ is

    ^predict^ [type] newvarname [^if^ exp] [^in^ range] [^,^ statistic ^nooff^set]

where statistic is

        ^m^u           predicted mean of y = g_inverse(xb); the default
        ^xb^           linear prediction
        ^stdp^         standard error of the linear prediction
        ^d^eviance     deviance residual
        ^p^earson      Pearson residual

These statistics are available both in and out of sample; type
"^predict^ ... ^if e(sample)^ ..." if wanted only for the estimation sample.


Description
-----------

^glm^ fits generalized linear models.


Options
-------

^family(^familyname^)^ specifies the distribution of depvar; ^family(gaussian)^
    is the default.

^link(^linkname^)^ specifies the link function; the default is the canonical
    link for the ^family()^ specified.

^noconstant^ specifies that the linear predictor has no intercept term, thus
    forcing it through the origin on the scale defined by the link function.

^scale(x2^|^dev^|#^)^ overrides the default scale parameter.  By default,
    ^scale(1)^ is assumed for discrete distributions (binomial, Poisson,
    negative binomial) and ^scale(x2)^ for continuous distributions (Gaussian,
    gamma, inverse Gaussian).

    ^scale(x2)^ specifies that the scale parameter be set to the Pearson
    chi-squared (or generalized chi-squared) statistic divided by the residual
    degrees of freedom.

    ^scale(dev)^ sets the scale parameter to the deviance divided by the
    residual degrees of freedom.  This provides an alternative to ^scale(x2)^
    for continuous distributions and over- or under-dispersed discrete
    distributions.

    ^scale(^#^)^ sets the scale parameter to #.

[^ln^]^offset(^varname^)^ specifies an offset to be added to the linear
    predictor.  ^offset()^ specifies the values directly:
    g(E(y)) = xB + varname.  ^lnoffset()^ specifies exponentiated values:
    g(E(y)) = xB + ln(varname).

^disp(^#^)^ multiplies the variance of y by # and divides the deviance by #.
    The resulting distributions are members of the quasi-likelihood family.

^eform^ displays the exponentiated coefficients and the corresponding standard
    errors and confidence intervals.  For binomial models with the logit link,
    exponentiation results in odds ratios; for Poisson models with the log
    link, exponentiated coefficients are rate ratios.

^level(^#^)^ specifies the confidence level, in percent, for confidence
    intervals of the coefficients; see help @level@.

^iterate(^#^)^ specifies the maximum number of iterations allowed in estimating
    the model; ^iterate(50)^ is the default.

^ltol(^#^)^ specifies the convergence criterion for the change in deviance
    between iterations; ^ltol(1e-6)^ is the default.

^init(^varname^)^ specifies varname containing an initial estimate for the mean
    of depvar.  This can be useful if you encounter convergence difficulties,
    especially with binomial models with power or odds-power links.

^nolog^ suppresses the iteration log.


Options for @predict@
-------------------

^mu^, the default, requests the predicted value of y; y_hat = g_inverse(xb).

^xb^ requests the linear predictor xb.

^stdp^ requests the standard error of the linear predictor.

^deviance^ requests the deviance residuals.

^pearson^ requests Pearson residuals.

^nooffset^ is relevant only if you specified ^offset()^ or ^lnoffset()^ for
    ^glm^.  It modifies the calculations made by ^predict^ so that they ignore
    the offset variable; the linear prediction is treated as x_j*b rather than
    x_j*b + offset_j.


Remarks
-------

The allowed link functions are

    Link function            ^glm^ option
    ------------------------------------------
    identity                 ^link(identity)^
    log                      ^link(log)^
    logit                    ^link(logit)^
    probit                   ^link(probit)^
    complementary log-log    ^link(cloglog)^
    odds power               ^link(opower^ #^)^
    power                    ^link(power^ #^)^
    negative binomial        ^link(nbinomial)^
    Hakulinen & Tenkanen     ^link(ht varname)^

The allowed distribution families are

    Family                   ^glm^ option
    ----------------------------------------------------------------
    Gaussian (normal)        ^family(gaussian)^ or ^family(normal)^
    Inverse Gaussian         ^family(igaussian)^
    Bernoulli/binomial       ^family(binomial)^
    Poisson                  ^family(poisson)^
    Negative binomial        ^family(nbinomial)^
    Gamma                    ^family(gamma)^

The allowed combinations are

              | id  log  logit  probit  cloglog  power  opower  nb  ht
--------------+---------------------------------------------------------------
Gaussian      | x    x                             x
inv. Gaussian | x    x                             x
binomial      | x    x     x      x        x       x      x         x
Poisson       | x    x                             x
neg. binomial | x    x                             x            x
gamma         | x    x                             x

If you specify ^family()^ but not ^link()^, you obtain the canonical link for
the family:

    ^family()^               default ^link()^
    ----------------------------------------
    ^family(gaussian)^       ^link(identity)^
    ^family(igaussian)^      ^link(power -2)^
    ^family(binomial)^       ^link(logit)^
    ^family(poisson)^        ^link(log)^
    ^family(nbinomial)^      ^link(log)^
    ^family(gamma)^          ^link(power -1)^


Special comments on ^family(gaussian)^ models
-------------------------------------------

While ^glm^ can be used to fit linear regression (^family(gaussian)
link(identity)^ models) and, in fact, does so by default, it is better to use
the @regress@ command because it is quicker and numerous post-estimation
commands are available to explore the adequacy of the fit.
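
As a minimal sketch, assuming a continuous outcome ^y^ and covariates ^x1^ and
^x2^ (hypothetical variable names, not taken from any dataset used elsewhere in
this help file), the following two commands fit the same linear model and
should report the same coefficients:

    . ^glm y x1 x2, family(gaussian) link(identity)^
    . ^regress y x1 x2^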


Special comments on ^family(binomial)^ models
-------------------------------------------

The binomial distribution can be specified as

    (1)  ^family(binomial)^
    (2)  ^family(binomial^ #^)^
    (3)  ^family(binomial^ varname^)^

In case 2, # is the value of the binomial denominator N, the number of trials.
Specifying ^family(binomial 1)^ is the same as specifying ^family(binomial)^;
both mean that y has the Bernoulli distribution with values 0 and 1 only.

In case 3, varname is a variable containing the binomial denominator, thus
allowing the number of trials to vary across observations.

For ^family(binomial) link(logit)^ models, we recommend using the @logistic@
command in preference to ^glm^.  Both produce the same answers, but @logistic@
provides useful post-estimation commands.

For ^family(binomial) link(probit)^ models, we recommend using the @probit@
command in preference to ^glm^.  Both produce the same coefficients, but the
standard errors are only asymptotically equivalent because probit is not the
canonical link for the binomial.  The @probit@ command produces full
maximum-likelihood results.


Special comments on ^family(binomial) link(ht varname)^ models
--------------------------------------------------------------

This is the Hakulinen and Tenkanen link function, where ^varname^ is a variable
containing expected probabilities.  The link function is then log(-log(p1/p2)),
where p1 are the observed probabilities and p2 are the expected probabilities
contained in ^varname^.


Special comments on ^family(nbinomial)^ models
--------------------------------------------

The negative binomial distribution can be specified as

    (1)  ^family(nbinomial)^
    (2)  ^family(nbinomial^ #^)^

^family(nbinomial)^ is equivalent to ^family(nbinomial 1)^.  #, often called k,
enters the variance and deviance functions; typical values range between .01
and 2.
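
As an illustration only, assuming a count outcome ^y^ and a covariate ^x^
(hypothetical variable names), k could be fixed at .5, a value chosen
arbitrarily from the typical range, with

    . ^glm y x, family(nbinomial .5) link(log)^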

^family(nbinomial) link(log)^ models -- also known as negative binomial
regression -- are used for data with an overdispersed Poisson distribution.
While ^glm^ can be used to estimate such models, use of Stata's
maximum-likelihood @nbreg@ command is probably preferable.  Under the ^glm^
approach, one must search for the value of k that results in the
deviance-based dispersion being 1.  @nbreg@, on the other hand, finds the
maximum-likelihood estimate of k and reports a confidence interval for it.


Special comment on ^family(gamma) link(log)^ models
-------------------------------------------------

^glm^ can be used to estimate exponential regression, but this requires
specifying ^scale(1)^.  It is better to use the @ereg@ command.  ^glm^-reported
standard errors will be only asymptotically equivalent to those reported by
@ereg@ because log is not the canonical link for the gamma family.  In
addition, ^glm^ cannot be used to estimate exponential regressions on censored
data.


Examples
--------

    . ^glm low age lwt race2 race3 smoke ptl ht ui, f(bin) l(logit)^
    . ^glm, eform^

    . ^glm dead ln_dose, family(binomial pop) link(logit)^
    . ^glm dead ln_dose, family(binomial pop) link(cloglog)^
    . ^predict e_deaths^
    . ^summarize dead e_deaths^
    . ^predict rd if e(sample), deviance^

    . ^xi: glm dead i.beetle ln_dose, f(bin pop) link(cl)^
    . ^xi: glm dead i.beetle*ln_dose, f(bin pop) link(cl)^
    . ^testparm I*^


Also see
--------

 Manual:  ^[U] 23 Estimation and post-estimation commands^,
          ^[U] 29 Overview of model estimation in Stata^,
          ^[R] glm^
On-line:  help for @est@, @postest@; @cloglog@, @logistic@, @nbreg@, @poisson@,
          @regdiag@, @regress@, @streg@, @sw@, @weibull@, @xtgee@