Generalized Linear Models and Extensions, Second Edition
Authors: James W. Hardin and Joseph M. Hilbe
Publisher: Stata Press
Copyright: 2007
ISBN-13: 978-1-881228-60-8
Pages: 387; paperback
Comment from the Stata technical group
Generalized linear models (GLMs) extend standard linear (Gaussian)
regression techniques to models with a non-Gaussian, or even discrete,
response. GLM theory is predicated on the exponential family of
distributions, a class rich enough to include the distributions that underlie
the commonly used logit, probit, and Poisson models. Although one can fit these models
in Stata by using specialized commands (e.g., logit for logit
models), fitting them under the GLM paradigm with Stata’s glm command
offers the advantage of having many models under the same roof. For example,
model diagnostics may be calculated and interpreted similarly regardless of
the assumed distribution.
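As a minimal illustration (the variables y, x1, and x2 here are hypothetical, not from the book), the same binary logit model can be fit with the specialized command or under the GLM framework:

    // the same logit model two ways, with hypothetical variables y, x1, x2
    logit y x1 x2                              // specialized command
    glm y x1 x2, family(binomial) link(logit)  // GLM framework: binomial family, logit link

The two commands return identical coefficient estimates; the glm fit additionally reports the deviance and Pearson statistics used by the diagnostics described below.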
This text thoroughly covers GLMs, both theoretically and computationally.
The theory consists of showing how the various GLMs are special cases of the
exponential family, general properties of this family of distributions, and
the derivation of maximum likelihood (ML) estimators and standard errors. The
book shows how iteratively reweighted least squares (IRLS), another method of
parameter estimation, is a consequence of ML estimation via Fisher scoring.
The authors also discuss different methods of estimating standard errors,
including robust methods, robust methods with clustering, Newey–West,
outer product of the gradient, bootstrap, and jackknife. The thorough
coverage of model diagnostics includes measures of influence such as Cook’s
distance, nine forms of residuals, the Akaike and Bayesian information
criteria, and various R2-type measures of explained variability.
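In current Stata syntax, several of these variance and diagnostic choices look roughly as follows (a sketch only; y, x1, x2, and the cluster identifier id are hypothetical):

    // alternative variance estimators for the same Poisson fit
    glm y x1 x2, family(poisson) link(log)                  // conventional standard errors from the Hessian
    glm y x1 x2, family(poisson) link(log) vce(robust)      // sandwich (robust) standard errors
    glm y x1 x2, family(poisson) link(log) vce(cluster id)  // robust standard errors with clustering on id
    glm y x1 x2, family(poisson) link(log) vce(bootstrap)   // bootstrap standard errors

    // a few of the fit and residual diagnostics after the last fit
    estat ic                     // Akaike and Bayesian information criteria
    predict double rp, pearson   // Pearson residuals
    predict double rd, deviance  // deviance residuals
    predict double cd, cooksd    // Cook's distance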
After presenting general theory, the text then breaks down each
distribution. Each distribution has its own chapter that discusses the
computational details of applying the general theory to that particular
distribution. Pseudocode plays a valuable role here, because it lets the
authors describe computational algorithms relatively simply. Devoting an
entire chapter to each distribution (or family, in GLM terms) also allows for
real-data examples showing how Stata fits such models, as well as for
diagnostics and analytical strategies that are
unique to that family. The chapters on binary data and on count (Poisson)
data are excellent in this regard. Hardin and Hilbe give ample attention to
the problems of overdispersion and zero inflation in count-data models.
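A sketch of the kind of count-model workflow those chapters develop, with a hypothetical count outcome deaths and covariates x1 and x2 (the commands are official Stata; the variable names are placeholders):

    glm deaths x1 x2, family(poisson) link(log)            // Pearson chi2/df far above 1 suggests overdispersion
    glm deaths x1 x2, family(poisson) link(log) scale(x2)  // rescale standard errors by the Pearson dispersion
    nbreg deaths x1 x2                                     // negative binomial: overdispersion parameter fit by ML
    zinb deaths x1 x2, inflate(x1)                         // zero-inflated negative binomial (chapter 14)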
The final part of the text concerns extensions of GLMs, which come in three
forms. First, some chapters cover multinomial responses, both ordered and
unordered. Although not strictly part of the GLM framework, the theory is similar in that
one can think of a multinomial response as an extension of a binary
response. The examples presented in these chapters often use the authors’
own Stata programs, augmenting official Stata’s capabilities. Second, GLMs
may be extended to clustered data through generalized estimating equations
(GEEs), and one chapter covers GEE theory and examples. Finally, GLMs may be
extended by programming one’s own family and link functions for use with
Stata’s official glm command, and the book covers this process.
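In official Stata, the multinomial and GEE extensions discussed here correspond roughly to the following commands (a sketch with hypothetical variables; the panel identifier id is assumed to index clusters):

    xtset id                                                                    // declare the cluster/panel identifier
    xtgee y x1 x2, family(binomial) link(logit) corr(exchangeable) vce(robust)  // population-averaged GEE fit

    ologit yord x1 x2   // ordered logit for an ordered categorical response
    mlogit ycat x1 x2   // multinomial logit for an unordered response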
Table of contents
List of Tables
List of Figures
List of Listings
Preface
1 Introduction
1.1 Origins and motivation
1.2 Notational conventions
1.3 Applied or theoretical?
1.4 Road map
1.5 Installing the support materials
I Foundations of Generalized Linear Models
2 GLMs
2.1 Components
2.2 Assumptions
2.3 Exponential family
2.4 Example: Using an offset in a GLM
2.5 Summary
3 GLM estimation algorithms
3.1 Newton–Raphson (using the observed Hessian)
3.2 Starting values for the Newton–Raphson
3.3 IRLS (using the expected Hessian)
3.4 Starting values for IRLS
3.5 Goodness of fit
3.6 Estimated variance matrices
3.6.1 Hessian
3.6.2 Outer products of the gradient
3.6.3 Sandwich
3.6.4 Modified sandwich
3.6.5 Unbiased sandwich
3.6.6 Modified unbiased sandwich
3.6.7 Weighted sandwich: Newey–West
3.6.8 Jackknife
3.6.8.1 Usual jackknife
3.6.8.2 One-step jackknife
3.6.8.3 Weighted jackknife
3.6.8.4 Variable jackknife
3.6.9 Bootstrap
3.6.9.1 Usual bootstrap
3.6.9.2 Grouped bootstrap
3.7 Estimation algorithms
3.8 Summary
4 Analysis of fit
4.1 Deviance
4.2 Diagnostics
4.2.1 Cook’s distance
4.2.2 Overdispersion
4.3 Assessing the link function
4.4 Checks for systematic departure from the model
4.5 Residual analysis
4.5.1 Response residuals
4.5.2 Working residuals
4.5.3 Pearson residuals
4.5.4 Partial residuals
4.5.5 Anscombe residuals
4.5.6 Deviance residuals
4.5.7 Adjusted deviance residuals
4.5.8 Likelihood residuals
4.5.9 Score residuals
4.6 Model statistics
4.6.1 Criterion measures
4.6.1.1 AIC
4.6.1.2 BIC
4.6.2 The interpretation of R2 in linear regression
4.6.2.1 Percent variance explained
4.6.2.2 The ratio of variances
4.6.2.3 A transformation of the likelihood ratio
4.6.2.4 A transformation of the F test
4.6.2.5 Squared correlation
4.6.3 Generalizations of linear regression R2 interpretations
4.6.3.1 Efron’s pseudo-R2
4.6.3.2 McFadden’s likelihood-ratio index
4.6.3.3 Ben-Akiva and Lerman adjusted likelihood-ratio index
4.6.3.4 McKelvey and Zavoina ratio of variances
4.6.3.5 Cragg and Uhler normed measure
4.6.4 More R2 measures
4.6.4.1 The count R2
4.6.4.2 The adjusted count R2
4.6.4.3 Veall and Zimmermann R2
4.6.4.4 Cameron–Windmeijer R2
II Continuous Response Models
5 The Gaussian family
5.1 Derivation of the GLM Gaussian family
5.2 Derivation in terms of the mean
5.3 IRLS GLM algorithm (nonbinomial)
5.4 Maximum likelihood estimation
5.5 GLM log-normal models
5.6 Expected versus observed information matrix
5.7 Other Gaussian links
5.8 Example: Relation to OLS
5.9 Example: Beta-carotene
6 The gamma family
6.1 Derivation of the gamma model
6.2 Example: Reciprocal link
6.3 Maximum likelihood estimation
6.4 Log-gamma models
6.5 Identity-gamma models
6.6 Using the gamma model for survival analysis
7 The inverse Gaussian family
7.1 Derivation of the inverse Gaussian model
7.2 The inverse Gaussian algorithm
7.3 Maximum likelihood algorithm
7.4 Example: The canonical inverse Gaussian
7.5 Noncanonical links
8 The power family and link
8.1 Power links
8.2 Example: Power link
8.3 The power family
III Binomial Response Models
9 The binomial–logit family
9.1 Derivation of the binomial model
9.2 Derivation of the Bernoulli model
9.3 The binomial regression algorithm
9.4 Example: Logistic regression
9.4.1 Model producing logistic coefficients: The heart data
9.4.2 Model producing logistic odds ratios
9.5 GOF statistics
9.6 Interpretation of parameter estimates
10 The general binomial family
10.1 Noncanonical binomial models
10.2 Noncanonical binomial links (binary form)
10.3 The probit model
10.4 The clog-log and log-log models
10.5 Other links
10.6 Interpretation of coefficients
10.6.1 Identity link
10.6.2 Logit link
10.6.3 Log link
10.6.4 Log complement link
10.6.5 Summary
10.7 Generalized binomial regression
11 The problem of overdispersion
11.1 Overdispersion
11.2 Scaling of standard errors
11.3 Williams’ procedure
11.4 Robust standard errors
IV Count Response Models
12 The Poisson family
12.1 Count response regression models
12.2 Derivation of the Poisson algorithm
12.3 Poisson regression: Examples
12.4 Example: Testing overdispersion in the Poisson model
12.5 Using the Poisson model for survival analysis
12.6 Using offsets to compare models
12.7 Interpretation of coefficients
13 The negative binomial family
13.1 Constant overdispersion
13.2 Variable overdispersion
13.2.1 Derivation in terms of a Poisson-gamma mixture
13.2.2 Derivation in terms of the negative binomial probability function
13.2.3 The canonical link negative binomial parameterization
13.3 The log-negative binomial parameterization
13.4 Negative binomial examples
13.5 The geometric family
13.6 Interpretation of coefficients
14 Other count data models
14.1 Count response regression models
14.2 Zero-truncated models
14.3 Zero-inflated models
14.4 Hurdle models
14.5 Heterogeneous negative binomial models
14.6 Generalized Poisson regression models
14.7 Censored count response models
V Multinomial Response Models
15 The ordered-response family
15.1 Ordered outcomes for general link
15.2 Ordered outcomes for specific links
15.2.1 Ordered logit
15.2.2 Ordered probit
15.2.3 Ordered clog-log
15.2.4 Ordered log-log
15.2.5 Ordered cauchit
15.3 Generalized ordered outcome models
15.4 Example: Synthetic data
15.5 Example: Automobile data
15.6 Partial proportional-odds models
15.7 Continuation ratio models
16 Unordered-response family
16.1 The multinomial logit family
16.1.1 Example: Relation to logistic regression
16.1.2 Example: Relation to conditional logistic regression
16.1.3 Example: Extensions with conditional logistic regression
16.1.4 The independence of irrelevant alternatives
16.1.5 Example: Assessing the IIA
16.1.6 Interpreting coefficients
16.1.7 Example: Medical admissions—introduction
16.1.8 Example: Medical admissions—summary
16.2 The multinomial probit models
16.2.1 Example: A comparison of the models
16.2.2 Example: Comparing probit and multinomial probit
16.2.3 Example: Concluding remarks
VI Extensions to the GLM
17 Extending the likelihood
17.1 The quasilikelihood
17.2 Example: Wedderburn’s leaf blotch data
17.3 Generalized additive models
18 Clustered data
18.1 Generalization from individual to clustered data
18.2 Pooled estimators
18.3 Fixed effects
18.3.1 Unconditional fixed-effects estimators
18.3.2 Conditional fixed-effects estimators
18.4 Random effects
18.4.1 Maximum likelihood estimation
18.4.2 Gibbs sampling
18.5 GEEs
18.6 Other models
VII Stata Software
19 Programs for Stata
19.1 The glm command
19.1.1 Syntax
19.1.2 Description
19.1.3 Options
19.2 The predict command after glm
19.2.1 Syntax
19.2.2 Options
19.3 User-written programs
19.3.1 Global macros available for user-written programs
19.3.2 User-written variance functions
19.3.3 User-written programs for link functions
19.3.4 User-written programs for Newey–West weights
19.4 Remarks
19.4.1 Equivalent commands
19.4.2 Special comments on family(Gaussian) models
19.4.3 Special comments on family(binomial) models
19.4.4 Special comments on family(nbinomial) models
19.4.5 Special comments on family(gamma) link(log) models
A Tables
References
Author index
Subject index