Preface to the first edition

Preface

1 Introduction

1.1 Background

1.1.1 The problem of looking at data

1.1.2 Theory as pattern

1.1.3 Model fitting

1.1.4 What is a good model?

1.2 The origins of generalized linear models

1.2.1 Terminology

1.2.2 Classical linear models

1.2.3 R. A. Fisher and the design of experiments

1.2.4 Dilution assay

1.2.5 Probit analysis

1.2.6 Logit models for proportions

1.2.7 Log-linear models for counts

1.2.8 Inverse polynomials

1.2.9 Survival data

1.3 Scope of the rest of the book

1.4 Bibliographic notes

1.5 Further results and exercises 1

2 An outline of generalized linear models

2.1 Processes in model fitting

2.1.1 Model selection

2.1.2 Estimation

2.1.3 Prediction

2.2 The components of a generalized linear model

2.2.1 The generalization

2.2.2 Likelihood functions

2.2.3 Link functions

2.2.4 Sufficient statistics and canonical links

2.3 Measuring the goodness of fit

2.3.1 The discrepancy of a fit

2.3.2 The analysis of deviance

2.4 Residuals

2.4.1 Pearson residual

2.4.2 Anscombe residual

2.4.3 Deviance residual

2.5 An algorithm for fitting generalized linear models

2.5.1 Justification of the fitting procedure

2.6 Bibliographic notes

2.7 Further results and exercises 2

3 Models for continuous data with constant variance

3.1 Introduction

3.2 Error structure

3.3 Systematic component (linear predictor)

3.3.1 Continuous covariates

3.3.2 Qualitative covariates

3.3.3 Dummy variates

3.3.4 Mixed terms

3.4 Model formulae for linear predictors

3.4.1 Individual terms

3.4.2 The dot operator

3.4.3 The + operator

3.4.4 The crossing (*) and nesting (/) operators

3.4.5 Operators for the removal of terms

3.4.6 Exponential operator

3.5 Aliasing

3.5.1 Intrinsic aliasing with factors

3.5.2 Aliasing in a two-way cross-classification

3.5.3 Extrinsic aliasing

3.5.4 Functional relations among covariates

3.6 Estimation

3.6.1 The maximum-likelihood equations

3.6.2 Geometrical interpretation

3.6.3 Information

3.6.4 A model with two covariates

3.6.5 The information surface

3.6.6 Stability

3.7 Tables as data

3.7.1 Empty cells

3.7.2 Fused cells

3.8 Algorithms for least squares

3.8.1 Methods based on the information matrix

3.8.2 Direct decomposition methods

3.8.3 Extension to generalized linear models

3.9 Selection of covariates

3.10 Bibliographic notes

3.11 Further results and exercises 3

4 Binary data

4.1 Introduction

4.1.1 Binary responses

4.1.2 Covariate classes

4.1.3 Contingency tables

4.2 Binomial distribution

4.2.1 Genesis

4.2.2 Moments and cumulants

4.2.3 Normal limit

4.2.4 Poisson limit

4.2.5 Transformations

4.3 Models for binary responses

4.3.1 Link functions

4.3.2 Parameter interpretation

4.3.3 Retrospective sampling

4.4 Likelihood functions for binary data

4.4.1 Log likelihood for binomial data

4.4.2 Parameter estimation

4.4.3 Deviance function

4.4.4 Bias and precision of estimates

4.4.5 Sparseness

4.4.6 Extrapolation

4.5 Over-dispersion

4.5.1 Genesis

4.5.2 Parameter estimation

4.6 Example

4.6.1 Habitat preferences of lizards

4.7 Bibliographic notes

4.8 Further results and exercises 4

5 Models for polytomous data

5.1 Introduction

5.2 Measurement scales

5.2.1 General points

5.2.2 Models for ordinal scales

5.2.3 Models for interval scales

5.2.4 Models for nominal scales

5.2.5 Nested or hierarchical response scales

5.3 The multinomial distribution

5.3.1 Genesis

5.3.2 Moments and cumulants

5.3.3 Generalized inverse matrices

5.3.4 Quadratic forms

5.3.5 Marginal and conditional distributions

5.4 Likelihood functions

5.4.1 Log likelihood for multinomial responses

5.4.2 Parameter estimation

5.4.3 Deviance function

5.5 Over-dispersion

5.6 Examples

5.6.1 A cheese-tasting experiment

5.6.2 Pneumoconiosis among coalminers

5.7 Bibliographic notes

5.8 Further results and exercises 5

6 Log-linear models

6.1 Introduction

6.2 Likelihood functions

6.2.1 Poisson distribution

6.2.2 The Poisson log-likelihood function

6.2.3 Over-dispersion

6.2.4 Asymptotic theory

6.3 Examples

6.3.1 A biological assay of tuberculins

6.3.2 A study of wave damage to cargo ships

6.4 Log-linear models and multinomial response models

6.4.1 Comparison of two or more Poisson means

6.4.2 Multinomial response models

6.4.3 Summary

6.5 Multiple responses

6.5.1 Introduction

6.5.2 Independence and conditional independence

6.5.3 Canonical correlation models

6.5.4 Multivariate regression models

6.5.5 Multivariate model formulae

6.5.6 Log-linear regression models

6.5.7 Likelihood equations

6.6 Example

6.6.1 Respiratory ailments of coalminers

6.6.2 Parameter interpretation

6.7 Bibliographic notes

6.8 Further results and exercises 6

7 Conditional likelihoods*

7.1 Introduction

7.2 Marginal and conditional likelihoods

7.2.1 Marginal likelihood

7.2.2 Conditional likelihood

7.2.3 Exponential-family models

7.2.4 Profile likelihood

7.3 Hypergeometric distributions

7.3.1 Central hypergeometric distribution

7.3.2 Non-central hypergeometric distribution

7.3.3 Multivariate hypergeometric distribution

7.3.4 Multivariate non-central distribution

7.4 Some applications involving binary data

7.4.1 Comparison of two binomial probabilities

7.4.2 Combination of information from 2×2 tables

7.4.3 Ille-et-Vilaine study of oesophageal cancer

7.5 Some applications involving polytomous data

7.5.1 Matched pairs: nominal response

7.5.2 Ordinal responses

7.5.3 Example

7.6 Bibliographic notes

7.7 Further results and exercises 7

8 Models with constant coefficient of variation

8.1 Introduction

8.2 The gamma distribution

8.3 Models with gamma-distributed observations

8.3.1 The variance function

8.3.2 The deviance

8.3.3 The canonical link

8.3.4 Multiplicative models: log link

8.3.5 Linear models: identity link

8.3.6 Estimation of the dispersion parameter

8.4 Examples

8.4.1 Car insurance claims

8.4.2 Clotting times of blood

8.4.3 Modelling rainfall data using two generalized linear models

8.4.4 Developmental rate of Drosophila melanogaster

8.5 Bibliographic notes

8.6 Further results and exercises 8

9 Quasi-likelihood functions

9.1 Introduction

9.2 Independent observations

9.2.1 Covariance functions

9.2.2 Construction of the quasi-likelihood function

9.2.3 Parameter estimation

9.2.4 Example: incidence of leaf-blotch on barley

9.3 Dependent observations

9.3.1 Quasi-likelihood estimating equations

9.3.2 Quasi-likelihood function

9.3.3 Example: estimation of probabilities from marginal frequencies

9.4 Optimal estimating functions

9.4.1 Introduction

9.4.2 Combination of estimating functions

9.4.3 Example: estimation for megalithic stone rings

9.5 Optimality criteria

9.6 Extended quasi-likelihood

9.7 Bibliographic notes

9.8 Further results and exercises 9

10 Joint modelling of mean and dispersion

10.1 Introduction

10.2 Model specification

10.3 Interaction between mean and dispersion effects

10.4 Extended quasi-likelihood as a criterion

10.5 Adjustments of the estimating equations

10.5.1 Adjustment for kurtosis

10.5.2 Adjustment for degrees of freedom

10.5.3 Summary of estimating equations for the dispersion model

10.6 Joint optimum estimating equations

10.7 Example: the production of leaf-springs for trucks

10.8 Bibliographic notes

10.9 Further results and exercises 10

11 Models with additional non-linear parameters

11.1 Introduction

11.2 Parameters in the variance function

11.3 Parameters in the link function

11.3.1 One link parameter

11.3.2 More than one link parameter

11.3.3 Transformation of data vs transformation of fitted values

11.4 Non-linear parameters in the covariates

11.5 Examples

11.5.1 The effects of fertilizers on coastal Bermuda grass

11.5.2 Assay of an insecticide with a synergist

11.5.3 Mixtures of drugs

11.6 Bibliographic notes

11.7 Further results and exercises 11

12 Model checking

12.1 Introduction

12.2 Techniques in model checking

12.3 Score tests for extra parameters

12.4 Smoothing as an aid to informal checks

12.5 The raw materials of model checking

12.6 Checks for systematic departure from model

12.6.1 Informal checks using residuals

12.6.2 Checking the variance function

12.6.3 Checking the link function

12.6.4 Checking the scales of covariates

12.6.5 Checks for compound discrepancies

12.7 Checks for isolated departures from the model

12.7.1 Measure of leverage

12.7.2 Measure of consistency

12.7.3 Measure of influence

12.7.4 Informal assessment of extreme values

12.7.5 Extreme points and checks for systematic discrepancies

12.8 Examples

12.8.1 Carrot damage in an insecticide experiment

12.8.2 Minitab tree data

12.8.3 Insurance claims (continued)

12.9 A strategy for model checking?

12.10 Bibliographic notes

12.11 Further results and exercises 12

13 Models for survival data

13.1 Introduction

13.1.1 Survival functions and hazard functions

13.2 Proportional-hazards models

13.3 Estimation with a specified survival distribution

13.3.1 The exponential distribution

13.3.2 The Weibull distribution

13.3.3 The extreme-value distribution

13.4 Example: remission times for leukaemia

13.5 Cox's proportional-hazards model

13.5.1 Partial likelihood

13.5.2 The treatment of ties

13.5.3 Numerical methods

13.6 Bibliographic notes

13.7 Further results and exercises 13

14 Components of dispersion

14.1 Introduction

14.2 Linear models

14.3 Non-linear models

14.4 Parameter estimation

14.5 Example: a salamander mating experiment

14.5.1 Introduction

14.5.2 Experimental procedure

14.5.3 A linear logistic model with random effects

14.5.4 Estimation of the dispersion parameters

14.6 Bibliographic notes

14.7 Further results and exercises 14

15 Further topics

15.1 Introduction

15.2 Bias adjustment

15.2.1 Models with canonical link

15.2.2 Non-canonical models

15.2.3 Example: lizard data (continued)

15.3 Computation of Bartlett adjustments

15.3.1 General theory

15.3.2 Computation of the adjustment

15.3.3 Example: exponential regression model

15.4 Generalized additive models

15.4.1 Algorithms for fitting

15.4.2 Smoothing methods

15.4.3 Conclusions

15.5 Bibliographic notes

15.6 Further results and exercises 15

Appendices

A Elementary likelihood theory

B Edgeworth series

C Likelihood-ratio statistics

References

Index of data sets

Author index

Subject index