This book is intended as a teaching text for a one-semester or two-quarter
secondary course in biostatistics, and it focuses on multipredictor
regression models in modern medical research. It lists an introductory
course in statistics or biostatistics as a prerequisite, but the first three
chapters provide enough review material that this requirement is not
critical.
The authors take a unified approach to regression models. They begin with
linear regression and then discuss issues such as model statement and
assumptions, types of regressors (e.g., categorical vs. continuous),
interactions, causation and confounding, inference and testing, diagnostics,
and alternative models for when assumptions are violated. Then they discuss
these same issues in the contexts of other multipredictor regression models,
namely logistic regression, the Cox model, and generalized linear models
(GLMs). Chapters follow covering generalized estimating equations (GEE) and
the analysis of survey data. Almost all analyses are performed using Stata.
Preface
1. Introduction
1.1 Example: Treatment of back pain
1.2 The family of multipredictor regression methods
1.3 Motivation for multipredictor regression
1.3.1 Prediction
1.3.2 Isolating the effect of a single predictor
1.3.3 Understanding multiple predictors
1.4 Guide to the book
2. Exploratory and descriptive methods
2.1 Data checking
2.2 Types of data
2.3 One-variable descriptions
2.3.1 Numerical variables
2.3.2 Categorical variables
2.4 Two-variable descriptions
2.4.1 Outcome versus predictor variables
2.4.2 Continuous outcome variable
2.4.3 Categorical outcome variable
2.5 Multivariable descriptions
2.6 Problems
3. Basic statistical methods
3.1 t-test and analysis of variance
3.1.1 t-test
3.1.2 One- and two-sided hypothesis test
3.1.3 Paired t-test
3.1.4 One-way analysis of variance (ANOVA)
3.1.5 Pairwise comparisons in ANOVA
3.1.6 Multi-way ANOVA and ANCOVA
3.1.7 Robustness to violations of assumptions
3.2 Correlation coefficient
3.3 Simple linear regression model
3.3.1 Systematic part of the model
3.3.2 Random part of the model
3.3.3 Assumptions about the predictor
3.3.4 Ordinary least squares estimation
3.3.5 Fitted values and residuals
3.3.6 Sums of squares
3.3.7 Standard errors of the regression coefficients
3.3.8 Hypothesis tests and confidence intervals
3.3.9 Slope, correlation coefficient, and R²
3.4 Contingency table methods for binary outcomes
3.4.1 Measures of risk and association for binary outcomes
3.4.2 Tests of association in contingency tables
3.4.3 Predictors with multiple categories
3.4.4 Analysis involving multiple categorical predictors
3.5 Basic methods for survival analysis
3.5.1 Right censoring
3.5.2 Kaplan–Meier estimator of the survival function
3.5.3 Interpretation of Kaplan–Meier curves
3.5.4 Median survival
3.5.5 Cumulative incidence function
3.5.6 Comparing groups using the logrank test
3.6 Bootstrap confidence intervals
3.7 Interpretation of negative findings
3.8 Further notes and references
3.9 Problems
3.10 Learning objectives
4. Linear regression
4.1 Example: Exercise and glucose
4.2 Multiple linear regression model
4.2.1 Systematic part of the model
4.2.2 Random part of the model
4.2.3 Generalization of R² and r
4.2.4 Standardized regression coefficients
4.3 Categorical predictors
4.3.1 Binary predictors
4.3.2 Multilevel categorical predictors
4.3.3 The F-test
4.3.4 Multiple pairwise comparisons between categories
4.3.5 Testing for trend across categories
4.4 Confounding
4.4.1 Causal effects and counterfactuals
4.4.2 A linear model for the counterfactual experiment
4.4.3 Confounding of causal effects
4.4.4 Randomization assumption
4.4.5 Conditions for confounding of causal effects
4.4.6 Control of confounding
4.4.7 Range of confounding patterns
4.4.8 Diagnostics for confounding in a sample
4.4.9 Confounding is difficult to rule out
4.4.10 Adjusted vs. unadjusted βs
4.4.11 Example: BMI and LDL
4.5 Mediation
4.5.1 Modeling mediation
4.5.2 Confidence intervals for measures of mediation
4.5.3 Example: BMI, exercise, and glucose
4.6 Interaction
4.6.1 Causal effects and interaction
4.6.2 Modeling interaction
4.6.3 Overall causal effect in the presence of interaction
4.6.4 Example: Hormone therapy and statin use
4.6.5 Example: BMI and statin use
4.6.7 Interaction and scale
4.6.8 Details
4.7 Checking model assumptions and fit
4.7.1 Linearity
4.7.2 Normality
4.7.3 Constant variance
4.7.4 Outlying, high leverage, and influential points
4.7.5 Interpretation of results for log-transformed variables
4.7.6 When to use transformations
4.8 Summary
4.9 Further notes and references
4.10 Problems
4.11 Learning objectives
5. Predictor selection
5.1 Diagramming and the hypothesized causal model
5.2 Prediction
5.2.1 Bias-variance trade-off
5.2.2 Estimating prediction error
5.2.3 Screening candidate models
5.2.4 Classification and regression trees (CART)
5.3 Evaluating a predictor of primary interest
5.3.1 Including predictors for face validity
5.3.2 Selecting predictors on statistical grounds
5.3.3 Interactions with the predictor of primary interest
5.3.4 Example: Incontinence as a risk factor for falling
5.3.5 Randomized experiments
5.4 Identifying multiple important predictors
5.4.1 Ruling out confounding is still central
5.4.2 Cautious interpretation is also key
5.4.3 Example: Risk factors for coronary heart disease
5.4.4 Allen–Cady modified backward selection
5.5 Some details
5.5.1 Collinearity
5.5.2 Number of predictors
5.5.3 Alternatives to backward selection
5.5.4 Model selection and checking
5.5.5 Model selection complicates inference
5.6 Summary
5.7 Further notes and references
5.8 Problems
5.9 Learning objectives
6. Logistic regression
6.1 Single predictor models
6.1.1 Interpretation of regression coefficients
6.1.2 Categorical predictors
6.2 Multipredictor models
6.2.1 Likelihood ratio tests
6.2.2 Confounding
6.2.3 Interaction
6.2.4 Prediction
6.2.5 Prediction accuracy
6.3 Case–control studies
6.3.1 Matched case–control studies
6.4 Checking model assumptions and fit
6.4.1 Outlying and influential points
6.4.2 Linearity
6.4.3 Model adequacy
6.4.4 Technical issues in logistic model fitting
6.5 Alternative strategies for binary outcomes
6.5.1 Infectious disease transmission models
6.5.2 Regression models based on excess and relative risks
6.5.3 Nonparametric binary regression
6.5.4 More than two outcome levels
6.6 Likelihood
6.7 Summary
6.8 Further notes and references
6.9 Problems
6.10 Learning objectives
7. Survival analysis
7.1 Survival data
7.1.1 Why linear and logistic regression won't work
7.1.2 Hazard function
7.1.3 Hazard ratio
7.1.4 Proportional hazards assumption
7.2 Cox proportional hazards models
7.2.1 Proportional hazards models
7.2.2 Parametric vs. semi-parametric models
7.2.3 Hazard ratios, risk, and survival times
7.2.4 Hypothesis tests and confidence intervals
7.2.5 Binary predictors
7.2.6 Multilevel categorical predictors
7.2.7 Continuous predictors
7.2.8 Confounding
7.2.9 Mediation
7.2.10 Interaction
7.2.11 Adjusted survival curves for comparing groups
7.2.12 Predicted survival for specific covariate patterns
7.3 Extensions to the Cox model
7.3.1 Time-dependent covariates
7.3.2 Stratified Cox model
7.4 Checking model assumptions and fit
7.4.1 Log-linearity
7.4.2 Proportional hazards
7.5 Some details
7.5.1 Bootstrap confidence intervals
7.5.2 Prediction
7.5.3 Adjusting for non-confounding covariates
7.5.4 Independent censoring
7.5.5 Interval censoring
7.5.6 Left truncation
7.6 Summary
7.7 Further notes and references
7.8 Problems
7.9 Learning objectives
8. Repeated measures analysis
8.1 A simple repeated measures example: fecal fat
8.1.1 Model equations for the fecal fat example
8.1.2 Correlations within subjects
8.1.3 Estimates of the effects of pill type
8.2 Hierarchical data
8.2.1 Analysis strategies for hierarchical data
8.3 Longitudinal data
8.3.1 Analysis strategies for longitudinal data
8.3.2 Example: Birthweight and birth order
8.3.3 When to use repeated measures analyses
8.4 Generalized Estimating Equations
8.4.1 Birthweight and birth order revisited
8.4.2 Correlation structures
8.4.3 Working correlation and robust standard errors
8.4.4 Hypothesis tests and confidence intervals
8.4.5 Use of xtgee for clustered logistic regression
8.5 Random effects models
8.5.1 Re-analysis of birthweight and birth order
8.5.2 Prediction
8.5.3 Logistic model for low birthweight
8.5.4 Marginal versus conditional models
8.6 Example: Cardiac injury following brain hemorrhage
8.6.1 Bootstrap confidence intervals
8.7 Summary
8.8 Further notes and references
8.9 Problems
8.10 Learning objectives
9. Generalized linear models
9.1 Example: Treatment for depression
9.1.1 Statistical issues
9.1.2 Model for the mean response
9.1.3 Choice of distribution
9.1.4 Interpreting the parameters
9.1.5 Further notes
9.2 Example: Costs of phototherapy
9.2.1 Model for the mean response
9.2.2 Choice of distribution
9.2.3 Interpreting the parameters
9.3 Generalized linear models
9.3.1 Example: Risky drug use behavior
9.3.2 Relationship of mean to variance
9.3.3 Nonlinear models
9.4 Summary
9.5 Further notes and references
9.6 Problems
9.7 Learning objectives
10. Complex surveys
10.1 Example: NHANES
10.2 Probability weights
10.3 Variance estimation
10.3.1 Design effects
10.3.2 Simplification of correlation structure
10.3.3 Other methods of variance estimation
10.4 Summary
10.5 Further notes and references
10.6 Problems
10.7 Learning objectives
11. Summary
11.1 Introduction
11.2 Selecting appropriate statistical methods
11.3 Planning and executing a data analysis
11.3.1 Analysis plans
11.3.2 Choice of software
11.3.3 Record keeping and organization
11.3.4 Data security
11.3.5 Consulting a statistician
11.3.6 Use of internet resources
11.4 Further notes and references
References
Index