Stata Bookstore: Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models, Second Edition

Authors:	Eric Vittinghoff, David V. Glidden, Stephen C. Shiboski, and Charles E. McCulloch
Publisher:	Springer
Copyright:	2012
ISBN-13:	978-1-4614-1352-3
Pages:	509; hardcover

Comment from the Stata technical group

Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models, Second Edition is intended as a teaching text for a one-semester or two-quarter second course in biostatistics. The book focuses on multipredictor regression models as used in modern medical research. The authors recommend an introductory course in statistics or biostatistics as a prerequisite, but the first three chapters provide enough review material that this requirement is not critical.

Vittinghoff, Glidden, Shiboski, and McCulloch take a unified approach to regression models. They begin with linear regression and discuss issues such as model statement and assumptions, types of predictors (for example, categorical versus continuous), interactions, causation and confounding, inference and testing, diagnostics, and alternative models for when assumptions are violated. They then revisit these issues in the context of other multipredictor regression models, namely logistic regression, the Cox model, and generalized linear models (GLMs). The authors also cover generalized estimating equations (GEE) and the analysis of survey data. Almost all analyses are performed using Stata.

The second edition adds two new chapters and substantially expands several existing ones. A new chapter on strengthening causal inference describes the fundamentals of causal inference and concentrates on two estimation methods: inverse probability weighting and what the authors call potential outcomes estimation. This chapter also covers propensity scores, time-dependent treatments, instrumental variables, and principal stratification. The other new chapter covers missing data. The authors describe the missing-data problem and its impact on statistical inference, then discuss three approaches for handling missing data: maximum likelihood estimation, multiple imputation, and inverse weighting. Substantially revised chapters include those on logistic regression, now including categorical outcomes; on survival analysis, now including competing risks; and on generalized linear models, now including negative binomial, zero-truncated, and zero-inflated count models. All the Stata examples in the book have been updated for Stata 12.
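Inverse probability weighting, one of the two estimation methods named above, can be illustrated in a few lines. The following is a minimal Python sketch, not code from the book (whose examples use Stata). It assumes a hypothetical toy dataset with a single binary confounder `L`, a binary treatment `A`, and a continuous outcome `Y`, and it estimates the propensity score P(A=1 | L) as the observed proportion treated within each level of `L`. All data values and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (L, A, Y) = (binary confounder, binary treatment, outcome).
data = [
    (0, 1, 5.0),
    (0, 0, 2.0), (0, 0, 4.0), (0, 0, 3.0),
    (1, 1, 8.0), (1, 1, 10.0), (1, 1, 9.0),
    (1, 0, 6.0),
]


def propensity_by_stratum(rows):
    """Estimate P(A=1 | L) as the proportion treated within each level of L."""
    counts = defaultdict(lambda: [0, 0])  # L -> [number treated, number in stratum]
    for l, a, _ in rows:
        counts[l][0] += a
        counts[l][1] += 1
    return {l: treated / total for l, (treated, total) in counts.items()}


def ipw_effect(rows):
    """IPW (Horvitz-Thompson style) estimates of E[Y(1)], E[Y(0)], and their difference.

    Assumes positivity: every estimated propensity lies strictly between 0 and 1,
    otherwise the weights 1/p or 1/(1-p) are undefined.
    """
    ps = propensity_by_stratum(rows)
    n = len(rows)
    # Treated subjects are weighted by 1/p, untreated subjects by 1/(1-p).
    mean1 = sum(a * y / ps[l] for l, a, y in rows) / n
    mean0 = sum((1 - a) * y / (1 - ps[l]) for l, a, y in rows) / n
    return mean1, mean0, mean1 - mean0


def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, a, y in rows if a == 1]
    untreated = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


if __name__ == "__main__":
    m1, m0, ace = ipw_effect(data)
    print(f"IPW: E[Y(1)]={m1:.2f}, E[Y(0)]={m0:.2f}, difference={ace:.2f}")
    print(f"Unadjusted difference: {naive_difference(data):.2f}")
```

On this toy data the sketch prints an IPW difference of 2.50 (7.00 minus 4.50) against an unadjusted difference of 4.25, because treated subjects are concentrated in the stratum with higher outcomes. With a propensity model this simple (one proportion per stratum), the IPW estimate coincides with averaging the stratum-specific means over the distribution of `L`; with richer, estimated propensity models the two approaches generally differ.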

Table of contents

Preface

1. Introduction

1.1 Example: Treatment of Back Pain
1.2 The Family of Multipredictor Regression Methods
1.3 Motivation for Multipredictor Regression

1.3.1 Prediction
1.3.2 Isolating the Effect of a Single Predictor
1.3.3 Understanding Multiple Predictors

1.4 Guide to the Book

2. Exploratory and Descriptive Methods

2.1 Data Checking
2.2 Types of Data
2.3 One-Variable Descriptions

2.3.1 Numerical Variables
2.3.2 Categorical Variables

2.4 Two-Variable Descriptions

2.4.1 Outcome Versus Predictor Variables
2.4.2 Continuous Outcome Variable
2.4.3 Categorical Outcome Variable

2.5 Multivariable Descriptions
2.6 Summary
2.7 Problems

3. Basic Statistical Methods

3.1 t-Test and Analysis of Variance

3.1.1 t-Test
3.1.2 One- and Two-Sided Hypothesis Tests
3.1.3 Paired t-Test
3.1.4 One-Way Analysis of Variance
3.1.5 Pairwise Comparisons in ANOVA
3.1.6 Multi-way ANOVA and ANCOVA
3.1.7 Robustness to Violations of Normality Assumption
3.1.8 Nonparametric Alternatives
3.1.9 Equal Variance Assumption

3.2 Correlation Coefficient

3.2.1 Spearman Rank Correlation Coefficient
3.2.2 Kendall's τ

3.3 Simple Linear Regression Model

3.3.1 Systematic Part of the Model
3.3.2 Random Part of the Model
3.3.3 Assumptions About the Predictor
3.3.4 Ordinary Least Squares Estimation
3.3.5 Fitted Values and Residuals
3.3.6 Sums of Squares
3.3.7 Standard Errors of the Regression Coefficients
3.3.8 Hypothesis Tests and Confidence Intervals
3.3.9 Slope, Correlation Coefficient, and R²

3.4 Contingency Table Methods for Binary Outcomes

3.4.1 Measures of Risk and Association for Binary Outcomes
3.4.2 Tests of Association in Contingency Tables
3.4.3 Predictors with Multiple Categories
3.4.4 Analyses Involving Multiple Categorical Predictors
3.4.5 Collapsibility of Standard Measures of Association

3.5 Basic Methods for Survival Analysis

3.5.1 Right Censoring
3.5.2 Kaplan–Meier Estimator of the Survival Function
3.5.3 Interpretation of Kaplan–Meier Curves
3.5.4 Median Survival
3.5.5 Cumulative Event Function
3.5.6 Comparing Groups Using the Logrank Test

3.6 Bootstrap Confidence Intervals
3.7 Interpretation of Negative Findings
3.8 Further Notes and References
3.9 Problems
3.10 Learning Objectives

4. Linear Regression

4.1 Example: Exercise and Glucose
4.2 Multiple Linear Regression Model

4.2.1 Systematic Part of the Model
4.2.2 Random Part of the Model
4.2.3 Generalization of R² and r
4.2.4 Standardized Regression Coefficients

4.3 Categorical Predictors

4.3.1 Binary Predictors
4.3.2 Multilevel Categorical Predictors
4.3.3 The F-Test
4.3.4 Multiple Pairwise Comparisons Between Categories
4.3.5 Testing for Trend Across Categories

4.4 Confounding

4.4.1 Range of Confounding Patterns
4.4.2 Confounding Is Difficult to Rule Out
4.4.3 Adjusted Versus Unadjusted βs
4.4.4 Example: BMI and LDL

4.5 Mediation

4.5.1 Indirect Effects via the Mediator
4.5.2 Overall and Direct Effects
4.5.3 Percent Explained
4.5.4 Example: BMI, Exercise, and Glucose
4.5.5 Pitfalls in Evaluating Mediation

4.6 Interaction

4.6.1 Example: Hormone Therapy and Statin Use
4.6.2 Example: BMI and Statin Use
4.6.3 Interaction and Scale
4.6.4 Example: Hormone Therapy and Baseline LDL
4.6.5 Details

4.7 Checking Model Assumptions and Fit

4.7.1 Linearity
4.7.2 Normality
4.7.3 Constant Variance
4.7.4 Outlying, High Leverage, and Influential Points
4.7.5 Interpretation of Results for Log Transformed Variables
4.7.6 When to Use Transformations

4.8 Sample Size, Power, and Detectable Effects

4.8.1 Calculations Using Standard Errors Based on Published Data

4.9 Summary
4.10 Further Notes and References

4.10.1 Generalized Additive Models

4.11 Problems
4.12 Learning Objectives

5. Logistic Regression

5.1 Single Predictor Models

5.1.1 Interpretation of Regression Coefficients
5.1.2 Categorical Predictors

5.2 Multipredictor Models

5.2.1 Likelihood Ratio Tests
5.2.2 Confounding
5.2.3 Mediation
5.2.4 Interaction
5.2.5 Prediction
5.2.6 Prediction Accuracy

5.3 Case–Control Studies

5.3.1 Matched Case–Control Studies

5.4 Checking Model Assumptions and Fit

5.4.1 Linearity
5.4.2 Outlying and Influential Points
5.4.3 Model Adequacy
5.4.4 Technical Issues in Logistic Model Fitting

5.5 Alternative Strategies for Binary Outcomes

5.5.1 Infectious Disease Transmission Models
5.5.2 Pooled Logistic Regression
5.5.3 Regression Models Based on Risk Differences and Relative Risks
5.5.4 Exact Logistic Regression
5.5.5 Nonparametric Binary Regression
5.5.6 More Than Two Outcome Levels

5.6 Likelihood
5.7 Sample Size, Power, and Detectable Effects
5.8 Summary
5.9 Further Notes and References
5.10 Problems
5.11 Learning Objectives

6. Survival Analysis

6.1 Survival Data

6.1.1 Why Linear and Logistic Regression Would Not Work
6.1.2 Hazard Function
6.1.3 Hazard Ratio
6.1.4 Proportional Hazards Assumption

6.2 Cox Proportional Hazards Models

6.2.1 Proportional Hazards Models
6.2.2 Parametric Versus Semi-Parametric Models
6.2.3 Hazard Ratios, Risk, and Survival Times
6.2.4 Hypothesis Tests and Confidence Intervals
6.2.5 Binary Predictors
6.2.6 Multilevel Categorical Predictors
6.2.7 Continuous Predictors
6.2.8 Confounding
6.2.9 Mediation
6.2.10 Interaction
6.2.11 Model Building
6.2.12 Adjusted Survival Curves for Comparing Groups
6.2.13 Predicted Survival for Specific Covariate Patterns

6.3 Extensions to the Cox Model

6.3.1 Time-Dependent Covariates
6.3.2 Stratified Cox Model

6.4 Checking Model Assumptions and Fit

6.4.1 Log-Linearity of the Hazard Function
6.4.2 Proportional Hazards

6.5 Competing Risks Data

6.5.1 What Are Competing Risks Data?
6.5.2 Notation for Competing Risks Data
6.5.3 Summaries for Competing Risks Data

6.6 Some Details

6.6.1 Bootstrap Confidence Intervals
6.6.2 Prediction
6.6.3 Adjusting for Nonconfounding Covariates
6.6.4 Independent Censoring
6.6.5 Interval Censoring
6.6.6 Left-Truncation

6.7 Sample Size, Power, and Detectable Effects
6.8 Summary
6.9 Further Notes and References
6.10 Problems
6.11 Learning Objectives

7. Repeated Measures and Longitudinal Data Analysis

7.1 A Simple Repeated Measures Example: Fecal Fat

7.1.1 Model Equations for the Fecal Fat Example
7.1.2 Correlations Within Subjects
7.1.3 Estimates of the Effects of Pill Type

7.2 Hierarchical Data

7.2.1 Example: Treatment of Back Pain
7.2.2 Example: Physician Profiling
7.2.3 Analysis Strategies for Hierarchical Data

7.3 Longitudinal Data

7.3.1 Analysis Strategies for Longitudinal Data
7.3.2 Analyzing Change Scores

7.4 Generalized Estimating Equations

7.4.1 Example: Birthweight and Birth Order Revisited
7.4.2 Correlation Structures
7.4.3 Working Correlation and Robust Standard Errors
7.4.4 Tests and Confidence Intervals
7.4.5 Use of xtgee for Clustered Logistic Regression

7.5 Random Effects Models
7.6 Re-Analysis of the Georgia Babies Data Set
7.7 Analysis of the SOF BMD Data

7.7.1 Time Varying Predictors
7.7.2 Separating Between- and Within-Cluster Information
7.7.3 Prediction
7.7.4 A Logistic Analysis

7.8 Marginal Versus Conditional Models
7.9 Example: Cardiac Injury Following Brain Hemorrhage

7.9.1 Bootstrap Analysis

7.10 Power and Sample Size for Repeated Measures Designs

7.10.1 Between-Cluster Predictor
7.10.2 Within-Cluster Predictor

7.11 Summary
7.12 Further Notes and References

7.12.1 Missing Data
7.12.2 Computing

7.13 Problems
7.14 Learning Objectives

8. Generalized Linear Models

8.1 Example: Treatment for Depression

8.1.1 Statistical Issues
8.1.2 Model for the Mean Response
8.1.3 Choice of Distribution
8.1.4 Interpreting the Parameters
8.1.5 Further Notes

8.2 Example: Costs of Phototherapy

8.2.1 Model for the Mean Response
8.2.2 Choice of Distribution
8.2.3 Interpreting the Parameters

8.3 Generalized Linear Models

8.3.1 Example: Risky Drug Use Behavior
8.3.2 Modeling Data with Many Zeros
8.3.3 Example: A Randomized Trial to Reduce Risk of Fracture
8.3.4 Relationship of Mean to Variance
8.3.5 Non-Linear Models

8.4 Sample Size for the Poisson Model
8.5 Summary
8.6 Further Notes and References
8.7 Problems
8.8 Learning Objectives

9. Strengthening Causal Inference

9.1 Potential Outcomes and Causal Effects

9.1.1 Average Causal Effects
9.1.2 Marginal Structural Model
9.1.3 Fundamental Problem of Causal Inference
9.1.4 Randomization Assumption
9.1.5 Conditional Independence
9.1.6 Marginal and Conditional Means
9.1.7 Potential Outcomes Estimation
9.1.8 Inverse Probability Weighting

9.2 Regression as a Basis for Causal Inference

9.2.1 No Unmeasured Confounders
9.2.2 Correct Model Specification
9.2.3 Overlap and the Positivity Assumption
9.2.4 Lack of Overlap and Model Misspecification
9.2.5 Adequate Sample Size and Number of Events
9.2.6 Example: Phototherapy for Neonatal Jaundice

9.3 Marginal Effects and Potential Outcomes Estimation

9.3.1 Marginal and Conditional Effects
9.3.2 Contrasting Conditional and Marginal Effects
9.3.3 When Marginal and Conditional Odds-Ratios Differ
9.3.4 Potential Outcomes Estimation
9.3.5 Marginal Effects in Longitudinal Data

9.4 Propensity Scores

9.4.1 Estimation of Propensity Scores
9.4.2 Effect Estimation Using Propensity Scores
9.4.3 Inverse Probability Weights
9.4.4 Checking for Propensity Score/Exposure Interaction
9.4.5 Addressing Positivity Violations Using Restriction
9.4.6 Average Treatment Effect in the Treated (ATT)
9.4.7 Recommendations for Using Propensity Scores

9.5 Time-Dependent Treatments

9.5.1 Models Using Time-Dependent IP Weights
9.5.2 Implementation
9.5.3 Drawbacks and Difficulties
9.5.4 Focusing on New Users
9.5.5 Nested New-User Cohorts

9.6 Mediation
9.7 Instrumental Variables

9.7.1 Vulnerabilities
9.7.2 Structural Equations and Instrumental Variables
9.7.3 Checking IV Assumptions
9.7.4 Example: Effect of Hormone Therapy on Change in LDL
9.7.5 Extension to Binary Exposures and Outcomes
9.7.6 Example: Phototherapy for Neonatal Jaundice
9.7.7 Interpretation of IV Estimates

9.8 Trials with Incomplete Adherence to Treatment

9.8.1 Intention-to-Treat
9.8.2 As-Treated Comparisons by Treatment Received
9.8.3 Instrumental Variables
9.8.4 Principal Stratification

9.9 Summary
9.10 Further Notes and References
9.11 Problems
9.12 Learning Objectives

10. Predictor Selection

10.1 Prediction

10.1.1 Bias–Variance Trade-off and Overfitting
10.1.2 Measures of Prediction Error
10.1.3 Optimism-Corrected Estimates of Prediction Error
10.1.4 Minimizing Prediction Error Without Overfitting
10.1.5 Point Scores
10.1.6 Example: Risk Stratification of Patients with Heart Disease

10.2 Evaluating a Predictor of Primary Interest

10.2.1 Including Predictors for Face Validity
10.2.2 Selecting Predictors on Statistical Grounds
10.2.3 Interactions With the Predictor of Primary Interest
10.2.4 Example: Incontinence as a Risk Factor for Falling
10.2.5 Directed Acyclic Graphs
10.2.6 Randomized Experiments

10.3 Identifying Multiple Important Predictors

10.3.1 Ruling Out Confounding Is Still Central
10.3.2 Cautious Interpretation Is Also Key
10.3.3 Example: Risk Factors for Coronary Heart Disease
10.3.4 Allen–Cady Modified Backward Selection

10.4 Some Details

10.4.1 Collinearity
10.4.2 Number of Predictors
10.4.3 Alternatives to Backward Selection
10.4.4 Model Selection and Checking
10.4.5 Model Selection Complicates Inference

10.5 Summary
10.6 Further Notes and References
10.7 Problems
10.8 Learning Objectives

11. Missing Data

11.1 Why Missing Data Can Be a Problem

11.1.1 Missing Predictor in Linear Regression
11.1.2 Missing Outcome in Longitudinal Data

11.2 Classifications of Missing Data

11.2.1 Mechanisms for Missing Data

11.3 Simple Approaches to Handling Missing Data

11.3.1 Include a Missing Data Category
11.3.2 Last Observation or Baseline Carried Forward

11.4 Methods for Handling Missing Data
11.5 Missing Data in the Predictors and Multiple Imputation

11.5.1 Remarks About Using Multiple Imputation
11.5.2 Approaches to Multiple Imputation
11.5.3 Multiple Imputation for HERS

11.6 Deciding Which Missing Data Mechanism May Be Applicable
11.7 Missing Outcomes, Missing Completely at Random
11.8 Missing Outcomes, Covariate-Dependent Missing Completely at Random
11.9 Missing Outcomes for Longitudinal Studies, Missing at Random

11.9.1 ML and MAR
11.9.2 Multiple Imputation
11.9.3 Inverse Probability Weighting

11.10 Technical Details About Maximum Likelihood and Data Which Are Missing at Random

11.10.1 An Example of the EM Algorithm
11.10.2 The EM Algorithm Imputes the Missing Data
11.10.3 ML Versus MI with Missing Outcomes

11.11 Methods for Data that Are Missing Not at Random

11.11.1 Pattern Mixture Models
11.11.2 Multiple Imputation Under MNAR
11.11.3 Joint Modeling of Outcomes and the Dropout Process

11.12 Summary
11.13 Further Notes and References
11.14 Problems
11.15 Learning Objectives

12. Complex Surveys

12.1 Overview of Complex Survey Designs
12.2 Inverse Probability Weighting

12.2.1 Accounting for Inverse Probability Weights in the Analysis
12.2.2 Inverse Probability Weights and Missing Data

12.3 Clustering and Stratification

12.3.1 Design Effects

12.4 Example: Diabetes in NHANES
12.5 Some Details

12.5.1 Ignoring Secondary Levels of Clustering
12.5.2 Other Methods of Variance Estimation
12.5.3 Model Checking
12.5.4 Postestimation Capabilities in Stata
12.5.5 Other Statistical Packages for Complex Surveys

12.6 Summary
12.7 Further Notes and References
12.8 Problems
12.9 Learning Objectives

13. Summary

13.1 Introduction
13.2 Selecting Appropriate Statistical Methods
13.3 Planning and Executing a Data Analysis

13.3.1 Analysis Plans
13.3.2 Choice of Software
13.3.3 Data Preparation
13.3.4 Record Keeping and Reproducibility of Results
13.3.5 Data Security
13.3.6 Consulting a Statistician
13.3.7 Use of Internet Resources

13.4 Further Notes and References

13.4.1 Multiple Hypothesis Tests
13.4.2 Statistical Learning

References

Index
