Preface

Acknowledgments

About the Author

I The Basics

1 Why Use Regression Models?

1.1 Why Use Simple Regression Models?

1.2 Why Use Multiple Regression Models?

1.3 Some Basic Notation

2 An Introductory Example

2.1 A Single Line Model

2.2 Fitting a Single Line Model

2.3 Taking Uncertainty into Account

2.4 A Two-Line Model

2.5 How to Perform These Steps with Stata

2.6 Exercise *5-HIAA and Serotonin*

2.7 Exercise *Haemoglobin*

2.8 Exercise *Scaling of Variables*

3 The Classical Multiple Regression Model

4 Adjusted Effects

4.1 Adjusting for Confounding

4.2 Adjusting for Imbalances

4.3 Exercise *Physical Activity in Schoolchildren*

5 Inference for the Classical Multiple Regression Model

5.1 The Traditional and the Modern Way of Inference

5.2 How to Perform the Modern Way of Inference with Stata

5.3 How Valid and Good Are Least Squares Estimates?

5.4 A Note on the Use and Interpretation of p-Values in Regression Analyses

6 Logistic Regression

6.1 The Definition of the Logistic Regression Model

6.2 Analysing a Dose Response Experiment by Logistic Regression

6.3 How to Fit a Dose Response Model with Stata

6.4 Estimating Odds Ratios and Adjusted Odds Ratios Using Logistic Regression

6.5 How to Compute (Adjusted) Odds Ratios Using Logistic Regression in Stata

6.6 Exercise *Allergy in Children*

6.7 More on Logit Scale and Odds Scale

7 Inference for the Logistic Regression Model

7.1 The Maximum Likelihood Principle

7.2 Properties of the ML Estimates for Logistic Regression

7.3 Inference for a Single Regression Parameter

7.4 How to Perform Wald Tests and Likelihood Ratio Tests in Stata

8 Categorical Covariates

8.1 Incorporating Categorical Covariates in a Regression Model

8.2 Some Technicalities in Using Categorical Covariates

8.3 Testing the Effect of a Categorical Covariate

8.4 The Handling of Categorical Covariates in Stata

8.5 Presenting Results of a Regression Analysis Involving Categorical Covariates in a Table

8.6 Exercise *Physical Occupation and Back Pain*

8.7 Exercise *Odds Ratios and Categorical Covariates*

9 Handling Ordered Categories: A First Lesson in Regression Modelling Strategies

10 The Cox Proportional Hazards Model

10.1 Modelling the Risk of Dying

10.2 Modelling the Risk of Dying in Continuous Time

10.3 Using the Cox Proportional Hazards Model to Quantify the Difference in Survival Between Groups

10.4 How to Fit a Cox Proportional Hazards Model with Stata

10.5 Exercise *Prognostic Factors in Breast Cancer Patients—Part 1*

11 Common Pitfalls in Using Regression Models

11.1 Association versus Causation

11.2 Difference between Subjects versus Difference within Subjects

11.3 Real-World Models versus Statistical Models

11.4 Relevance versus Significance

11.5 Exercise *Prognostic Factors in Breast Cancer Patients—Part 2*

II Advanced Topics and Techniques

12 Some Useful Technicalities

12.1 Illustrating Models by Using Model-Based Predictions

12.2 How to Work with Predictions in Stata

12.3 Residuals and the Standard Deviation of the Error Term

12.4 Working with Residuals and the RMSE in Stata

12.5 Linear and Nonlinear Functions of Regression Parameters

12.6 Transformations of Regression Parameters

12.7 Centering of Covariate Values

12.8 Exercise *Paternal Smoking versus Maternal Smoking*

13 Comparing Regression Coefficients

13.1 Comparing Regression Coefficients among Continuous Covariates

13.2 Comparing Regression Coefficients among Binary Covariates

13.3 Measuring the Impact of Changing Covariate Values

13.4 Translating Regression Coefficients

13.5 How to Compare Regression Coefficients in Stata

13.6 Exercise *Health in Young People*

14 Power and Sample Size

14.1 The Power of a Regression Analysis

14.2 Determinants of Power in Regression Models with a Single Covariate

14.3 Determinants of Power in Regression Models with Several Covariates

14.4 Power and Sample Size Calculations When a Sample from the Covariate Distribution Is Given

14.5 Power and Sample Size Calculations Given a Sample from the Covariate Distribution with Stata

14.6 The Choice of the Values of the Regression Parameters in a Simulation Study

14.7 Simulating a Covariate Distribution

14.8 Simulating a Covariate Distribution with Stata

14.9 Choosing the Parameters to Simulate a Covariate Distribution

14.10 Necessary Sample Sizes to Justify Asymptotic Methods

14.11 Exercise *Power Considerations for a Study on Neck Pain*

14.12 Exercise *Choosing between Two Outcomes*

15 Selection of the Sample

15.1 Selection in Dependence on the Covariates

15.2 Selection in Dependence on the Outcome

15.3 Sampling in Dependence on Covariate Values

16 Selection of Covariates

16.1 Fitting Regression Models with Correlated Covariates

16.2 The “Adjustment versus Power” Dilemma

16.3 The “Adjustment Makes Effects Small” Dilemma

16.4 Adjusting for Mediators

16.5 Adjusting for Confounding — A Useful Academic Game

16.6 Adjusting for Correlated Confounders

16.7 Including Predictive Covariates

16.8 Automatic Variable Selection

16.9 How to Choose Relevant Sets of Covariates

16.10 Preparing the Selection of Covariates: Analysing the Association Among Covariates

16.11 Preparing the Selection of Covariates: Univariate Analyses?

16.12 Exercise *Vocabulary Size in Young Children—Part 1*

16.13 Preprocessing of the Covariate Space

16.14 How to Preprocess the Covariate Space with Stata

16.15 Exercise *Vocabulary Size in Young Children—Part 2*

16.16 What Is a Confounder?

17 Modelling Nonlinear Effects

17.1 Quadratic Regression

17.2 Polynomial Regression

17.3 Splines

17.4 Fractional Polynomials

17.5 Gain in Power by Modelling Nonlinear Effects?

17.6 Demonstrating the Effect of a Covariate

17.7 Demonstrating a Nonlinear Effect

17.8 Describing the Shape of a Nonlinear Effect

17.9 Detecting Nonlinearity by Analysis of Residuals

17.10 Judging of Nonlinearity May Require Adjustment

17.11 How to Model Nonlinear Effects in Stata

17.12 The Impact of Ignoring Nonlinearity

17.13 Modelling the Nonlinear Effect of Confounders

17.14 Nonlinear Models

17.15 Exercise *Serum Markers for AMI*

18 Transformation of Covariates

18.1 Transformations to Obtain a Linear Relationship

18.2 Transformation of Skewed Covariates

18.3 To Categorise or Not to Categorise

19 Effect Modification and Interactions

19.1 Modelling Effect Modification

19.2 Adjusted Effect Modifications

19.3 Interactions

19.4 Modelling Effect Modifications in Several Covariates

19.5 The Effect of a Covariate in the Presence of Interactions

19.6 Interactions as Deviations from Additivity

19.7 Scales and Interactions

19.8 Ceiling Effects and Interactions

19.9 Hunting for Interactions

19.10 How to Analyse Effect Modification and Interactions with Stata

19.11 Exercise *Treatment Interactions in a Randomised Clinical Trial for the Treatment of Malignant Glioma*

20 Applying Regression Models to Clustered Data

20.1 Why Clustered Data Can Invalidate Inference

20.2 Robust Standard Errors

20.3 Improving the Efficiency

20.4 Within- and Between-Cluster Effects

20.5 Some Unusual but Useful Usages of Robust Standard Errors in Clustered Data

20.6 How to Take Clustering into Account in Stata

21 Applying Regression Models to Longitudinal Data

21.1 Analysing Time Trends in the Outcome

21.2 Analysing Time Trends in the Effect of Covariates

21.3 Analysing the Effect of Covariates

21.4 Analysing Individual Variation in Time Trends

21.5 Analysing Summary Measures

21.6 Analysing the Effect of Change

21.7 How to Perform Regression Modelling of Longitudinal Data in Stata

21.8 Exercise *Increase of Body Fat in Adolescents*

22 The Impact of Measurement Error

22.1 The Impact of Systematic and Random Measurement Error

22.2 The Impact of Misclassification

22.3 The Impact of Measurement Error in Confounders

22.4 The Impact of Differential Misclassification and Measurement Error

22.5 Studying the Measurement Error

22.6 Exercise *Measurement Error and Interactions*

23 The Impact of Incomplete Covariate Data

23.1 Missing Value Mechanisms

23.2 Properties of a Complete Case Analysis

23.3 Bias Due to Using ad hoc Methods

23.4 Advanced Techniques to Handle Incomplete Covariate Data

23.5 Handling of Partially Defined Covariates

III Risk Scores and Predictors

24 Risk Scores

24.1 What Is a Risk Score?

24.2 Judging the Usefulness of a Risk Score

24.3 The Precision of Risk Score Values

24.4 The Overall Precision of a Risk Score

24.5 Using Stata’s **predict** Command to Compute Risk Scores

24.6 Categorisation of Risk Scores

24.7 Exercise *Computing Risk Scores for Breast Cancer Patients*

25 Construction of Predictors

25.1 From Risk Scores to Predictors

25.2 Predictions and Prediction Intervals for a Continuous Outcome

25.3 Predictions for a Binary Outcome

25.4 Construction of Predictions for Time-to-Event Data

25.5 How to Construct Predictions with Stata

25.6 The Overall Precision of a Predictor

26 Evaluating the Predictive Performance

26.1 The Predictive Performance of an Existing Predictor

26.2 How to Assess the Predictive Performance of an Existing Predictor in Stata

26.3 Estimating the Predictive Performance of a New Predictor

26.4 How to Assess the Predictive Performance via Cross-Validation in Stata

26.5 Exercise *Assessing the Predictive Performance of a Prognostic Score in Breast Cancer Patients*

27 Outlook: Construction of Parsimonious Predictors

IV Miscellaneous

28 Alternatives to Regression Modelling

28.1 Stratification

28.2 Measures of Association: Correlation Coefficients

28.3 Measures of Association: The Odds Ratio

28.4 Propensity Scores

28.5 Classification and Regression Trees

29 Specific Regression Models

29.1 Probit Regression for Binary Outcomes

29.2 Generalised Linear Models

29.3 Regression Models for Count Data

29.4 Regression Models for Ordinal Outcome Data

29.5 Quantile Regression and Robust Regression

29.6 ANOVA and Regression

30 Specific Usages of Regression Models

30.1 Logistic Regression for the Analysis of Case-Control Studies

30.2 Logistic Regression for the Analysis of Matched Case-Control Studies

30.3 Adjusting for Baseline Values in Randomised Clinical Trials

30.4 Assessing Predictive Factors

30.5 Incorporating Time-Varying Covariates in a Cox Model

30.6 Time-Dependent Effects in a Cox Model

30.7 Using the Cox Model in the Presence of Competing Risks

30.8 Using the Cox Model to Analyse Multi-State Models

31 What Is a Good Model?

31.1 Does the Model Fit the Data?

31.2 How Good Are Predictions?

31.3 Explained Variation

31.4 Goodness of Fit

31.5 Model Stability

31.6 The Usefulness of a Model

32 Final Remarks on the Role of Prespecified Models and Model Development

V Mathematical Details

A Mathematics Behind the Classical Linear Regression Model

A.1 Computing Regression Parameters in Simple Linear Regression

A.2 Computing Regression Parameters in the Classical Multiple Regression Model

A.3 Estimation of the Standard Error

A.4 Construction of Confidence Intervals and p-Values

B Mathematics Behind the Logistic Regression Model

B.1 The Least Squares Principle as a Maximum Likelihood Principle

B.2 Maximising the Likelihood of a Logistic Regression Model

B.3 Estimating the Standard Error of the ML Estimates

B.4 Testing Composite Hypotheses

C The Modern Way of Inference

C.1 Robust Estimation of Standard Errors

C.2 Robust Estimation of Standard Errors in the Presence of Clustering

D Mathematics for Risk Scores and Predictors

D.1 Computing Individual Survival Probabilities after Fitting a Cox Model

D.2 Standard Errors for Risk Scores

D.3 The Delta Rule

Bibliography

Index