**Preface**

**Acknowledgments**

**About the Authors**

**1 Basic Concepts for Statistical Modeling**

1.1 Introduction

1.2 Parameter Versus Statistic

1.3 Probability Definition

1.4 Conditional Probability

1.5 Concepts of Prevalence and Incidence

1.6 Random Variables

1.7 Probability Distributions

1.8 Centrality and Dispersion Parameters of a Random Variable

1.9 Independence and Dependence of Random Variables

1.10 Special Probability Distributions

1.10.1 Binomial Distribution

1.10.2 Poisson Distribution

1.10.3 Normal Distribution

1.11 Hypothesis Testing

1.12 Confidence Intervals

1.13 Clinical Significance Versus Statistical Significance

1.14 Data Management

1.14.1 Study Design

1.14.2 Data Collection

1.14.3 Data Entry

1.14.4 Data Screening

1.14.5 What to Do When Detecting a Data Issue

1.14.6 Impact of Data Issues and How to Proceed

1.15 Concept of Causality

References

**2 Introduction to Simple Linear Regression Models**

2.1 Introduction

2.2 Specific Objectives

2.3 Model Definition

2.4 Model Assumptions

2.5 Graphic Representation

2.6 Geometry of the Simple Regression Model

2.7 Estimation of Parameters

2.8 Variance of Estimators

2.9 Hypothesis Testing About the Slope of the Regression Line

2.9.1 Using the Student's *t*-Distribution

2.9.2 Using ANOVA

2.10 Coefficient of Determination *R*^{2}

2.11 Pearson Correlation Coefficient

2.12 Estimation of Regression Line Values and Prediction

2.12.1 Confidence Interval for the Regression Line

2.12.2 Prediction Interval of Actual Values of the Response

2.13 Example

2.14 Predictions

2.14.1 Predictions with the Database Used by the Model

2.14.2 Predictions with Data Not Used to Create the Model

2.14.3 Residual Analysis

2.15 Conclusions

Practice Exercise

References

**3 Matrix Representation of the Linear Regression Model**

3.1 Introduction

3.2 Specific Objectives

3.3 Definition

3.3.1 Matrix

3.4 Matrix Representation of a SLRM

3.5 Matrix Arithmetic

3.5.1 Addition and Subtraction of Matrices

3.6 Matrix Multiplication

3.7 Special Matrices

3.8 Linear Dependence

3.9 Rank of a Matrix

3.10 Inverse Matrix [A^{–1}]

3.11 Application of an Inverse Matrix in a SLRM

3.12 Estimation of *β* Parameters in a SLRM

3.13 Multiple Linear Regression Model (MLRM)

3.14 Interpretation of the Coefficients in a MLRM

3.15 ANOVA in a MLRM

3.16 Using Indicator Variables *(Dummy Variables)*

3.17 Polynomial Regression Models

3.18 Centering

3.19 Multicollinearity

3.20 Interaction Terms

3.21 Conclusion

Practice Exercise

References

**4 Evaluation of Partial Tests of Hypotheses in a MLRM**

4.1 Introduction

4.2 Specific Objectives

4.3 Definition of Partial Hypothesis

4.4 Evaluation Process of Partial Hypotheses

4.5 Special Cases

4.6 Examples

4.7 Conclusion

Practice Exercise

References

**5 Selection of Variables in a Multiple Linear Regression Model**

5.1 Introduction

5.2 Specific Objectives

5.3 Selection of Variables According to the Study Objectives

5.4 Criteria for Selecting the Best Regression Model

5.4.1 Coefficient of Determination, *R*^{2}

5.4.2 Adjusted Coefficient of Determination, *R*^{2}_{A}

5.4.3 Mean Square Error (*MSE*)

5.4.4 Mallows's *C*_{p}

5.4.5 Akaike Information Criterion

5.4.6 Bayesian Information Criterion

5.4.7 All Possible Models

5.5 *Stepwise* Method in Regression

5.5.1 Forward Selection

5.5.2 Backward Elimination

5.5.3 Stepwise Selection

5.6 Limitations of *Stepwise* Methods

5.7 Conclusion

Practice Exercise

References

**6 Correlation Analysis**

6.1 Introduction

6.2 Specific Objectives

6.3 Main Correlation Coefficients Based on SLRM

6.3.1 Pearson Correlation Coefficient ρ

6.3.2 Relationship Between *r* and *β*_{1}

6.4 Major Correlation Coefficients Based on MLRM

6.4.1 Pearson Correlation Coefficient of Zero Order

6.4.2 Multiple Correlation Coefficient

6.5 Partial Correlation Coefficient

6.5.1 Partial Correlation Coefficient of the First Order

6.5.2 Partial Correlation Coefficient of the Second Order

6.5.3 Semipartial Correlation Coefficient

6.6 Significance Tests

6.7 Suggested Correlations

6.8 Example

6.9 Conclusion

Practice Exercise

References

**7 Strategies for Assessing the Adequacy of the Linear Regression Model**

7.1 Introduction

7.2 Specific Objectives

7.3 Residual Definition

7.4 Initial Exploration

7.5 Initial Considerations

7.6 Standardized Residual

7.7 Jackknife Residuals (R-Student Residuals)

7.8 Normality of the Errors

7.9 Correlation of Errors

7.10 Criteria for Detecting *Outliers, Leverage, and Influential Points*

7.11 Leverage Values

7.12 Cook's Distance

7.13 COV RATIO

7.14 DFBETAS

7.15 DFFITS

7.16 Summary of the Results

7.17 Multicollinearity

7.18 Transformation of Variables

7.19 Conclusion

Practice Exercise

References

**8 Weighted Least-Squares Linear Regression**

8.1 Introduction

8.2 Specific Objectives

8.3 Regression Model with Transformation into the Original Scale of *Y*

8.4 Matrix Notation of the Weighted Linear Regression Model

8.5 Application of the WLS Model with Unequal Number of Subjects

8.5.1 Design without Intercept

8.5.2 Model with Intercept and Weighting Factor

8.6 Applications of the WLS Model When Variance Increases

8.6.1 First Alternative

8.6.2 Second Alternative

8.7 Conclusions

Practice Exercise

References

**9 Generalized Linear Models**

9.1 Introduction

9.2 Specific Objectives

9.3 Exponential Family of Probability Distributions

9.3.1 Binomial Distribution

9.3.2 Poisson Distribution

9.4 Exponential Family of Probability Distributions with Dispersion

9.5 Mean and Variance in EF and EDF

9.6 Definition of a Generalized Linear Model

9.7 Estimation Methods

9.8 *Deviance* Calculation

9.9 Hypothesis Evaluation

9.10 Analysis of Residuals

9.11 Model Selection

9.12 Bayesian Models

9.13 Conclusions

References

**10 Poisson Regression Models for Cohort Studies**

10.1 Introduction

10.2 Specific Objectives

10.3 Incidence Measures

10.3.1 Incidence Density

10.3.2 Cumulative Incidence

10.4 Confounding Variable

10.5 Stratified Analysis

10.6 Poisson Regression Model

10.7 Definition of Adjusted Relative Risk

10.8 Interaction Assessment

10.9 Relative Risk Estimation

10.10 Implementation of the Poisson Regression Model

10.11 Conclusion

Practice Exercise

References

**11 Logistic Regression in Case–Control Studies**

11.1 Introduction

11.2 Specific Objectives

11.3 Graphical Representation

11.4 Definition of the Odds Ratio

11.5 Confounding Assessment

11.6 Effect Modification

11.7 Stratified Analysis

11.8 Unconditional Logistic Regression Model

11.9 Types of Logistic Regression Models

11.9.1 Binary Case

11.9.2 Binomial Case

11.10 Computing the OR_{crude}

11.11 Computing the Adjusted OR

11.12 Inference on *OR*

11.13 Example of the Application of ULR Model: Binomial Case

11.14 Conditional Logistic Regression Model

11.15 Conclusions

Practice Exercise

References

**12 Regression Models in a Cross-Sectional Study**

12.1 Introduction

12.2 Specific Objectives

12.3 Prevalence Estimation Using the Normal Approach

12.4 Definition of the Magnitude of the Association

12.5 POR Estimation

12.5.1 Woolf's Method

12.5.2 Exact Method

12.6 Prevalence Ratio

12.7 Stratified Analysis

12.8 Logistic Regression Model

12.8.1 Modeling Prevalence Odds Ratio

12.8.2 Modeling Prevalence Ratio

12.9 Conclusions

Practice Exercise

References

**13 Solutions to Practice Exercises**

Chapter 2 Practice Exercise

Chapter 3 Practice Exercise

Chapter 4 Practice Exercise

Chapter 5 Practice Exercise

Chapter 6 Practice Exercise

Chapter 7 Practice Exercise

Chapter 8 Practice Exercise

Chapter 10 Practice Exercise

Chapter 11 Practice Exercise

Chapter 12 Practice Exercise

**Index**