
Multivariable Model-Building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables

Patrick Royston and Willi Sauerbrei
Publisher: Wiley
Copyright: 2008
ISBN-13: 978-0-470-02842-1
Pages: 322; hardcover
Price: $99.00
Supplements: Errata, data, and other materials

Comment from the Stata technical group

Selecting an appropriate model from a large class of candidate models is difficult: one must balance the (sometimes contradictory) goals of model interpretability, parsimony, good prediction properties, robustness to minor variations in the data, and applicability to other data. This text presents a well-rounded, practical approach to model selection. Most of it is devoted to two topics: general variable selection, by stepwise procedures or otherwise, and the selection of functional forms for continuous variables. For functional forms, the authors pay particular attention to fractional polynomials and splines, drawing on their extensive research in these areas. Those looking for a tutorial on the use of fractional polynomials will find this text very useful. The methods can be applied widely, although the examples come primarily from the health sciences, and the models used most often are logistic regression, Cox regression, and generalized linear models.

Table of contents

1 Introduction
1.1 Real-Life Problems as Motivation for Model Building
1.1.1 Many Candidate Models
1.1.2 Functional Form for Continuous Predictors
1.1.3 Example 1: Continuous Response
1.1.4 Example 2: Multivariable Model for Survival Data
1.2 Issues in Modelling Continuous Predictors
1.2.1 Effects of Assumptions
1.2.2 Global versus Local Influence Models
1.2.3 Disadvantages of Fractional Polynomial Modelling
1.2.4 Controlling Model Complexity
1.3 Types of Regression Model Considered
1.3.1 Normal-Errors Regression
1.3.2 Logistic Regression
1.3.3 Cox Regression
1.3.4 Generalized Linear Models
1.3.5 Linear and Additive Predictors
1.4 Role of Residuals
1.4.1 Uses of Residuals
1.4.2 Graphical Analysis of Residuals
1.5 Role of Subject-Matter Knowledge in Model Development
1.6 Scope of Model Building in our Book
1.7 Modelling Preferences
1.7.1 General Issues
1.7.2 Criteria for a Good Model
1.7.3 Personal Preferences
1.8 General Notation
2 Selection of Variables
2.1 Introduction
2.2 Background
2.3 Preliminaries for a Multivariable Analysis
2.4 Aims of Multivariable Models
2.5 Prediction: Summary Statistics and Comparisons
2.6 Procedures for Selecting Variables
2.6.1 Strength of Predictors
2.6.2 Stepwise Procedures
2.6.3 All-Subsets Model Selection Using Information Criteria
2.6.4 Further Considerations
2.7 Comparison of Selection Strategies in Examples
2.7.1 Myeloma Study
2.7.2 Educational Body-Fat Data
2.7.3 Glioma Study
2.8 Selection and Shrinkage
2.8.1 Selection Bias
2.8.2 Simulation Study
2.8.3 Shrinkage to Correct for Selection Bias
2.8.4 Post-estimation Shrinkage
2.8.5 Reducing Selection Bias
2.8.6 Example
2.9 Discussion
2.9.1 Model Building in Small Datasets
2.9.2 Full, Pre-specified or Selected Model?
2.9.3 Comparison of Selection Procedures
2.9.4 Complexity, Stability and Interpretability
2.9.5 Conclusions and Outlook
3 Handling Categorical and Continuous Predictors
3.1 Introduction
3.2 Types of Predictor
3.2.1 Binary
3.2.2 Nominal
3.2.3 Ordinal, Counting, Continuous
3.2.4 Derived
3.3 Handling Ordinal Predictors
3.3.1 Coding Schemes
3.3.2 Effect of Coding Schemes on Variable Selection
3.4 Handling Counting and Continuous Predictors: Categorization
3.4.1 ‘Optimal’ Cutpoints: A Dangerous Analysis
3.4.2 Other Ways of Choosing a Cutpoint
3.5 Example: Issues in Model Building with Categorized Variables
3.5.1 One Ordinal Variable
3.5.2 Several Ordinal Variables
3.6 Handling Counting and Continuous Predictors: Functional Form
3.6.1 Beyond Linearity
3.6.2 Does Nonlinearity Matter?
3.6.3 Simple versus Complex Functions
3.6.4 Interpretability and Transportability
3.7 Empirical Curve Fitting
3.7.1 General Approaches to Smoothing
3.7.2 Critique of Local and Global Influence Models
3.8 Discussion
3.8.1 Sparse Categories
3.8.2 Choice of Coding Scheme
3.8.3 Categorizing Continuous Variables
3.8.4 Handling Continuous Variables
4 Fractional Polynomials for One Variable
4.1 Introduction
4.2 Background
4.2.1 Genesis
4.2.2 Types of Model
4.2.3 Relation to Box–Tidwell and Exponential Functions
4.3 Definition and Notation
4.3.1 Fractional Polynomials
4.3.2 First Derivative
4.4 Characteristics
4.4.1 FP1 and FP2 Functions
4.4.2 Maximum or Minimum of a FP2 Function
4.5 Examples of Curve Shapes with FP1 and FP2 Functions
4.6 Choice of Powers
4.7 Choice of Origin
4.8 Model Fitting and Estimation
4.9 Inference
4.9.1 Hypothesis Testing
4.9.2 Interval Estimation
4.10 Function Selection Procedure
4.10.1 Choice of Default Function
4.10.2 Closed Test Procedure for Function Selection
4.10.3 Example
4.10.4 Sequential Procedure
4.10.5 Type I Error and Power of the Function Selection Procedure
4.11 Scaling and Centering
4.11.1 Computational Aspects
4.11.2 Examples
4.12 FP Powers as Approximations to Continuous Powers
4.12.1 Box–Tidwell and Fractional Polynomial Models
4.12.2 Example
4.13 Presentation of Fractional Polynomial Functions
4.13.1 Graphical
4.13.2 Tabular
4.14 Worked Example
4.14.1 Details of all Fractional Polynomial Models
4.14.2 Function Selection
4.14.3 Details of the Fitted Model
4.14.4 Standard Error of a Fitted Value
4.14.5 Fitted Odds Ratio and its Confidence Interval
4.15 Modelling Covariates with a Spike at Zero
4.16 Power of Fractional Polynomial Analysis
4.16.1 Underlying Function Linear
4.16.2 Underlying Function FP1 or FP2
4.16.3 Comment
4.17 Discussion
5 Some Issues with Univariate Fractional Polynomial Models
5.1 Introduction
5.2 Susceptibility to Influential Covariate Observations
5.3 A Diagnostic Plot for Influential Points in FP Models
5.3.1 Example 1: Educational Body-Fat Data
5.3.2 Example 2: Primary Biliary Cirrhosis Data
5.4 Dependence on Choice of Origin
5.5 Improving Robustness by Preliminary Transformation
5.5.1 Example 1: Educational Body-Fat Data
5.5.2 Example 2: PBC Data
5.5.3 Practical Use of the Pre-transformation gδ(x)
5.6 Improving Fit by Preliminary Transformation
5.6.1 Lack of Fit of Fractional Polynomial Models
5.6.2 Negative Exponential Pre-transformation
5.7 Higher Order Fractional Polynomials
5.7.1 Example 1: Nerve Conduction Data
5.7.2 Example 2: Triceps Skinfold Thickness
5.8 When Fractional Polynomial Models are Unsuitable
5.8.1 Not all Curves are Fractional Polynomials
5.8.2 Example: Kidney Cancer
5.9 Discussion
6 MFP: Multivariable Model-building with Fractional Polynomials
6.1 Introduction
6.2 Motivation
6.3 The MFP Algorithm
6.3.1 Remarks
6.3.2 Example
6.4 Presenting the Model
6.4.1 Parameter Estimates
6.4.2 Function Plots
6.4.3 Effect Estimates
6.5 Model Criticism
6.5.1 Function Plots
6.5.2 Graphical Analysis of Residuals
6.5.3 Assessing Fit by Adding More Complex Functions
6.5.4 Consistency with Subject-Matter Knowledge
6.6 Further Topics
6.6.1 Interval Estimation
6.6.2 Importance of the Nominal Significance Level
6.6.3 The Full MFP Model
6.6.4 A Single Predictor of Interest
6.6.5 Contribution of Individual Variables to the Model Fit
6.6.6 Predictive Value of Additional Variables
6.7 Further Examples
6.7.1 Example 1: Oral Cancer
6.7.2 Example 2: Diabetes
6.7.3 Example 3: Whitehall I
6.8 Simple Versus Complex Fractional Polynomial Models
6.8.1 Complexity and Modelling Aims
6.8.2 Example: GBSG Breast Cancer Data
6.9 Discussion
6.9.1 Philosophy of MFP
6.9.2 Function Complexity, Sample Size and Subject-Matter Knowledge
6.9.3 Improving Robustness by Preliminary Covariate Transformation
6.9.4 Conclusion and Future
7 Interactions
7.1 Introduction
7.2 Background
7.3 General Considerations
7.3.1 Effect of Type of Predictor
7.3.2 Power
7.3.3 Randomized Trials and Observational Studies
7.3.4 Predefined Hypothesis or Hypothesis Generation
7.3.5 Interactions Caused by Mismodelling Main Effects
7.3.6 The ‘Treatment–Effect’ Plot
7.3.7 Graphical Checks, Sensitivity and Stability Analyses
7.3.8 Cautious Interpretation is Essential
7.4 The MFPI Procedure
7.4.1 Model Simplifications
7.4.2 Check of the Results and Sensitivity Analysis
7.5 Example 1: Advanced Prostate Cancer
7.5.1 The Fitted Model
7.5.2 Check of the Interactions
7.5.3 Final Model
7.5.4 Further Comments and Interpretation
7.5.5 FP Model Simplification
7.6 Example 2: GBSG Breast Cancer Study
7.6.1 Oestrogen Receptor Positivity as a Predictive Factor
7.6.2 A Predefined Hypothesis: Tamoxifen–Oestrogen Receptor Interaction
7.7 Categorization
7.7.1 Interaction with Categorized Variables
7.7.2 Example: GBSG Study
7.9 Example 3: Comparison of STEPP with MFPI
7.9.1 Interaction in the Kidney Cancer Data
7.9.2 Stability Investigation
7.10 Comment on Type I Error of MFPI
7.11 Continuous-by-Continuous Interactions
7.11.1 Mismodelling May Induce Interaction
7.11.2 MFPIgen: An FP Procedure to Investigate Interactions
7.11.3 Examples of MFPIgen
7.11.4 Graphical Presentation of Continuous-by-Continuous Interactions
7.11.5 Summary
7.12 Multi-Category Variables
7.13 Discussion
8 Model Stability
8.1 Introduction
8.2 Background
8.3 Using the Bootstrap to Explore Model Stability
8.3.1 Selection of Variables Within a Bootstrap Sample
8.3.2 The Bootstrap Inclusion Frequency and the Importance of a Variable
8.4 Example 1: Glioma Data
8.5 Example 2: Educational Body-Fat Data
8.5.1 Effect of Influential Observations on Model Selection
8.6 Example 3: Breast Cancer Diagnosis
8.7 Model Stability for Functions
8.7.1 Summarizing Variation between Curves
8.7.2 Measures of Curve Instability
8.8 Example 4: GBSG Breast Cancer Data
8.8.1 Interdependencies among Selected Variables and Functions in Subsets
8.8.2 Plots of Functions
8.8.3 Instability Measures
8.8.4 Stability of Functions Depending on Other Variables Included
8.9 Discussion
8.9.1 Relationship between Inclusion Fractions
8.9.2 Stability of Functions
9 Some Comparisons of MFP with Splines
9.1 Introduction
9.2 Background
9.3 MVRS: A Procedure for Model Building with Regression Splines
9.3.1 Restricted Cubic Spline Functions
9.3.2 Function Selection Procedure for Restricted Cubic Splines
9.3.3 The MVRS Algorithm
9.4 MVSS: A Procedure for Model Building with Cubic Smoothing Splines
9.4.1 Cubic Smoothing Splines
9.4.2 Function Selection Procedure for Cubic Smoothing Splines
9.4.3 The MVSS Algorithm
9.5 Example 1: Boston Housing Data
9.5.1 Effect of Reducing the Sample Size
9.5.2 Comparing Predictors
9.6 Example 2: GBSG Breast Cancer Study
9.7 Example 3: Pima Indians
9.8 Example 4: PBC
9.9 Discussion
9.9.1 Splines in General
9.9.2 Complexity of Functions
9.9.3 Optimal Fit or Transferability?
9.9.4 Reporting of Selected Models
9.9.5 Conclusion
10 How to Work with MFP
10.1 Introduction
10.2 The Dataset
10.3 Univariate Analyses
10.4 MFP Analysis
10.5 Model Criticism
10.5.1 Function Plots
10.5.2 Residuals and Lack of Fit
10.5.3 Robustness Transformation and Subject-Matter Knowledge
10.5.4 Diagnostic Plot for Influential Observations
10.5.5 Refined Model
10.5.6 Interactions
10.6 Stability Analysis
10.7 Final Model
10.8 Issues to be Aware of
10.8.1 Selecting the Main-Effects Model
10.8.2 Further Comments on Stability
10.8.3 Searching for Interactions
10.9 Discussion
11 Special Topics Involving Fractional Polynomials
11.1 Time-Varying Hazard Ratios in the Cox Model
11.1.1 The Fractional Polynomial Time Procedure
11.1.2 The MFP Time Procedure
11.1.3 Prognostic Model with Time-Varying Effects for Patients with Breast Cancer
11.1.4 Categorization of Survival Time
11.1.5 Discussion
11.2 Age-specific Reference Intervals
11.2.1 Example: Fetal Growth
11.2.2 Using FP Functions as Smoothers
11.2.3 More Sophisticated Distributional Assumptions
11.2.4 Discussion
11.3 Other Topics
11.3.1 Quantitative Risk Assessment in Developmental Toxicity Studies
11.3.2 Model Uncertainty for Functions
11.3.3 Relative Survival
11.3.4 Approximating Smooth Functions
11.3.5 Miscellaneous Applications
12 Epilogue
12.1 Introduction
12.2 Towards Recommendations for Practice
12.2.1 Variable Selection Procedure
12.2.2 Functional Form for Continuous Covariates
12.2.3 Extreme Values or Influential Points
12.2.4 Sensitivity Analysis
12.2.5 Check for Model Stability
12.2.6 Complexity of a Predictor
12.2.7 Check for Interactions
12.3 Omitted Topics and Future Directions
12.3.1 Measurement Error in Covariates
12.3.2 Meta-analysis
12.3.3 Multi-level (Hierarchical) Models
12.3.4 Missing Covariate Data
12.3.5 Other Types of Model
12.4 Conclusion
Appendix A: Data and Software Resources
A.1 Summaries of Datasets
A.2 Datasets used more than once
A.2.1 Research Body Fat
A.2.2 GBSG Breast Cancer
A.2.3 Educational Body Fat
A.2.4 Glioma
A.2.5 Prostate Cancer
A.2.6 Whitehall I
A.2.7 PBC
A.2.8 Oral Cancer
A.2.9 Kidney Cancer
A.3 Software
Appendix B: Glossary of Abbreviations