Epidemiology: Study Design and Data Analysis, Third Edition 



Comment from the Stata technical group

Woodward’s third edition of Epidemiology: Study Design and Data Analysis has two target audiences: researchers who need statistical solutions to epidemiology problems and statisticians who wish to learn how their science applies to epidemiology. This book successfully presents statistical principles in epidemiology in a manner that is neither too theoretical nor too replete with medical jargon. It provides a complete treatment of the topic, from simple contingency tables to meta-analysis. The book uses real data throughout (more than 20 large datasets are cataloged for download), and each chapter ends with exercises. Woodward makes Stata code for working many of the examples available for download.

Topics include basic terminology, causality, descriptive statistics, testing of means, relative risks versus odds ratios, exact tests based on tables, tests for linear and nonlinear trends, confounding and interaction, direct and indirect standardization, cohort designs, case–control studies, intervention studies, power and sample size, linear models (including analysis of variance), logistic and other models for binary responses, survival analysis (including Cox regression), and meta-analysis. The third edition has been expanded to include risk scores and clinical decision rules, bootstrapping, multiple imputation, binomial regression models, competing risk, propensity scoring, and splines.

Table of contents

1 Fundamental issues
1.1 What is epidemiology?
1.2 Case studies: The work of Doll and Hill
1.3 Populations and samples
1.3.1 Populations
1.3.2 Samples
1.4 Measuring disease
1.4.1 Incidence and prevalence
1.5 Measuring the risk factor
1.6 Causality
1.6.1 Association
1.6.2 Problems with establishing causality
1.6.3 Principles of causality
1.7 Studies using routine data
1.7.1 Ecological data
1.7.2 National sources of data on disease
1.7.3 National sources of data on risk factors
1.7.4 International data
1.8 Study design
1.8.1 Intervention studies
1.8.2 Observational studies
1.9 Data analysis
Exercises

2 Basic analytical procedures
2.1 Introduction
2.1.1 Inferential procedures
2.2 Case study
2.2.1 The Scottish Heart Health Study
2.3 Types of variables
2.3.1 Qualitative variables
2.3.2 Quantitative variables
2.3.3 The hierarchy of type
2.4 Tables and charts
2.4.1 Tables in reports
2.4.2 Diagrams in reports
2.5 Inferential techniques for categorical variables
2.5.1 Contingency tables
2.5.2 Binary variables: proportions and percentages
2.5.3 Comparing two proportions or percentages
2.6 Descriptive techniques for quantitative variables
2.6.1 The five-number summary
2.6.2 Quantiles
2.6.3 The two-number summary
2.6.4 Other summary statistics of spread
2.6.5 Assessing symmetry
2.6.6 Investigating shape
2.7 Inferences about means
2.7.1 Checking normality
2.7.2 Inferences for a single mean
2.7.3 Comparing two means
2.7.4 Paired data
2.8 Inferential techniques for nonnormal data
2.8.1 Transformations
2.8.2 Nonparametric tests
2.8.3 Confidence intervals for medians
2.9 Measuring agreement
2.9.1 Quantitative variables
2.9.2 Categorical variables
2.9.3 Ordered categorical variables
2.9.4 Internal consistency
2.10 Assessing diagnostic tests
2.10.1 Accounting for sensitivity and specificity
Exercises

3 Assessing risk factors
3.1 Risk and relative risk
3.2 Odds and odds ratio
3.3 Relative risk or odds ratio?
3.4 Prevalence studies
3.5 Testing association
3.5.1 Equivalent tests
3.5.2 One-sided tests
3.5.3 Continuity corrections
3.5.4 Fisher's exact test
3.5.5 Limitations of tests
3.6 Risk factors measured at several levels
3.6.1 Continuous risk factors
3.6.2 A test for linear trend
3.6.3 A test for nonlinearity
3.7 Attributable risk
3.8 Rate and relative rate
3.8.1 The general epidemiological rate
3.9 Measures of difference
3.10 EPITAB commands in Stata
Exercises

4 Confounding and interaction
4.1 Introduction
4.2 The concept of confounding
4.3 Identification of confounders
4.3.1 A strategy for selection
4.4 Assessing confounding
4.4.1 Using estimation
4.4.2 Using hypothesis tests
4.4.3 Dealing with several confounding variables
4.5 Standardisation
4.5.1 Direct standardisation of event rates
4.5.2 Indirect standardisation of event rates
4.5.3 Standardisation of risks
4.6 Mantel–Haenszel methods
4.6.1 The Mantel–Haenszel relative risk
4.6.2 The Cochran–Mantel–Haenszel test
4.6.3 Further comments
4.7 The concept of interaction
4.8 Testing for interaction
4.8.1 Using the relative risk
4.8.2 Using the odds ratio
4.8.3 Using the risk difference
4.8.4 Which type of interaction to use?
4.8.5 Which interactions to test?
4.9 Dealing with interaction
4.10 EPITAB commands in Stata
Exercises

5 Cohort studies
5.1 Design considerations
5.1.1 Advantages
5.1.2 Disadvantages
5.1.3 Alternative designs with economic advantages
5.1.4 Studies with a single baseline sample
5.2 Analytical considerations
5.2.1 Concurrent follow-up
5.2.2 Moving baseline dates
5.2.3 Varying follow-up durations
5.2.4 Withdrawals
5.3 Cohort life tables
5.3.1 Allowing for sampling variation
5.3.2 Allowing for censoring
5.3.3 Comparison of two life tables
5.3.4 Limitations
5.4 Kaplan–Meier estimation
5.4.1 An empirical comparison
5.5 Comparison of two sets of survival probabilities
5.5.1 Mantel–Haenszel methods
5.5.2 The log-rank test
5.5.3 Weighted log-rank tests
5.5.4 Allowing for confounding variables
5.5.5 Comparing three or more groups
5.6 Competing risk
5.7 The person-years method
5.7.1 Age-specific rates
5.7.2 Summarisation of rates
5.7.3 Comparison of two SERs
5.7.4 Mantel–Haenszel methods
5.7.5 Further comments
5.8 Period-cohort analysis
5.8.1 Period-specific rates
Exercises

6 Case–control studies
6.1 Basic design concepts
6.1.1 Advantages
6.1.2 Disadvantages
6.2 Basic methods of analysis
6.2.1 Dichotomous exposure
6.2.2 Polytomous exposure
6.2.3 Confounding and interaction
6.2.4 Attributable risk
6.3 Selection of cases
6.3.1 Definition
6.3.2 Inclusion and exclusion criteria
6.3.3 Incident or prevalent?
6.3.4 Source
6.3.5 Consideration of bias
6.4 Selection of controls
6.4.1 General principles
6.4.2 Hospital controls
6.4.3 Community controls
6.4.4 Other sources
6.4.5 How many?
6.5 Matching
6.5.1 Advantages
6.5.2 Disadvantages
6.5.3 One-to-many matching
6.5.4 Matching in other study designs
6.6 The analysis of matched studies
6.6.1 1 : 1 Matching
6.6.2 1 : c Matching
6.6.3 1 : Variable matching
6.6.4 Many : many matching
6.6.5 A modelling approach
6.7 Nested case–control studies
6.7.1 Matched studies
6.7.2 Counter-matched studies
6.8 Case-cohort studies
6.9 Case-crossover studies
Exercises

7 Intervention studies
7.1 Introduction
7.1.1 Advantages
7.1.2 Disadvantages
7.2 Ethical considerations
7.2.1 The protocol
7.3 Avoidance of bias
7.3.1 Use of a control group
7.3.2 Blindness
7.3.3 Randomisation
7.3.4 Consent before randomisation
7.3.5 Analysis by intention-to-treat
7.4 Parallel group studies
7.4.1 Number needed to treat
7.4.2 Cluster randomised trials
7.4.3 Stepped wedge trials
7.4.4 Noninferiority trials
7.5 Crossover studies
7.5.1 Graphical analysis
7.5.2 Comparing means
7.5.3 Analysing preferences
7.5.4 Analysing binary data
7.6 Sequential studies
7.6.1 The Haybittle–Peto stopping rule
7.6.2 Adaptive designs
7.7 Allocation to treatment group
7.7.1 Global randomisation
7.7.2 Stratified randomization
7.7.3 Implementation
7.8 Trials as cohorts
Exercises

8 Sample size determination
8.1 Introduction
8.2 Power
8.2.1 Choice of alternative hypothesis
8.3 Testing a mean value
8.3.1 Common choices for power and significance level
8.3.2 Using a table of sample sizes
8.3.3 The minimum detectable difference
8.3.4 The assumption of known standard deviation
8.4 Testing a difference between means
8.4.1 Using a table of sample sizes
8.4.2 Power and minimum detectable difference
8.4.3 Optimum distribution of the sample
8.4.4 Paired data
8.5 Testing a proportion
8.5.1 Using a table of sample sizes
8.6 Testing a relative risk
8.6.1 Using a table of sample sizes
8.6.2 Power and minimum detectable relative risk
8.7 Case–control studies
8.7.1 Using a table of sample sizes
8.7.2 Power and minimum detectable relative risk
8.7.3 Comparison with cohort studies
8.7.4 Matched studies
8.8 Complex sampling designs
8.9 Concluding remarks
Exercises

9 Modelling quantitative outcome variables
9.1 Statistical models
9.2 One categorical explanatory variable
9.2.1 The hypotheses to be tested
9.2.2 Construction of the ANOVA table
9.2.3 How the ANOVA table is used
9.2.4 Estimation of group means
9.2.5 Comparison of group means
9.2.6 Fitted values
9.2.7 Using computer packages
9.3 One quantitative explanatory variable
9.3.1 Simple linear regression
9.3.2 Correlation
9.3.3 Nonlinear regression
9.4 Two categorical explanatory variables
9.4.1 Model specification
9.4.2 Model fitting
9.4.3 Balanced data
9.4.4 Unbalanced data
9.4.5 Fitted values
9.4.6 Least squares means
9.4.7 Interaction
9.5 Model building
9.6 General linear models
9.7 Several explanatory variables
9.7.1 Information criteria
9.7.2 Boosted regression
9.8 Model checking
9.9 Confounding
9.9.1 Adjustment using residuals
9.10 Splines
9.10.1 Choice of knots
9.10.2 Other types of splines
9.11 Panel data
9.12 Nonnormal alternatives
Exercises

10 Modelling binary outcome data
10.1 Introduction
10.2 Problems with standard regression models
10.2.1 The r-x relationship may well not be linear
10.2.2 Predicted values of the risk may be outside the valid range
10.2.3 The error distribution is not normal
10.3 Logistic regression
10.4 Interpretation of logistic regression coefficients
10.4.1 Binary risk factors
10.4.2 Quantitative risk factors
10.4.3 Categorical risk factors
10.4.4 Ordinal risk factors
10.4.5 Floating absolute risks
10.5 Generic data
10.6 Multiple logistic regression models
10.7 Tests of hypotheses
10.7.1 Goodness of fit for grouped data
10.7.2 Goodness of fit for generic data
10.7.3 Effect of a risk factor
10.7.4 Information criteria
10.7.5 Tests for linearity and nonlinearity
10.7.6 Tests based upon estimates and their standard errors
10.7.7 Problems with missing values
10.8 Confounding
10.9 Interaction
10.9.1 Between two categorical variables
10.9.2 Between a quantitative and categorical variable
10.9.3 Between two quantitative variables
10.10 Dealing with a quantitative explanatory variable
10.10.1 Linear form
10.10.2 Categorical form
10.10.3 Linear spline form
10.10.4 Generalisations
10.11 Model checking
10.11.1 Residuals
10.11.2 Influential observations
10.12 Measurement error
10.12.1 Regression to the mean
10.12.2 Correcting for regression dilution
10.13 Case–control studies
10.13.1 Unmatched studies
10.13.2 Matched studies
10.14 Outcomes with several levels
10.14.1 The proportional odds assumption
10.14.2 The proportional odds model
10.14.3 Multinomial regression
10.15 Longitudinal data
10.16 Binomial regression
10.16.1 Adjusted risks
10.16.2 Risk differences
10.16.3 Problems with binomial models
10.17 Propensity scoring
10.17.1 Pair-matched propensity scores
10.17.2 Stratified propensity scores
10.17.3 Weighting by the inverse propensity score
10.17.4 Adjusting for the propensity score
10.17.5 Deriving the propensity score
10.17.6 Propensity score outliers
10.17.7 Conduct of the matched design
10.17.8 Analysis of the matched design
10.17.9 Case studies
10.17.10 Interpretation of effects
10.17.11 Problems with estimating uncertainty
10.17.12 Propensity scores in practice
Exercises

11 Modelling follow-up data
11.1 Introduction
11.1.1 Models for survival data
11.2 Basic functions of survival time
11.2.1 The survival function
11.2.2 The hazard function
11.3 Estimating the hazard function
11.3.1 Kaplan–Meier estimation
11.3.2 Person-time estimation
11.3.3 Actuarial estimation
11.3.4 The cumulative hazard
11.4 Probability models
11.4.1 The probability density and cumulative distribution functions
11.4.2 Choosing a model
11.4.3 The exponential distribution
11.4.4 The Weibull distribution
11.4.5 Other probability models
11.5 Proportional hazards regression models
11.5.1 Comparing two groups
11.5.2 Comparing several groups
11.5.3 Modelling with a quantitative variable
11.5.4 Modelling with several variables
11.5.5 Left-censoring
11.6 The Cox proportional hazards model
11.6.1 Time-dependent covariates
11.6.2 Recurrent events
11.7 The Weibull proportional hazards model
11.8 Model checking
11.8.1 Log cumulative hazard plots
11.8.2 An objective test of proportional hazards for the Cox model
11.8.3 An objective test of proportional hazards for the Weibull model
11.8.4 Residuals and influence
11.8.5 Nonproportional hazards
11.9 Competing risk
11.9.1 Joint modeling of longitudinal and survival data
11.10 Poisson regression
11.10.1 Simple regression
11.10.2 Multiple regression
11.10.3 Comparison of standardised event ratios
11.10.4 Routine or registration data
11.10.5 Generic data
11.10.6 Model checking
11.11 Pooled logistic regression
Exercises

12 Meta-analysis
12.1 Reviewing evidence
12.1.1 The Cochrane collaboration
12.2 Systematic review
12.2.1 Designing a systematic review
12.2.2 Study quality
12.3 A general approach to pooling
12.3.1 Inverse variance weighting
12.3.2 Fixed effect and random effects
12.3.3 Quantifying heterogeneity
12.3.4 Estimating the between-study variance
12.3.5 Calculating inverse variance weights
12.3.6 Calculating standard errors from confidence intervals
12.3.7 Case studies
12.3.8 Pooling risk differences
12.3.9 Pooling differences in mean values
12.3.10 Other quantities
12.3.11 Pooling mixed quantities
12.3.12 Dose-response meta-analysis
12.4 Investigating heterogeneity
12.4.1 Forest plots
12.4.2 Influence plots
12.4.3 Sensitivity analyses
12.4.4 Meta-regression
12.5 Pooling tabular data
12.5.1 Inverse variance weighting
12.5.2 Mantel–Haenszel methods
12.5.3 The Peto method
12.5.4 Dealing with zeros
12.5.5 Advantages and disadvantages of using tabular data
12.6 Individual participant data
12.7 Dealing with aspects of study quality
12.8 Publication bias
12.8.1 The funnel plot
12.8.2 Consequences of publication bias
12.8.3 Correcting for publication bias
12.8.4 Other causes of asymmetry in funnel plots
12.9 Advantages and limitations of meta-analysis
Exercises

13 Risk scores and clinical decision rules
13.1 Introduction
13.1.1 Individual and population level interventions
13.1.2 Scope of this chapter
13.2 Association and prognosis
13.2.1 The concept of discrimination
13.2.2 Risk factor thresholds
13.2.3 Risk thresholds
13.2.4 Odds ratios and discrimination
13.3 Risk scores from statistical models
13.3.1 Logistic regression
13.3.2 Multiple variable risk scores
13.3.3 Cox regression
13.3.4 Risk thresholds
13.3.5 Multiple thresholds
13.4 Quantifying discrimination
13.4.1 The area under the curve
13.4.2 Comparing AUCs
13.4.3 Survival data
13.4.4 The standardised mean effect size
13.4.5 Other measures of discrimination
13.5 Calibration
13.5.1 Overall calibration
13.5.2 Mean calibration
13.5.3 Grouped calibration
13.5.4 Calibration plots
13.6 Recalibration
13.6.1 Recalibration of the mean
13.6.2 Recalibration of scores in a fixed cohort
13.6.3 Recalibration of parameters from a Cox model
13.6.4 Recalibration and discrimination
13.7 The accuracy of predictions
13.7.1 The Brier score
13.7.2 Comparison of Brier scores
13.8 Assessing an extraneous prognostic variable
13.9 Reclassification
13.9.1 The integrated discrimination improvement from a fixed cohort
13.9.2 The net reclassification improvement from a fixed cohort
13.9.3 The integrated discrimination improvement from a variable cohort
13.9.4 The net reclassification improvement from a variable cohort
13.9.5 Software
13.10 Validation
13.11 Presentation of risk scores
13.11.1 Point scoring
13.12 Impact studies
Exercises

14 Computer-intensive methods
14.1 Rationale
14.2 The bootstrap
14.2.1 Bootstrap distributions
14.3 Bootstrap confidence intervals
14.3.1 Bootstrap normal intervals
14.3.2 Bootstrap percentile intervals
14.3.3 Bootstrap bias-corrected intervals
14.3.4 Bootstrap bias-corrected and accelerated intervals
14.3.5 Overview of the worked example
14.3.6 Choice of bootstrap interval
14.4 Practical issues when bootstrapping
14.4.1 Software
14.4.2 How many replications should be used?
14.4.3 Sensible strategies
14.5 Further examples of bootstrapping
14.5.1 Complex bootstrap samples
14.6 Bootstrap hypothesis testing
14.7 Limitations of bootstrapping
14.8 Permutation tests
14.8.1 Monte Carlo permutation tests
14.8.2 Limitations
14.9 Missing values
14.9.1 Dealing with missing values
14.9.2 Types of missingness
14.9.3 Complete case analyses
14.10 Naive imputation methods
14.10.1 Mean imputation
14.10.2 Conditional mean and regression imputation
14.10.3 Hot deck imputation and predictive mean matching
14.10.4 Longitudinal data
14.11 Univariate multiple imputation
14.11.1 Multiple imputation by regression
14.11.2 The three-step process in MI
14.11.3 Imputer's and analyst's models
14.11.4 Rubin's equations
14.11.5 Imputation diagnostics
14.11.6 Skewed continuous data
14.11.7 Other types of variables
14.11.8 How many imputations?
14.12 Multivariate multiple imputation
14.12.1 Monotone imputation
14.12.2 Data augmentation
14.12.3 Categorical variables
14.12.4 What to do when DA fails
14.12.5 Chained equations
14.12.6 Longitudinal data
14.13 When is it worth imputing?
Exercises

Appendix A Materials available on the website for this book
Appendix B Statistical tables
Appendix C Additional datasets for exercises
References
Index
