
Epidemiology: Study Design and Data Analysis, Third Edition

Author: Mark Woodward
Publisher: Chapman & Hall/CRC
Copyright: 2013
ISBN-13: 978-1-439-83970-6
Pages: 898; hardcover
Price: $84.75

Comment from the Stata technical group

Woodward’s third edition of Epidemiology: Study Design and Data Analysis has two target audiences: researchers who need statistical solutions to epidemiology problems and statisticians who wish to learn how their science applies to epidemiology. The book successfully presents statistical principles in epidemiology in a manner that is neither too theoretical nor too laden with medical jargon. It provides a complete treatment of the topic, from simple contingency tables to meta-analysis. The book uses real data throughout (more than 20 large datasets are cataloged for download), and each chapter ends with exercises. Woodward also makes Stata code for working many of the examples available for download.

Topics include basic terminology, causality, descriptive statistics, testing of means, relative risks versus odds ratios, exact tests based on tables, tests for linear and nonlinear trends, confounding and interaction, direct and indirect standardization, cohort designs, case–control studies, intervention studies, power and sample size, linear models (including analysis of variance), logistic and other models for binary responses, survival analysis (including Cox regression), and meta-analysis. The third edition has been expanded to include risk scores and clinical decision rules, bootstrapping, multiple imputation, binomial regression models, competing risk, propensity scoring, and splines.


Table of contents

1 Fundamental issues
1.1 What is epidemiology?
1.2 Case studies: The work of Doll and Hill
1.3 Populations and samples
1.3.1 Populations
1.3.2 Samples
1.4 Measuring disease
1.4.1 Incidence and prevalence
1.5 Measuring the risk factor
1.6 Causality
1.6.1 Association
1.6.2 Problems with establishing causality
1.6.3 Principles of causality
1.7 Studies using routine data
1.7.1 Ecological data
1.7.2 National sources of data on disease
1.7.3 National sources of data on risk factors
1.7.4 International data
1.8 Study design
1.8.1 Intervention studies
1.8.2 Observational studies
1.9 Data analysis
Exercises
2 Basic analytical procedures
2.1 Introduction
2.1.1 Inferential procedures
2.2 Case study
2.2.1 The Scottish Heart Health Study
2.3 Types of variables
2.3.1 Qualitative variables
2.3.2 Quantitative variables
2.3.3 The hierarchy of type
2.4 Tables and charts
2.4.1 Tables in reports
2.4.2 Diagrams in reports
2.5 Inferential techniques for categorical variables
2.5.1 Contingency tables
2.5.2 Binary variables: proportions and percentages
2.5.3 Comparing two proportions or percentages
2.6 Descriptive techniques for quantitative variables
2.6.1 The five-number summary
2.6.2 Quantiles
2.6.3 The two-number summary
2.6.4 Other summary statistics of spread
2.6.5 Assessing symmetry
2.6.6 Investigating shape
2.7 Inferences about means
2.7.1 Checking normality
2.7.2 Inferences for a single mean
2.7.3 Comparing two means
2.7.4 Paired data
2.8 Inferential techniques for non-normal data
2.8.1 Transformations
2.8.2 Nonparametric tests
2.8.3 Confidence intervals for medians
2.9 Measuring agreement
2.9.1 Quantitative variables
2.9.2 Categorical variables
2.9.3 Ordered categorical variables
2.9.4 Internal consistency
2.10 Assessing diagnostic tests
2.10.1 Accounting for sensitivity and specificity
Exercises
3 Assessing risk factors
3.1 Risk and relative risk
3.2 Odds and odds ratio
3.3 Relative risk or odds ratio?
3.4 Prevalence studies
3.5 Testing association
3.5.1 Equivalent tests
3.5.2 One-sided tests
3.5.3 Continuity corrections
3.5.4 Fisher's exact test
3.5.5 Limitations of tests
3.6 Risk factors measured at several levels
3.6.1 Continuous risk factors
3.6.2 A test for linear trend
3.6.3 A test for nonlinearity
3.7 Attributable risk
3.8 Rate and relative rate
3.8.1 The general epidemiological rate
3.9 Measures of difference
3.10 EPITAB commands in Stata
Exercises
4 Confounding and interaction
4.1 Introduction
4.2 The concept of confounding
4.3 Identification of confounders
4.3.1 A strategy for selection
4.4 Assessing confounding
4.4.1 Using estimation
4.4.2 Using hypothesis tests
4.4.3 Dealing with several confounding variables
4.5 Standardisation
4.5.1 Direct standardisation of event rates
4.5.2 Indirect standardisation of event rates
4.5.3 Standardisation of risks
4.6 Mantel–Haenszel methods
4.6.1 The Mantel–Haenszel relative risk
4.6.2 The Cochran–Mantel–Haenszel test
4.6.3 Further comments
4.7 The concept of interaction
4.8 Testing for interaction
4.8.1 Using the relative risk
4.8.2 Using the odds ratio
4.8.3 Using the risk difference
4.8.4 Which type of interaction to use?
4.8.5 Which interactions to test?
4.9 Dealing with interaction
4.10 EPITAB commands in Stata
Exercises
5 Cohort studies
5.1 Design considerations
5.1.1 Advantages
5.1.2 Disadvantages
5.1.3 Alternative designs with economic advantages
5.1.4 Studies with a single baseline sample
5.2 Analytical considerations
5.2.1 Concurrent follow-up
5.2.2 Moving baseline dates
5.2.3 Varying follow-up durations
5.2.4 Withdrawals
5.3 Cohort life tables
5.3.1 Allowing for sampling variation
5.3.2 Allowing for censoring
5.3.3 Comparison of two life tables
5.3.4 Limitations
5.4 Kaplan–Meier estimation
5.4.1 An empirical comparison
5.5 Comparison of two sets of survival probabilities
5.5.1 Mantel–Haenszel methods
5.5.2 The log-rank test
5.5.3 Weighted log-rank tests
5.5.4 Allowing for confounding variables
5.5.5 Comparing three or more groups
5.6 Competing risk
5.7 The person-years method
5.7.1 Age-specific rates
5.7.2 Summarisation of rates
5.7.3 Comparison of two SERs
5.7.4 Mantel–Haenszel methods
5.7.5 Further comments
5.8 Period-cohort analysis
5.8.1 Period-specific rates
Exercises
6 Case–control studies
6.1 Basic design concepts
6.1.1 Advantages
6.1.2 Disadvantages
6.2 Basic methods of analysis
6.2.1 Dichotomous exposure
6.2.2 Polytomous exposure
6.2.3 Confounding and interaction
6.2.4 Attributable risk
6.3 Selection of cases
6.3.1 Definition
6.3.2 Inclusion and exclusion criteria
6.3.3 Incident or prevalent?
6.3.4 Source
6.3.5 Consideration of bias
6.4 Selection of controls
6.4.1 General principles
6.4.2 Hospital controls
6.4.3 Community controls
6.4.4 Other sources
6.4.5 How many?
6.5 Matching
6.5.1 Advantages
6.5.2 Disadvantages
6.5.3 One-to-many matching
6.5.4 Matching in other study designs
6.6 The analysis of matched studies
6.6.1 1 : 1 matching
6.6.2 1 : c matching
6.6.3 1 : variable matching
6.6.4 Many : many matching
6.6.5 A modelling approach
6.7 Nested case–control studies
6.7.1 Matched studies
6.7.2 Counter-matched studies
6.8 Case-cohort studies
6.9 Case-crossover studies
Exercises
7 Intervention studies
7.1 Introduction
7.1.1 Advantages
7.1.2 Disadvantages
7.2 Ethical considerations
7.2.1 The protocol
7.3 Avoidance of bias
7.3.1 Use of a control group
7.3.2 Blindness
7.3.3 Randomisation
7.3.4 Consent before randomisation
7.3.5 Analysis by intention-to-treat
7.4 Parallel group studies
7.4.1 Number needed to treat
7.4.2 Cluster randomised trials
7.4.3 Stepped wedge trials
7.4.4 Non-inferiority trials
7.5 Cross-over studies
7.5.1 Graphical analysis
7.5.2 Comparing means
7.5.3 Analysing preferences
7.5.4 Analysing binary data
7.6 Sequential studies
7.6.1 The Haybittle–Peto stopping rule
7.6.2 Adaptive designs
7.7 Allocation to treatment group
7.7.1 Global randomisation
7.7.2 Stratified randomisation
7.7.3 Implementation
7.8 Trials as cohorts
Exercises
8 Sample size determination
8.1 Introduction
8.2 Power
8.2.1 Choice of alternative hypothesis
8.3 Testing a mean value
8.3.1 Common choices for power and significance level
8.3.2 Using a table of sample sizes
8.3.3 The minimum detectable difference
8.3.4 The assumption of known standard deviation
8.4 Testing a difference between means
8.4.1 Using a table of sample sizes
8.4.2 Power and minimum detectable difference
8.4.3 Optimum distribution of the sample
8.4.4 Paired data
8.5 Testing a proportion
8.5.1 Using a table of sample sizes
8.6 Testing a relative risk
8.6.1 Using a table of sample sizes
8.6.2 Power and minimum detectable relative risk
8.7 Case–control studies
8.7.1 Using a table of sample sizes
8.7.2 Power and minimum detectable relative risk
8.7.3 Comparison with cohort studies
8.7.4 Matched studies
8.8 Complex sampling designs
8.9 Concluding remarks
Exercises
9 Modelling quantitative outcome variables
9.1 Statistical models
9.2 One categorical explanatory variable
9.2.1 The hypotheses to be tested
9.2.2 Construction of the ANOVA table
9.2.3 How the ANOVA table is used
9.2.4 Estimation of group means
9.2.5 Comparison of group means
9.2.6 Fitted values
9.2.7 Using computer packages
9.3 One quantitative explanatory variable
9.3.1 Simple linear regression
9.3.2 Correlation
9.3.3 Nonlinear regression
9.4 Two categorical explanatory variables
9.4.1 Model specification
9.4.2 Model fitting
9.4.3 Balanced data
9.4.4 Unbalanced data
9.4.5 Fitted values
9.4.6 Least squares means
9.4.7 Interaction
9.5 Model building
9.6 General linear models
9.7 Several explanatory variables
9.7.1 Information criteria
9.7.2 Boosted regression
9.8 Model checking
9.9 Confounding
9.9.1 Adjustment using residuals
9.10 Splines
9.10.1 Choice of knots
9.10.2 Other types of splines
9.11 Panel data
9.12 Non-normal alternatives
Exercises
10 Modelling binary outcome data
10.1 Introduction
10.2 Problems with standard regression models
10.2.1 The r-x relationship may well not be linear
10.2.2 Predicted values of the risk may be outside the valid range
10.2.3 The error distribution is not normal
10.3 Logistic regression
10.4 Interpretation of logistic regression coefficients
10.4.1 Binary risk factors
10.4.2 Quantitative risk factors
10.4.3 Categorical risk factors
10.4.4 Ordinal risk factors
10.4.5 Floating absolute risks
10.5 Generic data
10.6 Multiple logistic regression models
10.7 Tests of hypotheses
10.7.1 Goodness of fit for grouped data
10.7.2 Goodness of fit for generic data
10.7.3 Effect of a risk factor
10.7.4 Information criteria
10.7.5 Tests for linearity and nonlinearity
10.7.6 Tests based upon estimates and their standard errors
10.7.7 Problems with missing values
10.8 Confounding
10.9 Interaction
10.9.1 Between two categorical variables
10.9.2 Between a quantitative and categorical variable
10.9.3 Between two quantitative variables
10.10 Dealing with a quantitative explanatory variable
10.10.1 Linear form
10.10.2 Categorical form
10.10.3 Linear spline form
10.10.4 Generalisations
10.11 Model checking
10.11.1 Residuals
10.11.2 Influential observations
10.12 Measurement error
10.12.1 Regression to the mean
10.12.2 Correcting for regression dilution
10.13 Case–control studies
10.13.1 Unmatched studies
10.13.2 Matched studies
10.14 Outcomes with several levels
10.14.1 The proportional odds assumption
10.14.2 The proportional odds model
10.14.3 Multinomial regression
10.15 Longitudinal data
10.16 Binomial regression
10.16.1 Adjusted risks
10.16.2 Risk differences
10.16.3 Problems with binomial models
10.17 Propensity scoring
10.17.1 Pair-matched propensity scores
10.17.2 Stratified propensity scores
10.17.3 Weighting by the inverse propensity score
10.17.4 Adjusting for the propensity score
10.17.5 Deriving the propensity score
10.17.6 Propensity score outliers
10.17.7 Conduct of the matched design
10.17.8 Analysis of the matched design
10.17.9 Case studies
10.17.10 Interpretation of effects
10.17.11 Problems with estimating uncertainty
10.17.12 Propensity scores in practice
Exercises
11 Modelling follow-up data
11.1 Introduction
11.1.1 Models for survival data
11.2 Basic functions of survival time
11.2.1 The survival function
11.2.2 The hazard function
11.3 Estimating the hazard function
11.3.1 Kaplan–Meier estimation
11.3.2 Person-time estimation
11.3.3 Actuarial estimation
11.3.4 The cumulative hazard
11.4 Probability models
11.4.1 The probability density and cumulative distribution functions
11.4.2 Choosing a model
11.4.3 The exponential distribution
11.4.4 The Weibull distribution
11.4.5 Other probability models
11.5 Proportional hazards regression models
11.5.1 Comparing two groups
11.5.2 Comparing several groups
11.5.3 Modelling with a quantitative variable
11.5.4 Modelling with several variables
11.5.5 Left-censoring
11.6 The Cox proportional hazards model
11.6.1 Time-dependent covariates
11.6.2 Recurrent events
11.7 The Weibull proportional hazards model
11.8 Model checking
11.8.1 Log cumulative hazard plots
11.8.2 An objective test of proportional hazards for the Cox model
11.8.3 An objective test of proportional hazards for the Weibull model
11.8.4 Residuals and influence
11.8.5 Nonproportional hazards
11.9 Competing risk
11.9.1 Joint modelling of longitudinal and survival data
11.10 Poisson regression
11.10.1 Simple regression
11.10.2 Multiple regression
11.10.3 Comparison of standardised event ratios
11.10.4 Routine or registration data
11.10.5 Generic data
11.10.6 Model checking
11.11 Pooled logistic regression
Exercises
12 Meta-analysis
12.1 Reviewing evidence
12.1.1 The Cochrane collaboration
12.2 Systematic review
12.2.1 Designing a systematic review
12.2.2 Study quality
12.3 A general approach to pooling
12.3.1 Inverse variance weighting
12.3.2 Fixed effect and random effects
12.3.3 Quantifying heterogeneity
12.3.4 Estimating the between-study variance
12.3.5 Calculating inverse variance weights
12.3.6 Calculating standard errors from confidence intervals
12.3.7 Case studies
12.3.8 Pooling risk differences
12.3.9 Pooling differences in mean values
12.3.10 Other quantities
12.3.11 Pooling mixed quantities
12.3.12 Dose-response meta-analysis
12.4 Investigating heterogeneity
12.4.1 Forest plots
12.4.2 Influence plots
12.4.3 Sensitivity analyses
12.4.4 Meta-regression
12.5 Pooling tabular data
12.5.1 Inverse variance weighting
12.5.2 Mantel–Haenszel methods
12.5.3 The Peto method
12.5.4 Dealing with zeros
12.5.5 Advantages and disadvantages of using tabular data
12.6 Individual participant data
12.7 Dealing with aspects of study quality
12.8 Publication bias
12.8.1 The funnel plot
12.8.2 Consequences of publication bias
12.8.3 Correcting for publication bias
12.8.4 Other causes of asymmetry in funnel plots
12.9 Advantages and limitations of meta-analysis
Exercises
13 Risk scores and clinical decision rules
13.1 Introduction
13.1.1 Individual and population level interventions
13.1.2 Scope of this chapter
13.2 Association and prognosis
13.2.1 The concept of discrimination
13.2.2 Risk factor thresholds
13.2.3 Risk thresholds
13.2.4 Odds ratios and discrimination
13.3 Risk scores from statistical models
13.3.1 Logistic regression
13.3.2 Multiple variable risk scores
13.3.3 Cox regression
13.3.4 Risk thresholds
13.3.5 Multiple thresholds
13.4 Quantifying discrimination
13.4.1 The area under the curve
13.4.2 Comparing AUCs
13.4.3 Survival data
13.4.4 The standardised mean effect size
13.4.5 Other measures of discrimination
13.5 Calibration
13.5.1 Overall calibration
13.5.2 Mean calibration
13.5.3 Grouped calibration
13.5.4 Calibration plots
13.6 Recalibration
13.6.1 Recalibration of the mean
13.6.2 Recalibration of scores in a fixed cohort
13.6.3 Recalibration of parameters from a Cox model
13.6.4 Recalibration and discrimination
13.7 The accuracy of predictions
13.7.1 The Brier score
13.7.2 Comparison of Brier scores
13.8 Assessing an extraneous prognostic variable
13.9 Reclassification
13.9.1 The integrated discrimination improvement from a fixed cohort
13.9.2 The net reclassification improvement from a fixed cohort
13.9.3 The integrated discrimination improvement from a variable cohort
13.9.4 The net reclassification improvement from a variable cohort
13.9.5 Software
13.10 Validation
13.11 Presentation of risk scores
13.11.1 Point scoring
13.12 Impact studies
Exercises
14 Computer-intensive methods
14.1 Rationale
14.2 The bootstrap
14.2.1 Bootstrap distributions
14.3 Bootstrap confidence intervals
14.3.1 Bootstrap normal intervals
14.3.2 Bootstrap percentile intervals
14.3.3 Bootstrap bias-corrected intervals
14.3.4 Bootstrap bias-corrected and accelerated intervals
14.3.5 Overview of the worked example
14.3.6 Choice of bootstrap interval
14.4 Practical issues when bootstrapping
14.4.1 Software
14.4.2 How many replications should be used?
14.4.3 Sensible strategies
14.5 Further examples of bootstrapping
14.5.1 Complex bootstrap samples
14.6 Bootstrap hypothesis testing
14.7 Limitations of bootstrapping
14.8 Permutation tests
14.8.1 Monte Carlo permutation tests
14.8.2 Limitations
14.9 Missing values
14.9.1 Dealing with missing values
14.9.2 Types of missingness
14.9.3 Complete case analyses
14.10 Naive imputation methods
14.10.1 Mean imputation
14.10.2 Conditional mean and regression imputation
14.10.3 Hot deck imputation and predictive mean matching
14.10.4 Longitudinal data
14.11 Univariate multiple imputation
14.11.1 Multiple imputation by regression
14.11.2 The three-step process in MI
14.11.3 Imputer's and analyst's models
14.11.4 Rubin's equations
14.11.5 Imputation diagnostics
14.11.6 Skewed continuous data
14.11.7 Other types of variables
14.11.8 How many imputations?
14.12 Multivariate multiple imputation
14.12.1 Monotone imputation
14.12.2 Data augmentation
14.12.3 Categorical variables
14.12.4 What to do when DA fails
14.12.5 Chained equations
14.12.6 Longitudinal data
14.13 When is it worth imputing?
Exercises
Appendix A Materials available on the website for this book
Appendix B Statistical tables
Appendix C Additional datasets for exercises
References
Index