Epidemiology: Study Design and Data Analysis, Second Edition
Author: Mark Woodward
Publisher: Chapman & Hall/CRC
Copyright: 2005
ISBN-13: 978-1-584-88415-6
Pages: 849; hardcover
Price: $84.50
Comment from the Stata technical group
Woodward’s second edition of Epidemiology: Study Design and Data
Analysis has two target audiences: researchers who need statistical
solutions to epidemiology problems and statisticians who wish to learn how
their science applies to epidemiology. Serving both audiences requires
steering a narrow course that is neither too theoretical nor too burdened
by medical jargon. This book not only steers that course successfully but
also provides a complete treatment of the topic, all the way from simple
contingency tables to meta-analysis. The book uses real data throughout
(more than 20 large datasets are cataloged for download), and each chapter
ends with exercises (with solutions).
Woodward makes Stata code for working many of the examples available for
download.
Topics covered include basic terminology, causality, descriptive statistics,
testing of means, relative risks versus odds ratios, exact tests based on
tables, tests for linear and nonlinear trends, confounding and interaction
including direct and indirect standardization, cohort designs,
case–control studies, intervention studies, power and sample size,
linear models including analysis of variance, logistic and other models for
binary responses, survival analysis including Cox regression, and
meta-analysis.
Table of contents
1. Fundamental issues
1.1 What is epidemiology?
1.2 Case studies: the work of Doll and Hill
1.3 Populations and samples
1.3.1 Populations
1.3.2 Samples
1.4 Measuring disease
1.4.1 Incidence and prevalence
1.5 Measuring the risk factor
1.6 Causality
1.6.1 Association
1.6.2 Problems with establishing causality
1.6.3 Principles of causality
1.7 Studies using routine data
1.7.1 Ecological data
1.7.2 Sources of data on disease
1.7.3 Sources of data on risk factors
1.8 Study design
1.8.1 Intervention studies
1.8.2 Observational studies
1.9 Data analysis
Exercises
2. Basic analytical procedures
2.1 Introduction
2.1.1 Inferential procedures
2.2 Case study
2.2.1 The Scottish Heart Health Study
2.3 Types of variables
2.3.1 Qualitative variables
2.3.2 Quantitative variables
2.3.3 The hierarchy of type
2.4 Tables and charts
2.4.1 Tables in reports
2.4.2 Diagrams in reports
2.5 Inferential techniques for categorical variables
2.5.1 Contingency tables
2.5.2 Binary variables: proportions and percentages
2.5.3 Comparing two proportions or percentages
2.6 Descriptive techniques for quantitative variables
2.6.1 The five-number summary
2.6.2 Quantiles
2.6.3 The two-number summary
2.6.4 Other summary statistics of spread
2.6.5 Assessing symmetry
2.6.6 Investigating shape
2.7 Inferences about means
2.7.1 Checking normality
2.7.2 Inferences for a single mean
2.7.3 Comparing two means
2.7.4 Paired data
2.8 Inferential techniques for non-normal data
2.8.1 Transformations
2.8.2 Nonparametric tests
2.8.3 Confidence intervals for medians
2.9 Measuring agreement
2.9.1 Quantitative variables
2.9.2 Categorical variables
2.9.3 Ordered categorical variables
2.10 Assessing diagnostic tests
2.10.1 Accounting for sensitivity and specificity
Exercises
3. Assessing risk factors
3.1 Risk and relative risk
3.2 Odds and odds ratios
3.3 Relative risk or odds ratio?
3.4 Prevalence studies
3.5 Testing association
3.5.1 Equivalent tests
3.5.2 One-sided tests
3.5.3 Continuity corrections
3.5.4 Fisher's exact test
3.5.5 Limitations of tests
3.6 Risk factors at several levels
3.6.1 Continuous risk factors
3.6.2 A test for linear trend
3.6.3 A test for nonlinearity
3.7 Attributable risk
3.8 Rate and relative rate
3.8.1 The general epidemiological rate
3.9 Measures of difference
Exercises
4. Confounding and interaction
4.1 Introduction
4.2 The concept of confounding
4.3 Identification of confounders
4.3.1 A strategy for selection
4.4 Assessing confounding
4.4.1 Using estimation
4.4.2 Using hypothesis tests
4.4.3 Dealing with several confounding variables
4.5 Standardisation
4.5.1 Direct standardisation of event rates
4.5.2 Indirect standardisation of event rates
4.5.3 Standardisation of risks
4.6 Mantel–Haenszel methods
4.6.1 The Mantel–Haenszel relative risk
4.6.2 The Cochran–Mantel–Haenszel test
4.6.3 Further comments
4.7 The concept of interaction
4.8 Testing for interaction
4.8.1 Using relative risk
4.8.2 Using the odds ratio
4.8.3 Using the risk difference
4.8.4 Which type of interaction to use?
4.8.5 Which interactions to test?
4.9 Dealing with interaction
Exercises
5. Cohort studies
5.1 Design considerations
5.1.1 Advantages
5.1.2 Disadvantages
5.1.3 Alternative designs with economic advantages
5.1.4 Studies with a single baseline sample
5.2 Analytical considerations
5.2.1 Concurrent follow-up
5.2.2 Moving baseline dates
5.2.3 Varying follow-up durations
5.2.4 Withdrawals
5.2.5 Competing causes of failure
5.3 Cohort life tables
5.3.1 Allowing for sampling variation
5.3.2 Allowing for censoring
5.3.3 Comparison of two life tables
5.3.4 Limitations
5.4 Kaplan–Meier estimation
5.4.1 An empirical comparison
5.5 Comparison of two sets of survival probabilities
5.5.1 Mantel–Haenszel methods
5.5.2 The log-rank test
5.5.3 Weighted log-rank tests
5.5.4 Allowing for confounding variables
5.5.5 Comparing three or more groups
5.6 The person-years method
5.6.1 Age-specific rates
5.6.2 Summarization of rates
5.6.3 Comparison of two SERs
5.6.4 Mantel–Haenszel methods
5.6.5 Further comments
5.7 Period-cohort analysis
5.7.1 Period-specific rates
Exercises
6. Case–control studies
6.1 Basic design concepts
6.1.1 Advantages
6.1.2 Disadvantages
6.2 Basic methods of analysis
6.2.1 Dichotomous exposure
6.2.2 Polytomous exposure
6.2.3 Confounding and interaction
6.2.4 Attributable risk
6.3 Selection of cases
6.3.1 Definition
6.3.2 Inclusion and exclusion criteria
6.3.3 Incident or prevalent?
6.3.4 Source
6.3.5 Consideration of bias
6.4 Selection of controls
6.4.1 General principles
6.4.2 Hospital controls
6.4.3 Community controls
6.4.4 Other sources
6.4.5 How many?
6.5 Matching
6.5.1 Advantages
6.5.2 Disadvantages
6.5.3 One-to-many matching
6.5.4 Matching in other study designs
6.6 The analysis of matched studies
6.6.1 1 : 1 Matching
6.6.2 1 : c Matching
6.6.3 1 : Variable matching
6.6.4 Many : many matching
6.6.5 A modelling approach
6.7 Nested case–control studies
6.7.1 Matched studies
6.7.2 Counter-matched studies
6.8 Case–cohort studies
6.9 Case–crossover studies
Exercises
7. Intervention studies
7.1 Introduction
7.1.1 Advantages
7.1.2 Disadvantages
7.2 Ethical considerations
7.2.1 The protocol
7.3 Avoidance of bias
7.3.1 Use of a control group
7.3.2 Blindness
7.3.3 Randomization
7.3.4 Consent before randomization
7.3.5 Analysis by intention-to-treat
7.4 Parallel group studies
7.4.1 Number needed to treat
7.4.2 Cluster randomized trials
7.5 Cross-over studies
7.5.1 Graphical analysis
7.5.2 Comparing means
7.5.3 Analysing preferences
7.5.4 Analysing binary data
7.6 Sequential studies
7.7 Allocation to treatment group
7.7.1 Global randomization
7.7.2 Stratified randomization
7.7.3 Implementation
Exercises
8. Sample size determination
8.1 Introduction
8.2 Power
8.2.1 Choice of alternative hypothesis
8.3 Testing a mean value
8.3.1 Common choices for power and significance level
8.3.2 Using a table of sample sizes
8.3.3 The minimum detectable difference
8.3.4 The assumption of known standard deviation
8.4 Testing a difference between means
8.4.1 Using a table of sample sizes
8.4.2 Power and minimum detectable difference
8.4.3 Optimum distribution of the sample
8.4.4 Paired data
8.5 Testing a proportion
8.5.1 Using a table of sample sizes
8.5.2 Power and minimum detectable difference
8.6 Testing a relative risk
8.6.1 Using a table of sample sizes
8.6.2 Power and minimum detectable relative risk
8.7 Case–control studies
8.7.1 Using a table of sample sizes
8.7.2 Power and minimum detectable relative risk
8.7.3 Comparison with cohort studies
8.7.4 Matched studies
8.8 Complex sampling designs
8.9 Concluding remarks
Exercises
9. Modelling quantitative outcome variables
9.1 Statistical models
9.2 One categorical explanatory variable
9.2.1 The hypotheses to be tested
9.2.2 Construction of the ANOVA table
9.2.3 How the ANOVA table is used
9.2.4 Estimation of group means
9.2.5 Comparison of group means
9.2.6 Fitted values
9.2.7 Using computer packages
9.3 One quantitative explanatory variable
9.3.1 Simple linear regression
9.3.2 Correlation
9.3.3 Nonlinear regression
9.4 Two categorical explanatory variables
9.4.1 Model specification
9.4.2 Model fitting
9.4.3 Balanced data
9.4.4 Unbalanced data
9.4.5 Fitted values
9.4.6 Least squares means
9.4.7 Interaction
9.5 Model building
9.6 General linear models
9.7 Several explanatory variables
9.8 Model checking
9.9 Confounding
9.9.1 Adjustment using residuals
9.10 Longitudinal data
9.11 Non-normal alternatives
Exercises
10. Modelling binary outcome data
10.1 Introduction
10.2 Problems with standard regression models
10.2.1 The r-x relationship may well not be linear
10.2.2 Predicted values of the risk may be outside the valid range
10.2.3 The error distribution is not normal
10.3 Logistic regression
10.4 Interpretation of logistic regression coefficients
10.4.1 Binary risk factors
10.4.2 Quantitative risk factors
10.4.3 Categorical risk factors
10.4.4 Ordinal risk factors
10.4.5 Floating absolute risks
10.5 Generic data
10.6 Multiple logistic regression models
10.7 Tests of hypotheses
10.7.1 Goodness of fit for grouped data
10.7.2 Goodness of fit for generic data
10.7.3 Effect of a risk factor
10.7.4 Tests for linearity and nonlinearity
10.7.5 Tests based upon estimates and their standard errors
10.7.6 Problems with missing values
10.8 Confounding
10.9 Interaction
10.9.1 Between two categorical variables
10.9.2 Between a quantitative and a categorical variable
10.9.3 Between two quantitative variables
10.10 Model checking
10.10.1 Residuals
10.10.2 Influential observations
10.11 Regression dilution
10.11.1 Correcting for regression dilution
10.12 Case–control studies
10.12.1 Unmatched studies
10.12.2 Matched studies
10.13 Outcomes with several ordered levels
10.13.1 The proportional odds assumption
10.13.2 The proportional odds model
10.14 Longitudinal data
10.15 Complex sampling designs
Exercises
11. Modelling follow-up data
11.1 Introduction
11.1.1 Models for survival data
11.2 Basic functions of survival time
11.2.1 The survival function
11.2.2 The hazard function
11.3 Estimating the hazard function
11.3.1 Kaplan–Meier estimation
11.3.2 Person-time estimation
11.3.3 Actuarial estimation
11.4 Probability models
11.4.1 The probability density and cumulative distribution functions
11.4.2 Choosing a model
11.4.3 The exponential distribution
11.4.4 The Weibull distribution
11.4.5 Other probability models
11.5 Proportional hazards regression models
11.5.1 Comparing two groups
11.5.2 Comparing several groups
11.5.3 Modelling with a quantitative variable
11.5.4 Modelling with several variables
11.6 The Cox proportional hazards model
11.6.1 Time-dependent covariates
11.6.2 Recurrent events
11.7 The Weibull proportional hazards model
11.8 Model checking
11.8.1 Log cumulative hazard plots
11.8.2 An objective test of proportional hazards for the Cox model
11.8.3 An objective test of proportional hazards for the Weibull model
11.8.4 Residuals and influence
11.8.5 Nonproportional hazards
11.9 Poisson regression
11.9.1 Simple regression
11.9.2 Multiple regression
11.9.3 Comparison of standardised event ratios
11.9.4 Routine or registration data
11.9.5 Generic data
11.9.6 Model checking
11.10 Pooled logistic regression
Exercises
12. Meta-analysis
12.1 Reviewing evidence
12.1.1 The Cochrane Collaboration
12.2 Systematic review
12.2.1 Designing a systematic review
12.2.2 Study quality
12.3 A general approach to pooling
12.3.1 Inverse variance weighting
12.3.2 Fixed effect and random effects
12.3.3 Quantifying heterogeneity
12.3.4 Estimating the between-study variance
12.3.5 Calculating inverse variance weights
12.3.6 Calculating standard errors from confidence intervals
12.3.7 Dealing with the normal approximation
12.3.8 Pooling risk differences
12.3.9 Pooling differences in mean values
12.3.10 Other quantities
12.3.11 Pooling mixed quantities
12.4 Investigating heterogeneity
12.4.1 Forest plots
12.4.2 Influence plots
12.4.3 Sensitivity analyses
12.4.4 Meta-regression
12.5 Pooling tabular data
12.5.1 Inverse variance weighting
12.5.2 Mantel–Haenszel methods
12.5.3 The Peto method
12.5.4 Dealing with zeros
12.5.5 Advantages and disadvantages of using tabular data
12.6 Individual participant data
12.7 Dealing with aspects of study quality
12.8 Publication bias
12.8.1 The funnel plot
12.8.2 Consequences of publication bias
12.8.3 Correcting for publication bias
12.8.4 Other causes of asymmetry in funnel plots
12.9 Is meta-analysis a valid tool in epidemiology?
Exercises
Appendix A. Materials available on the website for this book
A.1 SAS programs
A.2 Stata programs
A.3 Sample size spreadsheet
A.4 Floating absolute risk macros
A.5 Data sets from the text
Appendix B. Statistical tables
Appendix C. Example data sets
Solutions to the exercises
References
Index