1 Introduction

1.1 Statistics and medicine

1.2 Statistics and mathematics

1.3 Statistics and computing

1.4 Assumptions and approximations

1.5 The scope of this book

2 The design of experiments

2.1 Comparing treatments

2.2 Random allocation

2.3 Stratification

2.4 Methods of allocation without random numbers

2.5 Volunteer bias

2.6 Intention to treat

2.7 Cross-over designs

2.8 Selection of subjects for clinical trials

2.9 Response bias and placebos

2.10 Assessment bias and double blind studies

2.11 Laboratory experiments

2.12 Experimental units and cluster randomized trials

2.13 Consent in clinical trials

2.14 Minimization

2.15 Multiple choice questions: Clinical trials

2.16 Exercise: The ‘Know Your Midwife’ trial

3 Sampling and observational studies

3.1 Observational studies

3.2 Censuses

3.3 Sampling

3.4 Random sampling

3.5 Sampling in clinical and epidemiological studies

3.6 Cross-sectional studies

3.7 Cohort studies

3.8 Case–control studies

3.9 Questionnaire bias in observational studies

3.10 Ecological studies

3.11 Multiple choice questions: Observational studies

3.12 Exercise: *Campylobacter jejuni* infection

4 Summarizing data

4.1 Types of data

4.2 Frequency distributions

4.3 Histograms and other frequency graphs

4.4 Shapes of frequency distribution

4.5 Medians and quantiles

4.6 The mean

4.7 Variance, range and interquartile range

4.8 Standard deviation

4.9 Multiple choice questions: Summarizing data

4.10 Exercise: Student measurements and a graph of study numbers

Appendix 4A: The divisor for the variance

Appendix 4B: Formulae for the sum of squares

5 Presenting data

5.1 Rates and proportions

5.2 Significant figures

5.3 Presenting tables

5.4 Pie charts

5.5 Bar charts

5.6 Scatter diagrams

5.7 Line graphs and time series

5.8 Misleading graphs

5.9 Using different colours

5.10 Logarithmic scales

5.11 Multiple choice questions: Data presentation

5.12 Exercise: Creating presentation graphs

Appendix 5A: Logarithms

6 Probability

6.1 Probability

6.2 Properties of probability

6.3 Probability distributions and random variables

6.4 The Binomial distribution

6.5 Mean and variance

6.6 Properties of means and variances

6.7 The Poisson distribution

6.8 Conditional probability

6.9 Multiple choice questions: Probability

6.10 Exercise: Probability in court

Appendix 6A: Permutations and combinations

Appendix 6B: Expected value of a sum of squares

7 The Normal distribution

7.1 Probability for continuous variables

7.2 The Normal distribution

7.3 Properties of the Normal distribution

7.4 Variables which follow a Normal distribution

7.5 The Normal plot

7.6 Multiple choice questions: The Normal distribution

7.7 Exercise: Distribution of some measurements obtained by students

Appendix 7A: Chi-squared, t, and F

8 Estimation

8.1 Sampling distributions

8.2 Standard error of a sample mean

8.3 Confidence intervals

8.4 Standard error and confidence interval for a proportion

8.5 The difference between two means

8.6 Comparison of two proportions

8.7 Number needed to treat

8.8 Standard error of a sample standard deviation

8.9 Confidence interval for a proportion when numbers are small

8.10 Confidence interval for a median and other quantiles

8.11 Bootstrap or resampling methods

8.12 What is the correct confidence interval?

8.13 Multiple choice questions: Confidence intervals

8.14 Exercise: Confidence intervals in two acupuncture studies

Appendix 8A: Standard error of a mean

9 Significance tests

9.1 Testing a hypothesis

9.2 An example: The sign test

9.3 Principles of significance tests

9.4 Significance levels and types of error

9.5 One and two sided tests of significance

9.6 Significant, real and important

9.7 Comparing the means of large samples

9.8 Comparison of two proportions

9.9 The power of a test

9.10 Multiple significance tests

9.11 Repeated significance tests and sequential analysis

9.12 Significance tests and confidence intervals

9.13 Multiple choice questions: Significance tests

9.14 Exercise: Crohn's disease and cornflakes

10 Comparing the means of small samples

10.1 The t distribution

10.2 The one sample t method

10.3 The means of two independent samples

10.4 The use of transformations

10.5 Deviations from the assumptions of t methods

10.6 What is a large sample?

10.7 Serial data

10.8 Comparing two variances by the F test

10.9 Comparing several means using analysis of variance

10.10 Assumptions of the analysis of variance

10.11 Comparison of means after analysis of variance

10.12 Random effects in analysis of variance

10.13 Units of analysis and cluster randomized trials

10.14 Multiple choice questions: Comparisons of means

10.15 Exercise: Some analyses comparing means

Appendix 10A: The ratio mean/standard error

11 Regression and correlation

11.1 Scatter diagrams

11.2 Regression

11.3 The method of least squares

11.4 The regression of *X* on *Y*

11.5 The standard error of the regression coefficient

11.6 Using the regression line for prediction

11.7 Analysis of residuals

11.8 Deviations from assumptions in regression

11.9 Correlation

11.10 Significance test and confidence interval for *r*

11.11 Uses of the correlation coefficient

11.12 Using repeated observations

11.13 Intraclass correlation

11.14 Multiple choice questions: Regression and correlation

11.15 Exercise: Serum potassium and ambient temperature

Appendix 11A: The least squares estimates

Appendix 11B: Variance about the regression line

Appendix 11C: The standard error of *b*

12 Methods based on rank order

12.1 Non-parametric methods

12.2 The Mann-Whitney U test

12.3 The Wilcoxon matched pairs test

12.4 Spearman's rank correlation coefficient, rho

12.5 Kendall's rank correlation coefficient, tau

12.6 Continuity corrections

12.7 Parametric or non-parametric methods?

12.8 Multiple choice questions: Rank-based methods

12.9 Exercise: Some applications of rank-based methods

13 The analysis of cross-tabulations

13.1 The chi-squared test for association

13.2 Tests for 2 by 2 tables

13.3 The chi-squared test for small samples

13.4 Fisher's exact test

13.5 Yates' continuity correction for the 2 by 2 table

13.6 The validity of Fisher's and Yates' methods

13.7 Odds and odds ratios

13.8 The chi-squared test for trend

13.9 Methods for matched samples

13.10 The chi-squared goodness of fit test

13.11 Multiple choice questions: Categorical data

13.12 Exercise: Some analyses of categorical data

Appendix 13A: Why the chi-squared test works

Appendix 13B: The formula for Fisher's exact test

Appendix 13C: Standard error for the log odds ratio

14 Choosing the statistical method

14.1 Method oriented and problem oriented teaching

14.2 Types of data

14.3 Comparing two groups

14.4 One sample and paired samples

14.5 Relationship between two variables

14.6 Multiple choice questions: Choice of statistical method

14.7 Exercise: Choosing a statistical method

15 Multifactorial methods

15.1 Multiple regression

15.2 Significance tests and estimation in multiple regression

15.3 Using multiple regression for adjustment

15.4 Transformations in multiple regression

15.5 Interaction in multiple regression

15.6 Polynomial regression

15.7 Assumptions of multiple regression

15.8 Qualitative predictor variables

15.9 Multi-way analysis of variance

15.10 Logistic regression

15.11 Stepwise regression

15.12 Seasonal effects

15.13 Dealing with counts: Poisson regression and negative binomial regression

15.14 Other regression methods

15.15 Data where observations are not independent

15.16 Multiple choice questions: Multifactorial methods

15.17 Exercise: A multiple regression analysis

16 Time to event data

16.1 Time to event data

16.2 Kaplan-Meier survival curves

16.3 The logrank test

16.4 The hazard ratio

16.5 Cox regression

16.6 Multiple choice questions: Time to event data

16.7 Exercise: Survival after retirement

17 Meta-analysis

17.1 What is a meta-analysis?

17.2 The forest plot

17.3 Getting a pooled estimate

17.4 Heterogeneity

17.5 Measuring heterogeneity

17.6 Investigating sources of heterogeneity

17.7 Random effects models

17.8 Continuous outcome variables

17.9 Dichotomous outcome variables

17.10 Time to event outcome variables

17.11 Individual participant data meta-analysis

17.12 Publication bias

17.13 Network meta-analysis

17.14 Multiple choice questions: Meta-analysis

17.15 Exercise: Dietary sugars and body weight

18 Determination of sample size

18.1 Estimation of a population mean

18.2 Estimation of a population proportion

18.3 Sample size for significance tests

18.4 Comparison of two means

18.5 Comparison of two proportions

18.6 Detecting a correlation

18.7 Accuracy of the estimated sample size

18.8 Trials randomized in clusters

18.9 Multiple choice questions: Sample size

18.10 Exercise: Estimation of sample sizes

19 Missing data

19.1 The problem of missing data

19.2 Types of missing data

19.3 Using the sample mean

19.4 Last observation carried forward

19.5 Simple imputation

19.6 Multiple imputation

19.7 Why we should not ignore missing data

19.8 Multiple choice questions: Missing data

19.9 Exercise: Last observation carried forward

20 Clinical measurement

20.1 Making measurements

20.2 Repeatability and measurement error

20.3 Assessing agreement using Cohen's kappa

20.4 Weighted kappa

20.5 Comparing two methods of measurement

20.6 Sensitivity and specificity

20.7 Normal range or reference interval

20.8 Centile charts

20.9 Combining variables using principal components analysis

20.10 Composite scales and subscales

20.11 Internal consistency of scales and Cronbach's alpha

20.12 Presenting composite scales

20.13 Multiple choice questions: Measurement

20.14 Exercise: Two measurement studies

21 Mortality statistics and population structure

21.1 Mortality rates

21.2 Age standardization using the direct method

21.3 Age standardization by the indirect method

21.4 Demographic life tables

21.5 Vital statistics

21.6 The population pyramid

21.7 Multiple choice questions: Population and mortality

21.8 Exercise: Mortality and type 1 diabetes

22 The Bayesian approach

22.1 Bayesians and Frequentists

22.2 Bayes' theorem

22.3 An example: The Bayesian approach to computer-aided diagnosis

22.4 The Bayesian and frequency views of probability

22.5 An example of Bayesian estimation

22.6 Prior distributions

22.7 Maximum likelihood

22.8 Markov Chain Monte Carlo methods

22.9 Bayesian or Frequentist?

22.10 Multiple choice questions: Bayesian methods

22.11 Exercise: A Bayesian network meta-analysis

Appendix 1: Suggested answers to multiple choice questions and exercises

References

Index