List of figures

List of tables

List of boxes

Preface

Introduction

1 The purpose of this book

2 The approach of this book: an example

Part I Foundations of data analysis

1 Model specification and applied research

1.1 Introduction

1.2 Model specification and statistical inference

1.3 The role of data in model specification: traditional modelling

1.4 The role of data in model specification: modern approaches

1.5 The time dimension in data

1.6 Summary of main points

2 Modelling an average

2.1 Introduction

2.2 Kinds of averages

2.3 The assumptions of the model

2.4 The sample mean as best linear unbiased estimator (BLUE)

2.5 Normality and the maximum likelihood principle

2.6 Inference from a sample of a normal distribution

2.7 Summary of main points

Appendix 2.1: Properties of mean and variance

Appendix 2.2: Standard sampling distributions

3 Outliers, skewness and data transformations

3.1 Introduction

3.2 The least squares principle and the concept of resistance

3.3 Mean-based versus order-based sample statistics

3.4 Detecting non-normality in data

3.5 Data transformations to eliminate skewness

3.6 Summary of main points

Part II Regression and data analysis

4 Data analysis and simple regression

4.1 Introduction

4.2 Modelling simple regression

4.3 Linear regression and the least squares principle

4.4 Inference from the classical normal linear regression model

4.5 Regression with graphics: checking the model assumptions

4.6 Regression through the origin

4.7 Outliers, leverage and influence

4.8 Transformation towards linearity

4.9 Summary of main points

5 Partial regression: interpreting multiple regression coefficients

5.1 Introduction

5.2 The price of food and the demand for manufactured goods in India

5.3 Least squares and the sample multiple regression line

5.4 Partial regression and partial correlation

5.5 The linear regression model

5.6 The *t*-test in multiple regression

5.7 Fragility analysis: making sense of regression coefficients

5.8 Summary of main points

6 Model selection and misspecification in multiple regression

6.1 Introduction

6.2 Griffin's aid versus savings model: the omitted variable bias

6.3 Omitted variable bias: the theory

6.4 Testing zero restrictions

6.5 Testing non-zero linear restrictions

6.6 Tests of parameter stability

6.7 The use of dummy variables

6.8 Summary of main points

Part III Analysing cross-section data

7 Dealing with heteroscedasticity

7.1 Introduction

7.2 Diagnostic plots: looking for heteroscedasticity

7.3 Testing for heteroscedasticity

7.4 Transformations towards homoscedasticity

7.5 Dealing with genuine heteroscedasticity: weighted least squares and heteroscedastic standard errors

7.6 Summary of main points

8 Categories, counts and measurements

8.1 Introduction

8.2 Regression on a categorical variable: using dummy variables

8.3 Contingency tables: association between categorical variables

8.4 Partial association and interaction

8.5 Multiple regression on categorical variables

8.6 Summary of main points

9 Logit transformation, modelling and regression

9.1 Introduction

9.2 The logit transformation

9.3 Logit modelling with contingency tables

9.4 The linear probability model versus logit regression

9.5 Estimation and hypothesis testing in logit regression

9.6 Graphics and residual analysis in logit regression

9.7 Summary of main points

Part IV Regression with time-series data

10 Trends, spurious regressions and transformations to stationarity

10.1 Introduction

10.2 Stationarity and non-stationarity

10.3 Random walks and spurious regression

10.4 Testing for stationarity

10.5 Transformations to stationarity

10.6 Summary of main points

Appendix 10.1: Generated DSP and TSP series for exercises

11 Misspecification and autocorrelation

11.1 Introduction

11.2 What is autocorrelation and why is it a problem?

11.3 Why do we get autocorrelation?

11.4 Detecting autocorrelation

11.5 What to do about autocorrelation

11.6 Summary of main points

Appendix 11.1: Derivation of variance and covariance for AR(1) model

12 Cointegration and the error correction model

12.1 Introduction

12.2 What is cointegration?

12.3 Testing for cointegration

12.4 The error correction model (ECM)

12.5 Summary of main points

Part V Simultaneous equation models

13 Misspecification bias from single equation estimation

13.1 Introduction

13.2 Simultaneity bias in a supply and demand model

13.3 Simultaneity bias: the theory

13.4 The Granger and Sims tests for causality and concepts of exogeneity

13.5 The identification problem

13.6 Summary of main points

14 Estimating simultaneous equation models

14.1 Introduction

14.2 Recursive models

14.3 Indirect least squares

14.4 Instrumental variable estimation and two-stage least squares

14.5 Estimating the consumption function in a simultaneous system

14.6 Full information estimation techniques

14.7 Summary of main points

Appendix A: The data sets used in this book

Appendix B: Statistical tables

References

Index