Geographically weighted regression in Stata

Speaker:  Mark S. Pearce, University of Newcastle upon Tyne

Geographically weighted regression is a method for exploring spatial nonstationarity: a condition in which a simple "global" regression model cannot adequately explain the relationships between sets of variables over a geographical area. Instead, the model should be allowed to change over space to reflect the structure within the data. For example, does the risk of disease associated with a risk factor remain constant across a geographical area, or is the relationship stronger at certain points within it?

Brunsdon et al. (1996) developed geographically weighted regression, which attempts to capture this spatial variation by calibrating a multiple regression model that allows different relationships between variables to exist at different points in space.

The basic idea of geographically weighted regression is that a regression model is fitted at each point in the data, with all observations weighted by a function of their distance from that point. This reflects the idea that observations sampled near the point at which a regression is centred should have more influence on the resulting parameter estimates at that point than observations further away. The result is a set of parameter estimates for each point in the geographical area. These estimates can then be mapped using GIS software to identify where the relationships between variables vary, providing a useful form of exploratory analysis. Using Monte Carlo methods, two hypothesis tests can be carried out:

  1. whether the data can be described by a global model rather than a nonstationary one;
  2. whether individual regression coefficients are stable over geographic space.
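The local fitting described above, and the second of these tests, can be sketched in Python with numpy. This is a minimal illustration under stated assumptions, not a port of gwr: the Gaussian kernel w = exp(-0.5 (d/h)^2), the fixed bandwidth h, the use of the standard deviation of the local estimates as the test statistic, and the helper names gwr_fit and mc_stationarity_test are all assumptions for illustration. The idea of the test is that if a coefficient is stable over space, the assignment of observations to locations is arbitrary, so the statistic is recomputed after randomly permuting the locations.

```python
import numpy as np


def gaussian_weights(coords, centre, bandwidth):
    """Gaussian distance-decay kernel (an assumed choice): w = exp(-0.5 * (d / h)^2)."""
    d = np.sqrt(((coords - centre) ** 2).sum(axis=1))
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def gwr_fit(X, y, coords, bandwidth):
    """Fit one weighted least-squares regression centred at every data point.

    X must already contain a column of ones for the intercept.
    Returns an (n_points, n_params) array of local coefficient estimates.
    """
    betas = np.empty((len(coords), X.shape[1]))
    for i, centre in enumerate(coords):
        w = gaussian_weights(coords, centre, bandwidth)
        XtW = X.T * w  # X'W with W = diag(w), without forming W explicitly
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas


def mc_stationarity_test(X, y, coords, bandwidth, n_sim=99, seed=0):
    """Monte Carlo test of whether each coefficient is stable over space.

    Statistic: standard deviation of each local coefficient across points.
    Null distribution: the same statistic after randomly permuting locations.
    Returns one p-value per coefficient (small = evidence of nonstationarity).
    """
    rng = np.random.default_rng(seed)
    observed = gwr_fit(X, y, coords, bandwidth).std(axis=0)
    exceed = np.zeros(X.shape[1])
    for _ in range(n_sim):
        shuffled = coords[rng.permutation(len(coords))]
        exceed += gwr_fit(X, y, shuffled, bandwidth).std(axis=0) >= observed
    return (exceed + 1) / (n_sim + 1)


# Simulated example: the slope on x drifts from west to east.
rng = np.random.default_rng(1)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
y = 2.0 + (1.0 + 0.3 * coords[:, 0]) * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

betas = gwr_fit(X, y, coords, bandwidth=2.0)
pvals = mc_stationarity_test(X, y, coords, bandwidth=2.0, n_sim=19)
print(pvals)
```

As the bandwidth grows, every weight tends to 1 and each local fit collapses to the ordinary least-squares fit of the global model, which is one way to see the global model as a special case of the local one.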

I will show how this method can be carried out in Stata using the ado files gwr and gwrgrid, both of which apply geographically weighted regression to a dataset containing geographical reference points. The only difference between them is that gwrgrid places a grid over the geographical area and fits a regression centred at each grid centroid, whereas gwr fits a regression centred at each data point.
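The difference between the two programs lies only in where the regressions are centred. A minimal Python sketch of the grid variant follows; it assumes a regular square grid over the bounding box of the data and a Gaussian distance-decay kernel, and the names grid_centroids, local_fits, cell_size and min_weight are hypothetical illustrations, not gwrgrid's actual options.

```python
import numpy as np


def grid_centroids(coords, cell_size):
    """Centroids of a regular square grid covering the bounding box of coords."""
    (xmin, ymin), (xmax, ymax) = coords.min(axis=0), coords.max(axis=0)
    nx = max(1, int(np.ceil((xmax - xmin) / cell_size)))
    ny = max(1, int(np.ceil((ymax - ymin) / cell_size)))
    cx = xmin + cell_size * (np.arange(nx) + 0.5)
    cy = ymin + cell_size * (np.arange(ny) + 0.5)
    gx, gy = np.meshgrid(cx, cy)
    return np.column_stack([gx.ravel(), gy.ravel()])


def local_fits(X, y, coords, centres, bandwidth, min_weight=1e-8):
    """Weighted least squares centred at each row of `centres`.

    Centres whose total kernel weight is negligible (e.g. grid cells far
    from any observation) are left as NaN rather than fitted.
    """
    betas = np.full((len(centres), X.shape[1]), np.nan)
    for i, c in enumerate(centres):
        w = np.exp(-0.5 * ((coords - c) ** 2).sum(axis=1) / bandwidth ** 2)
        if w.sum() < min_weight:
            continue
        XtW = X.T * w
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas


rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(100, 2))
x = rng.normal(size=100)
X = np.column_stack([np.ones(100), x])
y = 1.0 + 2.0 * x  # noise-free, so every local fit should return (1, 2)

centres = grid_centroids(coords, cell_size=2.5)
betas = local_fits(X, y, coords, centres, bandwidth=2.0)
```

Centring at grid centroids yields estimates on a regular lattice, which may be easier to map as a surface; centring at data points, as gwr does, yields one set of estimates per observation.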

The code in these ado files is based on the paper by Brunsdon et al. and on a FORTRAN program written by the same authors. It has been extended to any generalized linear model by relying heavily on Stata's existing glm command.
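To illustrate how geographic weighting carries over to a generalized linear model, here is a minimal Python sketch for a Poisson model with log link, fitted by iteratively reweighted least squares (IRLS) with the kernel weights entering as prior (observation) weights. This is an assumption about the general approach, not a description of how the ado files call glm; the Gaussian kernel, the starting values, and the names weighted_poisson_irls and gw_poisson are illustrative choices.

```python
import numpy as np


def weighted_poisson_irls(X, y, prior_w, max_iter=100, tol=1e-10):
    """Poisson GLM with log link, fitted by IRLS with prior weights."""
    mu = y + 0.5                  # common starting values, avoiding log(0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu   # working response: eta + (y - mu) * g'(mu)
        w = prior_w * mu          # working weight: prior / (V(mu) * g'(mu)^2)
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def gw_poisson(X, y, coords, bandwidth):
    """Geographically weighted Poisson regression centred at each data point."""
    betas = np.empty((len(coords), X.shape[1]))
    for i, c in enumerate(coords):
        k = np.exp(-0.5 * ((coords - c) ** 2).sum(axis=1) / bandwidth ** 2)
        betas[i] = weighted_poisson_irls(X, y, k)
    return betas


# Simulated counts with a spatially constant effect of x.
rng = np.random.default_rng(2)
n = 300
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(0.5 + 0.3 * x))

betas = gw_poisson(X, y, coords, bandwidth=3.0)
```

Because the log link is canonical for the Poisson family, IRLS here coincides with Newton's method; for other families and links, only the working-response and working-weight lines change.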

The technique, the programs, and their options will be demonstrated on the example used by Brunsdon et al.: a ward-level dataset from the 1991 UK census relating car ownership rates to social class and male unemployment in the county of Tyne & Wear in north-east England.

Reference

Brunsdon, C., A. S. Fotheringham, and M. E. Charlton. 1996.
Geographically weighted regression: A method for exploring spatial nonstationarity. Geographical Analysis 28: 281–298.