Spatial autoregressive models

Highlights

Spatial lags of:
- Dependent variable
- Independent variables
- Autoregressive errors
Endogenous covariates
Heteroskedastic errors
Cross-sectional data
Panel data
- Fixed-effects models
- Random-effects models
Analyze spillover
- Direct effects
- Indirect effects
Spatial weighting matrices
- Inverse distance
- Contiguity—nearest neighbor
- Custom
- Choice of normalization: spectral, minmax, row, more
Import shapefiles

Neighboring towns have more influence on each other than on towns far away. The same is true of countries that are close to each other and of closely connected friends on social media.

Spatial autoregressive (SAR) models are fit using datasets that contain observations on geographical areas. Observations are called spatial units and might be countries, states, counties, postal codes, or city blocks. Alternatively, they might not be geographically based at all; they could be nodes of a social network.

Datasets contain a continuous outcome variable—such as incidence of disease, output of farms, or crime rate—along with other variables to predict the outcome. For cross-sectional data, each variable has one value per spatial unit. For panel data, there are typically multiple values for different time points.

There is a manual entirely devoted to fitting SAR models, working with spatial data, and creating and managing spatial weighting matrices. The commands are called the Sp commands. See the Spatial Autoregressive Models Reference Manual.

Let's see it work

Fitting SAR models involves three main tasks, which the walkthrough below breaks into five steps:

1. Getting your data ready for analysis.
2. Creating the spatial weighting matrices your model needs.
3. Running your SAR model.

Stata's Sp commands will work with or without shapefiles, files commonly used to define maps. They will work with other location data or even work with data without locations at all, such as social network data.
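
As a minimal sketch of the no-shapefile case, the following builds a toy dataset with planar coordinates and declares it to Sp. The variable names id, x, and y, the four made-up locations, and the matrix name Mtoy are assumptions for illustration only; check the syntax of spset and spmatrix create against the manual for your version of Stata.

```stata
* Toy data: four hypothetical spatial units with planar x-y coordinates
clear
input id x y
1 0 0
2 1 0
3 0 1
4 3 3
end

* Declare the data to be Sp data, using id as the spatial-unit ID
* and x and y as the coordinates (no shapefile involved)
spset id, coord(x y)

* Inverse-distance weights need only coordinates
spmatrix create idistance Mtoy

* List the weighting matrices now in memory
spmatrix dir
```

Contiguity matrices, by contrast, are built from the boundaries of the spatial units, so as far as we know they need a linked shapefile.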

Here's an analysis from start to finish.

Step 1: Find and translate the shapefile

You do not have to use shapefiles, but we will. We downloaded a standard-format shapefile, tl_2016_us_county.zip, that we found at https://catalog.data.gov/dataset/tiger-line-shapefile-2016-nation-u-s-current-county-and-equivalent-national-shapefile.

We copied the file to our current directory, then typed in Stata:

. unzipfile tl_2016_us_county.zip
(output omitted)

. spshape2dta tl_2016_us_county
(importing .shp file)
(importing .dbf file)
(creating _ID spatial-unit id)
(creating _CX coordinate)
(creating _CY coordinate)

file tl_2016_us_county_shp.dta created
file tl_2016_us_county.dta     created

spshape2dta did its magic and created two Stata datasets for us. One is a Stata-format shapefile:

tl_2016_us_county_shp.dta

The other is a Stata dataset containing the other data that were in the shapefile bundle:

tl_2016_us_county.dta

spshape2dta also linked the two files.

Step 2: Prepare the data for analysis

We have our own analysis data for these counties in texas_ue.dta. We could just use them and skip the shapefile, but our data do not contain the coordinates of the counties. Without coordinates, we could not calculate distances or find neighbors, so we could not do an SAR analysis. That's why we got the shapefile from the U.S. Census; it provides all of this information.

We will merge our data with tl_2016_us_county.dta. We will first create an ID variable for merging the files. We also tell Sp that the Census provided the coordinates in latitude and longitude and that we want distances reported in miles.

. use tl_2016_us_county

. generate long fips = real(STATEFP + COUNTYFP)

. spset fips, modify replace
(output omitted)

. spset, modify coordsys(latlong, miles)

      Sp dataset: tl_2016_us_county.dta
Linked shapefile: tl_2016_us_county_shp.dta
            Data: Cross sectional
 Spatial-unit ID: _ID (equal to fips)
     Coordinates: _CY, _CX (latitude-and-longitude, miles)

. save, replace
file tl_2016_us_county.dta saved

. use texas_ue, clear

. merge 1:1 fips using tl_2016_us_county, keep(match)
(output omitted)

Our data are ready for analysis.

Step 3: Creating the spatial weighting matrices

We plan on fitting a model with spatial lags of the dependent variable, spatial lags of a covariate, and spatial autoregressive errors. Spatial lags are defined by spatial weighting matrices. We will use one matrix for the variables and another for the errors.
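
Written out, a model of this kind has roughly the following form. The symbols are our own labels rather than names that Sp uses: y is the dependent variable, x is the covariate, W is the weighting matrix applied to the variables, M is the one applied to the errors, \lambda and \gamma are the spatial-lag coefficients, \rho is the spatial autoregressive error coefficient, and \epsilon is an idiosyncratic error.

```latex
\begin{aligned}
y &= \lambda W y + \beta_0 \mathbf{1} + \beta_1 x + \gamma W x + u, \\
u &= \rho M u + \epsilon
\end{aligned}
```

Each product such as W y is a spatial lag: for every spatial unit, it is a weighted sum of the values of y in the other units.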

Sp provides many ways to create spatial weighting matrices. We will use just two of its predefined formulations:

. spmatrix create contiguity W

. spmatrix create idistance M

. spmatrix dir

   Weighting matrix name           N x N      Type         Normalization
   ---------------------------------------------------------------------
                       M       254 x 254    idistance        spectral
                       W       254 x 254    contiguity       spectral

We created W to be a contiguity matrix: counties that share a border are treated as neighbors.

We created M to be an inverse-distance matrix: each weight is the reciprocal of the distance between two counties.

We let Sp perform its default normalization, which is spectral (largest eigenvalue). We could have chosen row or min–max normalization.
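
As a sketch of the alternatives, the normalize() option of spmatrix create selects the normalization. The matrix names Wrow and Mminmax are hypothetical, and the commands assume the merged county data from Step 2 are in memory.

```stata
* Same contiguity weights as W, but row-normalized
* (Wrow is a hypothetical name)
spmatrix create contiguity Wrow, normalize(row)

* Inverse-distance weights with min-max normalization
* (Mminmax is a hypothetical name)
spmatrix create idistance Mminmax, normalize(minmax)

* Confirm the type and normalization of each matrix
spmatrix dir
```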

Sp also provides commands that let you create custom weighting matrices. You can create them from Stata data, by writing Mata code, or by importing them from a text file. The Mata capability is of special interest because it is so easy to use.
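
The Mata route might look like the following sketch, which copies W into Mata, recodes it as a binary neighbor matrix, and stores the result under a new name. The names Wm, id, and Wbin are hypothetical, and the syntax shown for spmatrix matafromsp and spmatrix spfrommata is our reading of the manual, so verify it before relying on it.

```stata
* Copy the weights of W into Mata matrix Wm and the matching
* spatial-unit IDs into Mata vector id
spmatrix matafromsp Wm id = W

* Recode every positive weight to 1, giving a binary neighbor matrix
mata: Wm = (Wm :> 0)

* Store the result as a new Sp weighting matrix, row-normalized
spmatrix spfrommata Wbin = Wm id, normalize(row)
```

Any Mata expression could replace the recoding step, which is what makes this approach flexible.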

Step 4: Fitting the model

We have prepared our data.

We have defined the spatial weighting matrices we need.

We can now fit our model.

. spregress unemployment college, gs2sls dvarlag(W) ivarlag(W:college) errorlag(M)

  (254 observations)
  (254 observations (places) used)
  (weighting matrices define 254 places)

Estimating rho using 2SLS residuals:

initial:       GMM criterion =  .00565316
alternative:   GMM criterion =  .00235416
rescale:       GMM criterion =  .00004209
Iteration 0:   GMM criterion =  .00004209
Iteration 1:   GMM criterion =  7.292e-06
Iteration 2:   GMM criterion =  7.195e-06

Estimating rho using GS2SLS residuals:

Iteration 0:   GMM criterion =  .01457475
Iteration 1:   GMM criterion =   .0120462
Iteration 2:   GMM criterion =   .0118037
Iteration 3:   GMM criterion =  .01180212
Iteration 4:   GMM criterion =  .01180014
Iteration 5:   GMM criterion =   .0117996
Iteration 6:   GMM criterion =  .01179915
Iteration 7:   GMM criterion =  .01179899
Iteration 8:   GMM criterion =  .01179888
Iteration 9:   GMM criterion =  .01179883

Spatial autoregressive model                    Number of obs     =        254
GS2SLS estimates                                Wald chi2(3)      =      43.52
                                                Prob > chi2       =     0.0000
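                                                Pseudo R2         =     0.2081

------------------------------------------------------------------------------
unemployment        Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------------------------------
unemployment  
     college     -.067419   .0122785    -5.49   0.000    -.0914843   -.0433536
       _cons     5.715733   .4246597    13.46   0.000     4.883415     6.54805
------------------------------------------------------------------------------
W             
     college    -.0424388   .0213227    -1.99   0.047    -.0842306    -.000647
unemployment     .2058481   .0961201     2.14   0.032     .0174562      .39424
------------------------------------------------------------------------------
W             
e.unemploy~t     3.247298   1.369204     2.37   0.018     .5637078    5.930888
------------------------------------------------------------------------------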

Wald test of spatial terms:          chi2(3) = 9.77       Prob > chi2 = 0.0206

We used the generalized spatial two-stage least-squares (GS2SLS) estimator. The GS2SLS estimator lets us fit multiple spatial lags, potentially allowing us to better approximate the true spatial dependence. Alternatively, we could have used a maximum likelihood estimator to fit the model.
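
For comparison, here is a sketch of two variations on the same specification. It assumes the data and the matrices W and M from the earlier steps are still in memory; we have not run these commands on these data, so no output is shown.

```stata
* Maximum likelihood fit of the same specification
spregress unemployment college, ml dvarlag(W) ivarlag(W:college) errorlag(M)

* GS2SLS again, this time allowing heteroskedastic errors
spregress unemployment college, gs2sls heteroskedastic dvarlag(W) ivarlag(W:college) errorlag(M)
```

As we understand it, the ML estimator accepts at most one dvarlag() and one errorlag(), which is the restriction that GS2SLS relaxes.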

Our dependent variable is the unemployment rate (unemployment) and we think that unemployment is affected by the proportion of the adult population who hold college degrees (college).

Step 5: Interpreting the model

If you are familiar with SAR models, you know they are difficult to interpret: the coefficients alone do not give the effect of a covariate, because that effect combines direct and indirect components. The direct effect is the effect of a change in a spatial unit's covariate on that unit's own outcome. The indirect effect, also known as the spillover effect, is the effect of that change on the outcomes of other spatial units.

Stata's estat impact command splits out those effects.

. estat impact

direct     :100%
indirect   :100%
total      :100%

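Average impacts                                 Number of obs     =        254

------------------------------------------------------------------------------
                          Delta-Method 
                    dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------------------------------
direct        
     college    -.0690788   .0121434    -5.69   0.000    -.0928795   -.0452781
------------------------------------------------------------------------------
indirect      
     college    -.0586373   .0299383    -1.96   0.050    -.1173152    .0000407
------------------------------------------------------------------------------
total         
     college    -.1277161   .0320147    -3.99   0.000    -.1904638   -.0649684
------------------------------------------------------------------------------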

estat impact is essential for interpreting the results of SAR models and works after all Sp estimators. We see that a 1-percentage-point increase in those holding college degrees in a county reduces unemployment by 0.07 percentage points in that same county; this is the direct effect. The spillover effect to other counties is almost as large, an expected reduction of 0.06 percentage points, although its 95% confidence interval just reaches zero.
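
The following sketch shows where such a split comes from, using the usual textbook definitions of average impacts; we assume, but have not confirmed, that these match what estat impact computes. The notation is ours: n is the number of spatial units, W is the weighting matrix for the variables, \lambda is the coefficient on the spatial lag of the dependent variable, \beta_1 is the coefficient on the covariate x, and \gamma is the coefficient on its spatial lag.

```latex
\begin{aligned}
S &= \frac{\partial E(y)}{\partial x'} = (I - \lambda W)^{-1} (\beta_1 I + \gamma W), \\
\text{direct} &= \frac{1}{n} \operatorname{tr}(S), \qquad
\text{total} = \frac{1}{n} \mathbf{1}' S \mathbf{1}, \qquad
\text{indirect} = \text{total} - \text{direct}
\end{aligned}
```

The reported estimates are consistent with this decomposition: -0.0690788 + (-0.0586373) = -0.1277161, the total effect in the table.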

Tell me more

Learn more about Stata's spatial autoregressive models features.

There is an entire manual dedicated to SAR, and it has friendly introductions to the subject. See the Spatial Autoregressive Models Reference Manual.

