
Example from start to end

We are going to analyze the unemployment rate in Texas counties using texas_ue.dta. The dataset contains the unemployment rate and the college graduation rate for each Texas county, but it does not include the counties' locations. We are going to:

1. Find and download the U.S. counties shapefile.
2. Translate the downloaded file into Stata format.
3. Merge the translated file with our existing data.
4. Analyze the merged data.

Find and download the U.S. counties shapefile

From any web browser, we search for “shapefile U.S counties census” and find https://www.census.gov/geo/maps-data/data/tiger-line.html .

We download the file tl_2017_us_county.zip to the Downloads directory on our computer.

Translate the downloaded file into Stata format

We now use unzipfile and spshape2dta to translate tl_2017_us_county.zip into Stata format.

. /*
>         Step 1 : copy the downloaded file to the working directory
> */
. copy ~/Downloads/tl_2017_us_county.zip .

. 
. /* 
>         Step 2 : unzip the files
> */
. unzipfile tl_2017_us_county.zip 
    inflating: tl_2017_us_county.cpg
    inflating: tl_2017_us_county.dbf
    inflating: tl_2017_us_county.prj
    inflating: tl_2017_us_county.shp
    inflating: tl_2017_us_county.shp.ea.iso.xml
    inflating: tl_2017_us_county.shp.iso.xml
    inflating: tl_2017_us_county.shp.xml
    inflating: tl_2017_us_county.shx

successfully unzipped tl_2017_us_county.zip to current directory
total processed:  8
        skipped:  0
      extracted:  8

. 
. /*
>         Step 3 : translate shapefile to Stata
> */
. spshape2dta tl_2017_us_county
  (importing .shp file)
  (importing .dbf file)
  (creating _ID spatial-unit id)
  (creating _CX coordinate)
  (creating _CY coordinate)

  file tl_2017_us_county_shp.dta created
  file tl_2017_us_county.dta     created

. 
. use tl_2017_us_county, clear

. describe

Contains data from tl_2017_us_county.dta
  obs:         3,233                          
 vars:            20                          1 Mar 2018 11:02
 size:       491,416                          
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
_ID             int     %12.0g                Spatial-unit ID
_CX             double  %10.0g                x-coordinate of area centroid
_CY             double  %10.0g                y-coordinate of area centroid
STATEFP         str2    %9s                   STATEFP
COUNTYFP        str3    %9s                   COUNTYFP
COUNTYNS        str8    %9s                   COUNTYNS
GEOID           str5    %9s                   GEOID
NAME            str21   %21s                  NAME
NAMELSAD        str33   %33s                  NAMELSAD
LSAD            str2    %9s                   LSAD
CLASSFP         str2    %9s                   CLASSFP
MTFCC           str5    %9s                   MTFCC
CSAFP           str3    %9s                   CSAFP
CBSAFP          str5    %9s                   CBSAFP
METDIVFP        str5    %9s                   METDIVFP
FUNCSTAT        str1    %9s                   FUNCSTAT
ALAND           double  %14.0f                ALAND
AWATER          double  %14.0f                AWATER
INTPTLAT        str11   %11s                  INTPTLAT
INTPTLON        str12   %12s                  INTPTLON
-------------------------------------------------------------------------------
Sorted by: _ID

. list _ID _CX _CY STATEFP COUNTYFP in 1/5

     +---------------------------------------------------+
     | _ID          _CX         _CY   STATEFP   COUNTYFP |
     |---------------------------------------------------|
  1. |   1     -96.7874   41.916403        31        039 |
  2. |   2   -123.43347   46.291134        53        069 |
  3. |   3   -104.41196   34.342414        35        011 |
  4. |   4   -96.687756   40.784174        31        109 |
  5. |   5   -98.047185    40.17638        31        129 |
     +---------------------------------------------------+

. 
. /*
>         Step 4 : create standard ID variable
> */
. generate long fips = real(STATEFP + COUNTYFP)

. bysort fips : assert _N == 1

. assert fips != .

. 
. /*
>         Step 5 : tell Sp to use standard ID variable
> */
. spset fips, modify replace
  (_shp.dta file saved)
  (data in memory saved)
  Sp dataset tl_2017_us_county.dta
                data:  cross sectional
     spatial-unit id:  _ID (equal to fips)
         coordinates:  _CX, _CY (planar)
    linked shapefile:  tl_2017_us_county_shp.dta

. 
. /*
>         Step 6 : Set coordinates units
> */
. spset, modify coordsys(latlong, miles)
  Sp dataset tl_2017_us_county.dta
                data:  cross sectional
     spatial-unit id:  _ID (equal to fips)
         coordinates:  _CY, _CX (latitude-and-longitude, miles)
    linked shapefile:  tl_2017_us_county_shp.dta
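
Step 4 builds the ID by concatenating the two string codes and converting the result to a number. The following is a minimal sketch of that step on toy data; the three code pairs are typed in by hand for illustration and do not come from the shapefile.

```stata
* Toy data: three state/county code pairs (typed in for illustration)
clear
input str2 STATEFP str3 COUNTYFP
"31" "039"
"48" "001"
"01" "001"
end

* "+" concatenates strings, so "01" + "001" is "01001";
* real() then converts the five-character code to a number
generate long fips = real(STATEFP + COUNTYFP)

* Same checks as in Step 4: no missing IDs and no duplicate IDs
assert fips != .
isid fips

list STATEFP COUNTYFP fips
```

Because fips is numeric, a leading zero in the state code is dropped: "01" and "001" give 1001, not 01001. The IDs stay unique, which is all that spset needs.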

Merge the translated file with our existing data

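Recall that texas_ue contains the unemployment rate and the college graduation rate for Texas counties. It identifies each county by its FIPS code in the variable fips, which matches the ID variable we created in the translated file.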

. copy http://www.stata-press.com/data/r15/texas_ue.dta .

. use texas_ue, clear


. describe

Contains data from texas_ue.dta
  obs:           254                          
 vars:             4                          10 Feb 2017 12:36
 size:         4,064                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
fips            long    %9.0g                 FIPS
college         float   %9.0g               * Percent college degree
income          long    %12.0g                Median household income
unemployment    float   %9.0g                 Unemployment rate
                                            * indicated variables have notes
-------------------------------------------------------------------------------
Sorted by: fips
     Note: Dataset has changed since last saved.

. 
. /*
>         merge the translated shapefile
> */
. merge 1:1 fips using tl_2017_us_county

    Result                           # of obs.
    -----------------------------------------
    not matched                         2,979
        from master                         0  (_merge==1)
        from using                      2,979  (_merge==2)

    matched                               254  (_merge==3)
    -----------------------------------------

. keep if _merge == 3
(2,979 observations deleted)

. drop _merge

. 
. save texas_ue, replace
file texas_ue.dta saved
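
The merge, keep, and drop steps above can also be written as a single command using merge's keep(match) and nogenerate options. The following is a minimal sketch on toy data; the two small datasets and their values are made up for illustration and stand in for texas_ue and the translated county file.

```stata
* Toy "rates" file keyed by fips (values are made up)
clear
input long fips float unemployment
48001 4.5
48003 3.2
end
tempfile rates
save `rates'

* Toy "counties" file with one extra county that has no rate data
clear
input long fips str10 NAME
48001 "Anderson"
48003 "Andrews"
31039 "Cuming"
end
tempfile counties
save `counties'

* Master = rates, using = counties, in the same order as the log above.
* keep(match) keeps only matched observations (_merge == 3);
* nogenerate suppresses the _merge variable.
use `rates', clear
merge 1:1 fips using `counties', keep(match) nogenerate

list
```

Only the two counties present in both files remain, and no _merge variable is left behind to drop.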

Analyze the merged data

. use texas_ue, clear

. /*
>         Step 1 : Is there spatial spillover ?
> */
. regress unemployment college

      Source |       SS           df       MS      Number of obs   =       254
-------------+----------------------------------   F(1, 252)       =     57.92
       Model |  139.314746         1  139.314746   Prob > F        =    0.0000
    Residual |  606.129539       252  2.40527595   R-squared       =    0.1869
-------------+----------------------------------   Adj R-squared   =    0.1837
       Total |  745.444285       253   2.9464201   Root MSE        =    1.5509

------------------------------------------------------------------------------
unemployment |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     college |  -.1008791   .0132552    -7.61   0.000    -.1269842   -.0747741
       _cons |   6.542796   .2571722    25.44   0.000     6.036316    7.049277
------------------------------------------------------------------------------

. spmatrix create contiguity W, replace

. estat moran, errorlag(W)

Moran test for spatial dependence
         Ho: error is i.i.d. 
         Errorlags:  W

         chi2(1)      =    94.06
         Prob > chi2  =   0.0000

. /*
>         Step 2 : estimation with spregress
> */
. spregress unemployment college, dvarlag(W) gs2sls
  (254 observations)
  (254 observations (places) used)
  (weighting matrix defines 254 places)

Spatial autoregressive model                    Number of obs     =        254
GS2SLS estimates                                Wald chi2(2)      =      67.66
                                                Prob > chi2       =     0.0000
                                                Pseudo R2         =     0.1453

------------------------------------------------------------------------------
unemployment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
unemployment |
     college |  -.0939834   .0131033    -7.17   0.000    -.1196653   -.0683015
       _cons |   5.607379   .5033813    11.14   0.000     4.620769    6.593988
-------------+----------------------------------------------------------------
W            |
unemployment |   .2007728   .0942205     2.13   0.033      .016104    .3854415
------------------------------------------------------------------------------
Wald test of spatial terms:          chi2(1) = 4.54       Prob > chi2 = 0.0331

. 
. /*
>         Step 3 : interpretation of results
> */
. estat impact

progress   :100% 

Average impacts                                 Number of obs     =        254

------------------------------------------------------------------------------
             |            Delta-Method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
direct       |
     college |  -.0945245   .0130576    -7.24   0.000     -.120117   -.0689321
-------------+----------------------------------------------------------------
indirect     |
     college |  -.0195459    .010691    -1.83   0.068       -.0405    .0014081
-------------+----------------------------------------------------------------
total        |
     college |  -.1140705   .0171995    -6.63   0.000    -.1477808   -.0803602
------------------------------------------------------------------------------
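
In the table above, the direct impact is the average effect of a county's own college graduation rate on its own unemployment rate, the indirect impact is the average spillover from other counties, and the total impact is their sum. The following is a small arithmetic check, using point estimates copied by hand from the table; the scalar names are our own.

```stata
* Point estimates copied from the estat impact table above
scalar dir_eff = -.0945245
scalar ind_eff = -.0195459

* The total impact is the sum of the direct and indirect impacts
scalar tot_eff = dir_eff + ind_eff
display "direct + indirect = " %10.7f tot_eff

* Share of the total impact that comes from spillovers
display "indirect share    = " %5.3f ind_eff/tot_eff
```

The sum agrees with the reported total of -.1140705 up to rounding, and spillovers account for roughly one-sixth of the total impact. Note that the indirect impact is estimated less precisely: its 95% confidence interval in the table includes zero.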

We can include spatial lags of the dependent variable, of the independent variables, and of the error term. Each has a different interpretation. Here are some examples:

Examples with different spatial lags

spregress unemployment college, ivarlag(W: college) gs2sls
spregress unemployment college, errorlag(W) gs2sls
spregress unemployment college, errorlag(M) dvarlag(W) gs2sls
spregress unemployment college, dvarlag(W1) dvarlag(W2) gs2sls
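
These examples refer to weighting matrices M, W1, and W2 that were not created in the session above. The following is a sketch of one way to create them, assuming the merged texas_ue data are in memory. Pairing each name with contiguity or inverse distance is our assumption for illustration, not a choice made in the original example.

```stata
* Assumes the merged texas_ue.dta saved earlier is in the working directory
use texas_ue, clear

* Contiguity matrix, as created earlier in the walkthrough
spmatrix create contiguity W, replace

* Hypothetical choice: inverse-distance matrix for the error lag
spmatrix create idistance M, replace

* Hypothetical choice: one matrix of each kind for W1 and W2
spmatrix create contiguity W1, replace
spmatrix create idistance W2, replace

* List the weighting matrices now in memory
spmatrix dir
```

Any of the commands above can then be run as written, for example spregress unemployment college, errorlag(M) dvarlag(W) gs2sls.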

Examples using the ml estimator

spregress unemployment college, dvarlag(W) ml
spregress unemployment college, errorlag(W) ml
spregress unemployment college, ivarlag(W1: college) ivarlag(W2: college) ml
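
To see how the three kinds of lag differ, it can help to write the model out. The following is a sketch in our own notation (the symbols are assumptions, not taken from the original page), where y is the vector of unemployment rates, x is the vector of college graduation rates, W and M are spatial weighting matrices, and the errors in \epsilon are i.i.d.

```latex
\begin{aligned}
y &= \lambda W y + \beta_0 + \beta_1 x + \gamma W x + u \\
u &= \rho M u + \epsilon
\end{aligned}
```

Here dvarlag(W) adds the term \lambda W y, so unemployment in neighboring counties affects a county's own unemployment. ivarlag(W: college) adds \gamma W x, so neighbors' graduation rates enter directly. errorlag(M) adds \rho M u, so unobserved shocks are correlated across neighbors. Specifying an option more than once, each time with a different matrix, adds one such term per matrix. With none of these options, the model reduces to the linear regression fit by regress.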

Examples with endogenous covariates

spivregress dui nodui vehicles i.dry (police = elect), dvarlag(W) errorlag(M)

Examples with panel data

spxtregress hrate ln_population gini, fe dvarlag(W) errorlag(M)
spxtregress hrate ln_population gini, re dvarlag(W) errorlag(M)
spxtregress hrate ln_population gini, re sarpanel dvarlag(W) errorlag(M)