Stata
Products Purchase Support Company
Search
   >> Home >> Resources & support >> FAQs >> Neighbors on a rectangular grid

How do I identify neighbors of points or areas on a rectangular grid in Stata?

Title   Neighbors on a rectangular grid
Author Nicholas J. Cox, Durham University, UK
Date October 2004; minor revisions March 2007

Say that you have spatial data for points or areas on a rectangular grid, meaning a grid with a rectangular mesh, and not necessarily one whose overall shape is rectangular. Some minor trickery with by: allows you to identify the neighbors of each point and thus be able to carry out some simple spatial processing. Even if you are not especially interested in spatial data, the answer provides good examples of the use of by:, so do read on.

Let us imagine a set-up like this:

		      x coordinates increasing to right
		      
		                * *
		              * * * * * *
		          * * * * * * * * * * * * * *
		        * * * * * * * * * * * * * * * *
		      * * * * * * * * * * * * * * * * * * 
      y       	      * * * * * * * * * * * * * * * * * * * 
      coordinates     * * * * * * * * * * * * * * * * * * * 
      increasing      * * * * * * * * * * * * * * * * * * * * 
      to              * * * * * * * * * * * * * * * * * * * * * 
      top             * * * * * * * * * * * * * * * * * * * * * 
		      * * * * * * * * * * * * * * * * * * * * 
	            * * * * * * * * * * * * * * * * * * * *
	            * * * * * * * * * * * * * * * * * * *
	          * * * * * * * * * * * * * * * * * * * *
	            * * * * * * * * * * * * * * * * * * *
		      * * * * * * * * * * * * * * * * * *
		            * * * * * * * * * * * * * * 
		                      * * * * * * 
				

We are going to assume that each row is indexed by the same integer y coordinate and each column is indexed by the same integer x coordinate. As the diagram should suggest, we are not assuming that the overall shape of the grid is rectangular or even some regular geometric shape. For concreteness, we are assuming a Cartesian convention about directions in which coordinates increase. However, a matrix convention in which row numbers increase downwards is easily accommodated by minor changes to what follows.

In addition, we are going to skate over the difference between gridded point data (e.g., altitude or rainfall is measured at a notional point) and gridded area data (e.g., number of people is recorded in a rectangle). Although important otherwise, that is secondary to the question of identifying neighbors, for which the principles are identical. The terminology of points will include areas.

Using a geographic convention, let us imagine that the grid is oriented so that north is at the top, and we can identify neighbors of each point to the north, northeast, east, southeast, south, southwest, west and northwest.

Suppose that we were to type

 . by y (x), sort: 

After such a sort, points would be ordered first by y coordinate and then by x coordinate. Thus for any observation, the previous observation in Stata's memory is that to the immediate west, and the observation following is that to the immediate east. Given some variable z with information on what is happening at each x and y (altitude, number of people in grid rectangle, whatever),

 . by y (x), sort: generate z_w = z[_n-1] 
 . by y: generate z_e = z[_n+1] 

However, what we just said is not quite true: it is incorrect for the beginnings and ends of rows. For example, the observation before that for the beginning of a row is for the previous row and is not its neighbor to the west. Nevertheless, this code needs no extra tricks to cope with edge problems. Because the first point in any row has no neighbor to the west, z_w for that point will be filled in with missing. This is all ensured because everything is being done under the aegis of by:. Processing takes place within the groups of observations defined by by: and there can be no spurious carry-overs between rows.

Reversing the roles of the coordinates, we can get north and south neighbors:

 . by x (y), sort: generate z_s = z[_n-1] 
 . by x: generate z_n = z[_n+1] 

The other four neighbors require one extra small trick. The NW and SE neighbors of any point lie on the diagonal through that point for which x + y is constant, and the NE and SW neighbors of any point lie on the other diagonal, for which x − y is constant. Given variables constructed for these diagonal sums, the rest is just a variation on the prevailing theme:

 . generate long xpy = x + y           
 . generate long xmy = x - y           
 . by xpy (x), sort: generate z_nw = z[_n-1] 
 . by xpy: generate z_se = z[_n+1] 
 . by xmy (x), sort: generate z_sw = z[_n-1] 
 . by xmy (x): generate z_ne = z[_n+1] 

That is it, really. We were a little paranoid in insisting on a long variable type, just in case the sums (in particular) were very large. Quite a lot of elementary spatial analysis is now in reach just by using the constructed variables (and without any looping over observations or special programming). For example, you can get an average of NEWS neighbors and one of NW, NE, SW, SE neighbors by

 . egen z_news = rowmean(z_n z_e z_w z_s)
 . egen z_nwneswse = rowmean(z_nw z_ne z_sw z_se)

The egen function rowmean() and its siblings are especially recommended here for doing what you would usually regard as the right thing whenever one or more variables are missing.

FAQs
What's new?
Statistics
Data management
Graphics
Programming Stata
Mata
Resources
Internet capabilities
Stata for Windows
Stata for Unix
Stata for Macintosh
Technical support
Resources & support
FAQs
Technical support
NetCourses
Short courses
Users Group meetings
Statalist
Links
Software updates
Software archives
Customer service
Manuals & supplements
Stata Journal
STB
Stata News
Stata Automation
Plugins

Site overview
Products
Resources & support
Company
Site index

© Copyright 1996–2008 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index