Stata | FAQ: Neighbors on a rectangular grid

Home / Resources & support / FAQs / Neighbors on a rectangular grid

How do I identify neighbors of points or areas on a rectangular grid in Stata?

Title		Neighbors on a rectangular grid
Author		Nicholas J. Cox, Durham University, UK

Say you have spatial data for points or areas on a rectangular grid, meaning a grid with a rectangular mesh, and not necessarily one whose overall shape is rectangular. Some minor trickery with by: allows you to identify the neighbors of each point and thus be able to carry out some simple spatial processing. Even if you are not especially interested in spatial data, the answer provides good examples of the use of by:, so read on.

Let us imagine a set-up like this:

		      x coordinates increasing to right
		      
		                * *
		              * * * * * *
		          * * * * * * * * * * * * * *
		        * * * * * * * * * * * * * * * *
		      * * * * * * * * * * * * * * * * * * 
      y       	      * * * * * * * * * * * * * * * * * * * 
      coordinates     * * * * * * * * * * * * * * * * * * * 
      increasing      * * * * * * * * * * * * * * * * * * * * 
      to              * * * * * * * * * * * * * * * * * * * * * 
      top             * * * * * * * * * * * * * * * * * * * * * 
		      * * * * * * * * * * * * * * * * * * * * 
	            * * * * * * * * * * * * * * * * * * * *
	            * * * * * * * * * * * * * * * * * * *
	          * * * * * * * * * * * * * * * * * * * *
	            * * * * * * * * * * * * * * * * * * *
		      * * * * * * * * * * * * * * * * * *
		            * * * * * * * * * * * * * * 
		                      * * * * * *

We are going to assume each row is indexed by the same integer y coordinate and each column is indexed by the same integer x coordinate. As the diagram should suggest, we are not assuming that the overall shape of the grid is rectangular or even some regular geometric shape. For concreteness, we are assuming a Cartesian convention about directions in which coordinates increase. However, a matrix convention in which row numbers increase downwards is easily accommodated by minor changes to what follows.

In addition, we are going to skate over the difference between gridded point data (e.g., altitude or rainfall is measured at a notional point) and gridded area data (e.g., number of people is recorded in a rectangle). Although important otherwise, it is secondary to the question of identifying neighbors for which the principles are identical. The terminology of points will include areas.

Using a geographic convention, let us imagine the grid is oriented so that north is at the top, and we can identify neighbors of each point to the north, northeast, east, southeast, south, southwest, west and northwest.

Suppose we type

 . by y (x), sort:

After such a sort, points would be ordered first by y coordinate and then by x coordinate. Thus for any observation, the previous observation in Stata's memory is that to the immediate west, and the observation following is that to the immediate east. Given some variable z with information on what is happening at each x and y (altitude, number of people in grid rectangle, whatever),

 . by y (x), sort: generate z_w = z[_n-1] 
 . by y: generate z_e = z[_n+1]

However, what we just said is not quite true: it is incorrect for the beginnings and ends of rows. For example, the observation before that for the beginning of a row is for the previous row and is not its neighbor to the west. Nevertheless, this code needs no extra tricks to cope with edge problems. Because the first point in any row has no neighbor to the west, z_w for that point will be filled in with missing. This is ensured because everything is done under the aegis of by:. Processing takes place within the groups of observations defined by by:, and there can be no spurious carry-overs between rows.

Reversing the roles of the coordinates, we can get north and south neighbors:

 . by x (y), sort: generate z_s = z[_n-1] 
 . by x: generate z_n = z[_n+1]

The other four neighbors require one extra small trick. The NW and SE neighbors of any point lie on the diagonal through that point for which x + y is constant, and the NE and SW neighbors of any point lie on the other diagonal, for which x − y is constant. Given variables constructed for these diagonal sums, the rest is just a variation on the prevailing theme:

 . generate long xpy = x + y           
 . generate long xmy = x - y           
 . by xpy (x), sort: generate z_nw = z[_n-1] 
 . by xpy: generate z_se = z[_n+1] 
 . by xmy (x), sort: generate z_sw = z[_n-1] 
 . by xmy (x): generate z_ne = z[_n+1]

That is it, really. We were a little paranoid in insisting on a long variable type, just in case the sums (in particular) were very large. Quite a lot of elementary spatial analysis is now in reach just by using the constructed variables (and without any looping over observations or special programming). For example, you can get an average of NEWS neighbors and one of NW, NE, SW, SE neighbors by

 . egen z_news = rowmean(z_n z_e z_w z_s)
 . egen z_nwneswse = rowmean(z_nw z_ne z_sw z_se)

The egen function rowmean() and its siblings are especially recommended here for doing what you would usually regard as the right thing whenever one or more variables are missing.

We use cookies

We use cookies to ensure that we give you the best experience on our website—to enhance site navigation, to analyze usage, and to assist in our marketing efforts. By continuing to use our site, you consent to the storing of cookies on your device and agree to delivery of content, including web fonts and JavaScript, from third party web services.

Cookie Settings

Last updated: 16 November 2022

StataCorp LLC (StataCorp) strives to provide our users with exceptional products and services. To do so, we must collect personal information from you. This information is necessary to conduct business with our existing and potential customers. We collect and use this information only where we may legally do so. This policy explains what personal information we collect, how we use it, and what rights you have to that information.

Advertising and performance cookies

This website uses cookies to provide you with a better user experience. A cookie is a small piece of data our website stores on a site visitor's hard drive and accesses each time you visit so we can improve your access to our site, better understand how you use our site, and serve you content that may be of interest to you. For instance, we store a cookie when you log in to our shopping cart so that we can maintain your shopping cart should you not complete checkout. These cookies do not directly store your personal information, but they do support the ability to uniquely identify your internet browser and device.

Please note: Clearing your browser cookies at any time will undo preferences saved here. The option selected here will apply only to the device you are currently using.

How do I identify neighbors of points or areas on a rectangular grid in Stata?

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies

Stata/MP4 Annual License (download)

How do I identify neighbors of points or areas on a rectangular grid in Stata?

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies