Statalist



Re: st: spatial weighting matrix


From   Mike Lacy <Michael.Lacy@colostate.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: spatial weighting matrix
Date   Thu, 09 Apr 2009 14:40:31 -0600


Max r wrote:
>
>Date: Wed, 8 Apr 2009 15:08:58 -0400
>From: max r <maxr28@gmail.com>
>Subject: Re: st: spatial weighting matrix
>
>Kyle and Nick,
>
>Thank you for the suggestions. This is what I have done since the last
>email. Some one pointed me towards a spatial analysis software "GEODA"
>- - it's free, and does spatially weighted regressions. ...I have linked
>the cross sectional version of my panel to a parcel map, after some
>tinkering around, was able to use this software to create queen
>contiguity weighting matrix. The weighting matrix is stored ".gal" or
>".gwt" format , I currently wondering how to export the information
>about neighbors back into STATA.
>

I recently dealt with something similar using Stata and Geoda, and will
describe my approach, FWIW to Max and others.
In addition I have a question, posed after my description, regarding
alternatives to -reshape- wide for very long files.

So, speaking to Max's issue: The Geoda .gal or .gwt matrices *can* be brought into Stata. They are just plain ASCII files, with a slightly messy record structure,
fairly well described in the Geoda documentation. They vary in structure,
but in general they *are* things that can be read into Stata and used.

Here's a high-level outline of how I turned a *.gal file into a binary weighting matrix,
where Wij = 1 if areal unit i and areal unit j are neighbors, 0 otherwise.


0) Read the Geoda weight file into Stata and massage it into
a wide format, looking like this:

egoid  alterid1 alterid2 ...    alteridMax

where egoid is the id of the areal unit of interest, and the alterid* variables
contain the ids of each unit that is a neighbor of ego. Max is the maximum number of
neighbors any areal unit has, so that alteridQ is missing if ego
has fewer than Q neighbors.
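
For illustration, step 0 might be sketched outside Stata as below. This is a
minimal Python sketch, not the procedure actually used for this post. It assumes
the common *.gal layout (one header line, then for each unit a line
"<id> <number of neighbors>" followed by a line listing the neighbor ids);
check your own file against the Geoda documentation. The sample text and the
names GAL_TEXT, read_gal, and wide are hypothetical.

```python
# Hypothetical sample in the assumed GAL layout: header line, then per unit
# "<id> <n_neighbors>" followed by a line of neighbor ids.
GAL_TEXT = """0 4 parcels PARCELID
1 2
2 3
2 1
1
3 1
1
4 0

"""

def read_gal(text):
    """Return {egoid: [alterid, ...]} from GAL-style text (ids kept as strings)."""
    lines = text.splitlines()
    # Drop the header and flatten the rest into one token stream, so blank
    # lines and units with zero neighbors need no special handling.
    tokens = " ".join(lines[1:]).split()
    neighbors = {}
    pos = 0
    while pos < len(tokens):
        ego, count = tokens[pos], int(tokens[pos + 1])
        neighbors[ego] = tokens[pos + 2 : pos + 2 + count]
        pos += 2 + count
    return neighbors

neighbors = read_gal(GAL_TEXT)
max_nbrs = max(len(alters) for alters in neighbors.values())

# Wide layout: egoid, alterid1 ... alteridMax, padded with None for "missing".
wide = [[ego] + alters + [None] * (max_nbrs - len(alters))
        for ego, alters in neighbors.items()]
```

Each row of wide could then be written out as one delimited line for Stata to
read.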


1) Use -reshape- long to make the preceding Stata data structure into a pairwise
edge structure of all pairs of units that are contiguous.

egoid alterid weight
......        1
......
(Weights are 1 for all these cases, which are contiguous.)
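
The logic of step 1 can be sketched in Python (an illustration of what the
-reshape- long accomplishes, not Stata code; the three sample rows are made up):

```python
# Wide records as in step 0; None stands for a missing alterid.
wide = [
    {"egoid": 1, "alterid1": 2, "alterid2": 3},
    {"egoid": 2, "alterid1": 1, "alterid2": None},
    {"egoid": 3, "alterid1": 1, "alterid2": None},
]

# One (egoid, alterid, weight) record per non-missing neighbor slot.
edges = []
for row in wide:
    for name, alter in row.items():
        if name.startswith("alterid") and alter is not None:
            edges.append((row["egoid"], alter, 1))
```

The missing alterid slots are dropped here, which corresponds to discarding
the records with a missing alterid after reshaping.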

2) Use -fillin- to generate additional records with all the other
ego X alter combinations. (These are all the noncontiguous pairs, which
do not appear in the *.gal file.) These cases all have weight = 0.

egoid alterid weight
......        1
.....         0

etc.

(N X N cases of pairs)

(One detail to watch out for is that all possible egos and alters may
not appear in the original file, since units with no neighbors are
missing from the original  file.)
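
A small Python sketch of the fill-in logic, including the detail about units
with no neighbors. It assumes the complete list of unit ids is available from
some other source (here the hypothetical all_ids); the sample pairs are made up.

```python
from itertools import product

# Contiguous pairs from step 1 (weight 1). Unit 4 has no neighbors,
# so it never appears here as ego or as alter.
edges = {(1, 2): 1, (1, 3): 1, (2, 1): 1, (3, 1): 1}

# Assumption: the full set of units comes from elsewhere (e.g. the parcel data),
# not from the edge list itself.
all_ids = [1, 2, 3, 4]

# Every ego X alter combination; pairs absent from the edge list get weight 0.
full = {(ego, alter): edges.get((ego, alter), 0)
        for ego, alter in product(all_ids, repeat=2)}
```

Building the combinations from the edge list alone would give a 3 X 3 result
here and silently lose unit 4.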


3) At this point, a -reshape- wide will create the desired
 N X N binary weight file with structure as:

 egoid WtForAlterid1, WtAlterid2, ..., WtForAlteridN
         1         0          ...
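
As a sketch of what this reshape does, in Python rather than Stata: each long
record is filed under its ego, in a column named after its alter. The column
prefix "weight" mirrors what a call along the lines of
-reshape wide weight, i(egoid) j(alterid)- would produce; the sample records
are made up.

```python
# Complete N X N pair list from step 2, for N = 3.
long_rows = [
    (1, 1, 0), (1, 2, 1), (1, 3, 1),
    (2, 1, 1), (2, 2, 0), (2, 3, 0),
    (3, 1, 1), (3, 2, 0), (3, 3, 0),
]

# One wide record per ego, one weight column per alter.
wide = {}
for ego, alter, weight in long_rows:
    wide.setdefault(ego, {})[f"weight{alter}"] = weight

# The same information as a plain N X N list of rows, ids in sorted order.
ids = sorted(wide)
matrix = [[wide[ego][f"weight{alter}"] for alter in ids] for ego in ids]
```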

 -----------done.



Now, for my question and a request re item 3):


I did not actually use -reshape- for step 3, since my pairwise file had about
9e6 pairs, to be reshaped to 3000 X 3000.  My experience is that this
reshape is very slow for large N. (For N units, I think the time to run
-reshape- grows faster than O(N^2) at large N.)

Instead I -outsheet- ed the data, and processed it with a low level language
program.
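
For illustration only, here is a minimal Python sketch of that kind of external
pass. It is not the program referred to above, and it makes its own
assumptions: the exported file is tab-delimited with a header row and columns
named egoid and alterid, it holds only the contiguous pairs, and the full list
of unit ids (the hypothetical ALL_IDS) is known separately. In-memory strings
stand in for the real files.

```python
import csv
import io

# Stand-in for a tab-delimited export of the contiguous pairs only.
EXPORTED = "egoid\talterid\n101\t205\n101\t310\n205\t101\n310\t101\n"

# Assumed to come from elsewhere; 999 is a unit with no neighbors.
ALL_IDS = [101, 205, 310, 999]

# Map each unit id to a row/column position, then preallocate an N X N grid of 0s.
index = {uid: k for k, uid in enumerate(ALL_IDS)}
n = len(ALL_IDS)
W = [[0] * n for _ in range(n)]

# Single pass over the pairs: set the cell for each contiguous pair to 1.
for rec in csv.DictReader(io.StringIO(EXPORTED), delimiter="\t"):
    W[index[int(rec["egoid"])]][index[int(rec["alterid"])]] = 1

# Write the matrix back out, one row per ego, for re-import.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(["egoid"] + [f"w{uid}" for uid in ALL_IDS])
for uid, row in zip(ALL_IDS, W):
    writer.writerow([uid] + row)
```

Because the grid starts out as all zeros, only the contiguous pairs need to be
read, so the noncontiguous pairs never have to be generated at all in this
variant.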

I'd be interested in better alternatives to -reshape- for large problems like
step 3. Would a Mata program for this special but common case be worthwhile, in
terms of speed?

If so, I'd encourage someone to create one.  Turning pairwise edge data into
an N X N transaction matrix is a pretty common problem for people who deal
with network or other transaction data.

Or, is there some non-Mata but quicker Stata approach than -reshape-?

By the way: Stata is quite tolerant of large weight matrices (e.g. 3000 X 3000), even in v. 9 with my modest Wintel machine, and matrix operations with it are quite fast.
However, -mkmat- is quite slow with a 3000 X 3000 data file.


Regards-- and pardon my longwindedness,




