Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Creating square matrix with obs counts from 2 non unique id variables |
Date | Tue, 19 Apr 2011 19:07:52 +0100 |
There was a write-up of -tabcount- in SJ-3-4 pr0011 . . . . . . . . Speaking Stata: Problems with tables, Part II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox Q4/03 SJ 3(4):420--439 (no commands) reviews three user-written commands (tabcount, makematrix, and groups) as different approaches to tabulation problems On Tue, Apr 19, 2011 at 7:00 PM, Nick Cox <njcoxstata@gmail.com> wrote: > Here is another way to do it. This uses -tabcount- (SSC). > > levelsof ID1, local(l1) > levelsof ID2, local(l2) > local l : list l1 | l2 > tabcount ID1 ID2 , v1(`l') v2(`l') > > ---------------------------- > | ID2 > ID1 | 10 11 12 > ----------+----------------- > 10 | 1 1 > 11 | 1 1 > 12 | > ---------------------------- > > . tabcount ID1 ID2 , v1(`l') v2(`l') matrix(foo) > > ---------------------------- > | ID2 > ID1 | 10 11 12 > ----------+----------------- > 10 | 1 1 > 11 | 1 1 > 12 | > ---------------------------- > > . mat li foo > > foo[3,3] > 10 11 12 > 10 0 1 1 > 11 1 0 1 > 12 0 0 0 > > > > On Tue, Apr 19, 2011 at 5:44 PM, Nick Cox <njcoxstata@gmail.com> wrote: >> This is ugly but it works. >> >> The idea is that you clone the data and then flip the variables round >> in the clone. That ensures a square matrix. You can take out the fake >> entries by rounding down as each was given a very small weight. >> >> Someone should be able to improve on this. >> >> clear >> input obs ID1 ID2 >> 1 10 12 >> 2 10 11 >> 3 11 12 >> 4 11 10 >> end >> local N = _N >> local Np1 = _N + 1 >> expand 2 >> replace ID1 = ID2[_n - `N'] in `Np1'/L >> replace ID2 = ID1[_n - `N'] in `Np1'/L >> gen wt = cond(_n <= `N', 1, 1e-9) >> >> (1) >> >> collapse (sum) wt, by(ID1 ID2) >> replace wt = round(wt) >> >> OR >> >> (2) >> >> tab ID1 ID2 [iw=wt], matcell(foo) >> levelsof ID1, local(names) >> mata : st_matrix("foo", round(st_matrix("foo"))) >> matrix rownames foo = `names' >> matrix colnames foo = `names' >> >> On Tue, Apr 19, 2011 at 4:47 PM, Diogo L Pinheiro >> <diogo.pinheiro@pubpolicy.gatech.edu> wrote: >> >>> I've been trying to create a square matrix with counts of observations from >>> two non unique ID variables. In this case, one id variable has the code for >>> an organization an individual was and one for where the individual is. For >>> example: >>> >>> >>> Observation ID1 ID2 >>> 1. 10 12 >>> 2. 10 11 >>> 3. 11 12 >>> 4. 11 10 >>> >>> In the example above, observation 1 was at organization #10 at one time and >>> organization #12 at another. Observation 2 was at 10 and then 11. And so on. >>> >>> >>> From that, I want to create a square matrix that would look something like: >>> >>> >>> 10 11 12 >>> >>> 10 0 1 1 >>> 11 1 0 1 >>> 12 0 0 0 >>> >>> So that I could have all the counts of each combination of organizations. >>> >>> The main problem is getting a square matrix in the end. I can create a non >>> square matrix in this case by using >>> tab ID2, gen(newID) >>> collapse (sum) newID, by(ID1) >>> >>> but the problem is that this won't be a square matrix, as not every ID >>> number shows up in both variables.. In the example above it would be missing >>> a line (since not every ID number present in ID1 is present in ID2, and vice >>> versa). >>> >>> So how could I get a square matrix from the situation above? I've found >>> commands to create similar matrices, but they require unique IDs and only >>> provide binary results for the existing combinations. >> > * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/