Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Creating square matrix with obs counts from 2 non unique id variables
From 
 
Nick Cox <[email protected]> 
To 
 
[email protected] 
Subject 
 
Re: st: Creating square matrix with obs counts from 2 non unique id variables 
Date 
 
Tue, 19 Apr 2011 19:00:29 +0100 
Here is another way to do it. This uses -tabcount- (SSC).
levelsof ID1, local(l1)
levelsof ID2, local(l2)
local l : list l1 | l2
tabcount ID1 ID2 , v1(`l') v2(`l')
----------------------------
          |       ID2
      ID1 |   10    11    12
----------+-----------------
       10 |          1     1
       11 |    1           1
       12 |
----------------------------
. tabcount ID1 ID2 , v1(`l') v2(`l') matrix(foo)
----------------------------
          |       ID2
      ID1 |   10    11    12
----------+-----------------
       10 |          1     1
       11 |    1           1
       12 |
----------------------------
. mat li foo
foo[3,3]
    10  11  12
10   0   1   1
11   1   0   1
12   0   0   0
On Tue, Apr 19, 2011 at 5:44 PM, Nick Cox <[email protected]> wrote:
> This is ugly but it works.
>
> The idea is that you clone the data and then flip the variables round
> in the clone. That ensures a square matrix. You can take out the fake
> entries by rounding down as each was given a very small weight.
>
> Someone should be able to improve on this.
>
> clear
> input obs ID1 ID2
> 1     10  12
> 2     10  11
> 3     11  12
> 4     11  10
> end
> local N = _N
> local Np1 = _N + 1
> expand 2
> replace ID1 = ID2[_n - `N'] in `Np1'/L
> replace ID2 = ID1[_n - `N'] in `Np1'/L
> gen wt = cond(_n <= `N', 1, 1e-9)
>
> (1)
>
> collapse (sum) wt, by(ID1 ID2)
> replace wt = round(wt)
>
> OR
>
> (2)
>
> tab ID1 ID2 [iw=wt], matcell(foo)
> levelsof ID1, local(names)
> mata : st_matrix("foo", round(st_matrix("foo")))
> matrix rownames foo = `names'
> matrix colnames foo = `names'
>
> On Tue, Apr 19, 2011 at 4:47 PM, Diogo L Pinheiro
> <[email protected]> wrote:
>
>> I've been trying to create a square matrix with counts of observations from
>> two non unique ID variables. In this case, one id variable has the code for
>> an organization an individual was and one for where the individual is. For
>> example:
>>
>>
>> Observation    ID1         ID2
>> 1.                  10          12
>> 2.                  10          11
>> 3.                  11          12
>> 4.                  11          10
>>
>> In the example above, observation 1 was at organization #10 at one time and
>> organization #12 at another. Observation 2 was at 10 and then 11. And so on.
>>
>>
>> From that, I want to create a square matrix that would look something like:
>>
>>
>>              10           11           12
>>
>> 10           0              1            1
>> 11           1              0            1
>> 12           0              0            0
>>
>> So that I could have all the counts of each combination of organizations.
>>
>> The main problem is getting a square matrix in the end. I can create a non
>> square matrix in this case by using
>> tab ID2, gen(newID)
>> collapse (sum) newID, by(ID1)
>>
>> but the problem is that this won't be a square matrix, as not every ID
>> number shows up in both variables.. In the example above it would be missing
>> a line (since not every ID number present in ID1 is present in ID2, and vice
>> versa).
>>
>> So how could I get a square matrix from the situation above? I've found
>> commands to create similar matrices, but they require unique IDs and only
>> provide binary results for the existing combinations.
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/