Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Creating square matrix with obs counts from 2 non unique id variables


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Creating square matrix with obs counts from 2 non unique id variables
Date   Tue, 19 Apr 2011 19:07:52 +0100

There was a write-up of -tabcount- in

SJ-3-4  pr0011  . . . . . . . .  Speaking Stata: Problems with tables, Part II
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/03   SJ 3(4):420--439                                 (no commands)
        reviews three user-written commands (tabcount, makematrix,
        and groups) as different approaches to tabulation problems


On Tue, Apr 19, 2011 at 7:00 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> Here is another way to do it. This uses -tabcount- (SSC).
>
> levelsof ID1, local(l1)
> levelsof ID2, local(l2)
> local l : list l1 | l2
> tabcount ID1 ID2 , v1(`l') v2(`l')
>
> ----------------------------
>          |       ID2
>      ID1 |   10    11    12
> ----------+-----------------
>       10 |          1     1
>       11 |    1           1
>       12 |
> ----------------------------
>
> . tabcount ID1 ID2 , v1(`l') v2(`l') matrix(foo)
>
> ----------------------------
>          |       ID2
>      ID1 |   10    11    12
> ----------+-----------------
>       10 |          1     1
>       11 |    1           1
>       12 |
> ----------------------------
>
> . mat li foo
>
> foo[3,3]
>    10  11  12
> 10   0   1   1
> 11   1   0   1
> 12   0   0   0
>
>
>
> On Tue, Apr 19, 2011 at 5:44 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> This is ugly but it works.
>>
>> The idea is that you clone the data and then flip the variables round
>> in the clone. That ensures a square matrix. You can take out the fake
>> entries by rounding down as each was given a very small weight.
>>
>> Someone should be able to improve on this.
>>
>> clear
>> input obs ID1 ID2
>> 1     10  12
>> 2     10  11
>> 3     11  12
>> 4     11  10
>> end
>> local N = _N
>> local Np1 = _N + 1
>> expand 2
>> replace ID1 = ID2[_n - `N'] in `Np1'/L
>> replace ID2 = ID1[_n - `N'] in `Np1'/L
>> gen wt = cond(_n <= `N', 1, 1e-9)
>>
>> (1)
>>
>> collapse (sum) wt, by(ID1 ID2)
>> replace wt = round(wt)
>>
>> OR
>>
>> (2)
>>
>> tab ID1 ID2 [iw=wt], matcell(foo)
>> levelsof ID1, local(names)
>> mata : st_matrix("foo", round(st_matrix("foo")))
>> matrix rownames foo = `names'
>> matrix colnames foo = `names'
>>
>> On Tue, Apr 19, 2011 at 4:47 PM, Diogo L Pinheiro
>> <diogo.pinheiro@pubpolicy.gatech.edu> wrote:
>>
>>> I've been trying to create a square matrix with counts of observations from
>>> two non unique ID variables. In this case, one id variable has the code for
>>> an organization an individual was and one for where the individual is. For
>>> example:
>>>
>>>
>>> Observation    ID1         ID2
>>> 1.                  10          12
>>> 2.                  10          11
>>> 3.                  11          12
>>> 4.                  11          10
>>>
>>> In the example above, observation 1 was at organization #10 at one time and
>>> organization #12 at another. Observation 2 was at 10 and then 11. And so on.
>>>
>>>
>>> From that, I want to create a square matrix that would look something like:
>>>
>>>
>>>              10           11           12
>>>
>>> 10           0              1            1
>>> 11           1              0            1
>>> 12           0              0            0
>>>
>>> So that I could have all the counts of each combination of organizations.
>>>
>>> The main problem is getting a square matrix in the end. I can create a non
>>> square matrix in this case by using
>>> tab ID2, gen(newID)
>>> collapse (sum) newID, by(ID1)
>>>
>>> but the problem is that this won't be a square matrix, as not every ID
>>> number shows up in both variables.. In the example above it would be missing
>>> a line (since not every ID number present in ID1 is present in ID2, and vice
>>> versa).
>>>
>>> So how could I get a square matrix from the situation above? I've found
>>> commands to create similar matrices, but they require unique IDs and only
>>> provide binary results for the existing combinations.
>>
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index