Re: st: Assigning new values to group variables

From: Florian Seliger
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Assigning new values to group variables
Date: Wed, 11 May 2011 14:41:49 -0700

```
Dear Robert,

It took me a while to understand the logic behind your code, but it seems to work perfectly.

Thank you very much!

On 09.05.2011 at 08:23, Robert Picard wrote:

> There are many issues here, but I assume that you want to preserve the
> relationship found in each observation. The following example creates
> a variable called rel_id that identifies each relationship. Your main
> issue, getting consistent Group values, is solved by converting the data
> to long form. Then I create a new variable called gid that identifies
> groups of companies based on the relationships stated in the initial
> dataset. This requires a program of mine called -group_id-, available
> from SSC. In case you need it, I also convert back to wide form.
>
> Hope this helps,
>
> Robert
>
> * --------------------- begin example ---------------------
> clear
> input Group1 str10 Var1 Group2 str10 Var2
> 1 companyABC 1 companyABD
> 1 companyABC . .
> 2 companyABD . .
> 3 companyABE . .
> 4 companyABF 2 companyCCC
> 5 companyACF 3 companyDDD
> 6 companyACG . .
> end
>
> * Assign a unique identifier to each observation
> * These identify a relationship
>
> gen rel_id = _n
>
> * Reshape to long form; drop obs with no company
>
> reshape long Group Var, i(rel_id) j(j)
> drop if Var == "."
>
> * Disregard Group values if they are not Group1
>
> replace Group = . if j > 1
>
> * Each company should have the same Group value
>
> sort Var Group
> by Var: replace Group = Group[1]
>
> * Assign new Group values to companies that were
> * not part of Group1
>
> by Var: gen first = _n == 1
> sum Group, meanonly
> replace Group = r(max) + sum(first) if Group == .
> drop first
>
> * Combine gid values of companies that share a
> * relationship (rel_id). This requires -group_id-, available
> * from SSC. To install, type ssc install group_id
>
> gen gid = Group
> group_id gid, matchby(rel_id)
> sort gid Var
> list, sepby(gid) noobs
>
> * If desired, convert back to wide
>
> sort rel_id
> reshape wide Var Group gid, i(rel_id) j(j)
> list, noobs sep(0)
> * --------------------- end example -----------------------
>
> On Mon, May 9, 2011 at 7:35 AM, Florian Seliger <florian.seliger@gmx.de> wrote:
>> Dear Statalist,
>>
>> I have a dataset from a firm survey containing several thousand observations.
>>
>> There are six variables with company names (Var1-Var6), in which firms are asked to name the other firms with which they have relationships.
>>
>> Similar companies may occur in several of Var1-Var6. They are grouped as indicated by the variables Group1-Group6.
>>
>> Var2-Var6 contain many missing values because many firms report a relationship with only a single firm.
>>
>> The variables Group1-Group6 are numbered inconsistently even when the same company appears in Var1 and Var2 (and Var3, …). For example, Group1 may take the value 2 while Group2 takes the value 1 for the same company. A further problem is that Var2-Var6 may also contain companies that do not occur in Var1.
>>
>> Please see the example below for a few companies.
>>
>> Group1   Var1         Group2   Var2
>> 1        companyABC   1        companyABD
>> 1        companyABC   .        .
>> 2        companyABD   .        .
>> 3        companyABE   .        .
>> 4        companyABF   2        companyCCC
>> 5        companyACF   3        companyDDD
>> 6        companyACG   .        .
>>
>> In the end, all similar companies across Var1-Var6 should have the same value as in Group1. In addition, companies that do not occur in Var1 should be assigned a new number. Please see the example below.
>>
>> Group1   Var1         Group2   Var2
>> 1        companyABC   1        .
>> 1        companyABC   1        .
>> 2        companyABD   2        companyABD
>> 3        companyABE   3        .
>> 4        companyABF   4        .
>> 5        companyACF   5        .
>> 6        compaynACG   6        .
>> 6        companyACG   6        .
>>
>> 9        .            9        companyCCC
>> 10       .            10       companyDDD
>> 11       .            11       companyCCD
>>
>> As I could not find a way in Stata to assign new numbers to companies that do not occur in Var1, I would like to ask whether you have any ideas.
>>
>> Thank you.
>>
>> Best,
>>
>> Florian
```
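For readers who want to see the two steps that answer the original question in isolation (giving every mention of a company its Group1 value, then numbering companies that never occur in Var1), here is a minimal, self-contained sketch. It mirrors the corresponding lines of Robert's example on hypothetical toy data that is already in long form, and adds an -assert- check. It does not need -group_id-; the data values are made up for illustration only.

```stata
* Hypothetical toy data in long form: one row per company mention.
* j == 1 marks a Var1 mention; Group is missing for later mentions,
* as after -replace Group = . if j > 1- in Robert's example.
clear
input rel_id j Group str10 Var
1 1 1 companyABC
1 2 . companyABD
2 1 2 companyABD
3 1 3 companyABF
3 2 . companyCCC
end

* Give every mention of a company its Group1 value
* (numeric missing sorts last, so Group[1] is nonmissing if one exists)
sort Var Group
by Var: replace Group = Group[1]

* Companies never seen in Var1 get new numbers above the current maximum
bysort Var: gen byte first = _n == 1
summarize Group, meanonly
replace Group = r(max) + sum(first) if missing(Group)
drop first

* -assert- stops with an error if any company still has two Group values
bysort Var (Group): assert Group[1] == Group[_N]
assert !missing(Group)
list, sepby(Var) noobs
```

Here companyABD, which appears in Var2 of relationship 1, picks up its Group1 value from relationship 2, and companyCCC, which never occurs in Var1, receives a number larger than any existing Group1 value.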