st: Assistance with manipulating a social network dataset?

Brandon Olszewski
Tue, 11 Oct 2011 15:05:53 -0700

Hi Statalist:
I have a social network dataset, and I can’t figure out how to perform
the proper manipulations. People in rows were asked if they know
people listed in columns. In cells, “0” indicates two people don’t
know each other, and “1” indicates otherwise. So what I have looks
like this:
         Adam  Beth  Charlie
Adam     1     1     0
Beth     0     1     0
Charlie  0     1     1

Note that while Adam claims to know Beth, Beth doesn’t claim the same,
and while Beth says she doesn’t know Charlie, he says otherwise. For
my purposes, if either person says they know the other, I want to
treat the pair as a “1” both ways.
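
To make the rule concrete, here is a minimal sketch of the "either
direction counts" logic. It is written in Python purely as an
illustration, not as Stata syntax, and the names `know` and
`symmetrize` are hypothetical:

```python
# Rows are respondents, columns are the people they were asked about.
# Order of rows and columns: Adam, Beth, Charlie.
know = [
    [1, 1, 0],  # Adam's answers
    [0, 1, 0],  # Beth's answers
    [0, 1, 1],  # Charlie's answers
]

def symmetrize(matrix):
    """Return a new matrix whose cell (i, j) is the max of (i, j) and (j, i)."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)]
            for i in range(n)]

sym = symmetrize(know)
```

After this step Beth-Adam and Beth-Charlie both become 1, because the
other person in each pair reported the tie.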

The software I want to use (Sonoma) wants the data in one of two
formats. Here’s the wide option, which keeps only the upper half of
the matrix: “1” on the diagonal, “.” in the bottom half, and in each
remaining cell the maximum of the two values reported for that pair:
         Adam  Beth  Charlie
Adam     1     1     0
Beth     .     1     1
Charlie  .     .     1
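
The wide layout above can be sketched as follows. This is again a
Python illustration under stated assumptions rather than Stata code:
the names `answers` and `upper_triangle` are hypothetical, and the
diagonal is simply forced to 1 as the format requires.

```python
# Raw answers; rows and columns are ordered Adam, Beth, Charlie.
names = ["Adam", "Beth", "Charlie"]
answers = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]

def upper_triangle(matrix):
    """Build the wide layout: '.' below the diagonal, 1 on it, pair max above."""
    n = len(matrix)
    rows = []
    for i in range(n):
        row = []
        for j in range(n):
            if j < i:
                row.append(".")   # bottom half is left missing
            elif j == i:
                row.append(1)     # diagonal is coded 1
            else:
                row.append(max(matrix[i][j], matrix[j][i]))
        rows.append(row)
    return rows

wide = upper_triangle(answers)
for name, row in zip(names, wide):
    print(name.ljust(8), *row)
```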

Question 1: How would I do this in Stata? I looked at -help mata-, but
I don’t even know if that’s the right direction. Is it? If not, how
might I do it? This option seems more difficult for me (given my
familiarity with Stata’s functionality) than the “long option” below.

Here’s the long option, which seems more feasible for me, given my
level of skill. Note that each combination is listed just once, again
with maximum values:
Adam     Adam     1
Adam     Beth     1
Adam     Charlie  0
Beth     Beth     1
Beth     Charlie  1
Charlie  Charlie  1

Question 2: I can get the data to long format without a problem, but I
end up with duplicate combinations, since Adam is asked about Beth and
Beth is asked about Adam (i.e. a total of 9 observations, rather than
the six above). How could I drop duplicate combinations, keeping only
the max value for each? While I am pretty familiar with the
-duplicates- set of commands, I don’t know how to use them here, since
combinations go both ways: Adam-Beth is a duplicate of Beth-Adam. I’ve
also thought about substituting numbers for people (i.e. 1-2 & 2-1),
but that doesn’t change my problem: I can’t figure out how to tell
Stata to treat those as duplicates.
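
One way to think about the duplicate problem is to give each pair a
direction-free key, for example by always putting the alphabetically
first name first, and then keeping the maximum within each key. The
sketch below shows that logic in Python as an illustration only; it is
not Stata code, and `long_rows` and `collapse_pairs` are hypothetical
names. It assumes the values are coded 0/1.

```python
# The 9 long-format observations: (respondent, person asked about, answer).
long_rows = [
    ("Adam", "Adam", 1), ("Adam", "Beth", 1), ("Adam", "Charlie", 0),
    ("Beth", "Adam", 0), ("Beth", "Beth", 1), ("Beth", "Charlie", 0),
    ("Charlie", "Adam", 0), ("Charlie", "Beth", 1), ("Charlie", "Charlie", 1),
]

def collapse_pairs(rows):
    """Keep one row per unordered pair, holding the max reported value."""
    best = {}
    for ego, alter, value in rows:
        # Sorting the two names makes Adam-Beth and Beth-Adam the same key.
        key = tuple(sorted((ego, alter)))
        best[key] = max(best.get(key, 0), value)
    return [(a, b, v) for (a, b), v in sorted(best.items())]

pairs = collapse_pairs(long_rows)
for a, b, v in pairs:
    print(a.ljust(8), b.ljust(8), v)
```

The same idea should carry over to Stata: store the alphabetically
first name of each pair in one variable and the other name in a second
variable, then keep the maximum value within each pair.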

Thanks for any help.

