



Re: st: Social Network Analysis shortest path centrality


From   Michael Goodwin <[email protected]>
To   [email protected]
Subject   Re: st: Social Network Analysis shortest path centrality
Date   Wed, 4 Sep 2013 12:57:45 -0400

(prematurely sent)

I'm trying an approach that loops over observations within a given co
and blanks any values that match a previous connection value. For some
reason, the block of code beginning at forv totalCompanies =
1/$maxSource { doesn't seem to work, despite the fact that when I
manually step through the key line (replace level`i' = . if
level`i'==level`count'[`totalConnections'] & level`i'!=.), it does
what I want, replacing the furthest level with . if the connection has
already been made at a previous level. If anyone can point out why
this isn't working, or suggest a different approach, that would be a
big help.
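In graph terms, the deduplication attempted above is a breadth-first search: each company should be recorded only at the first (shortest) level at which it is reached, and never again. Here is a language-neutral sketch of that logic in Python, using the toy edge list from the example below (not Stata, purely illustrative):

```python
from collections import defaultdict, deque

# Toy directed edge list from the example (source -> target).
edges = [(1, 2), (1, 5), (1, 6), (1, 9), (2, 3), (2, 5),
         (2, 7), (3, 8), (3, 6), (4, 9), (4, 1)]

adj = defaultdict(list)
for s, t in edges:
    adj[s].append(t)

def levels_from(co):
    """Record each node at the first (shortest) level it is
    reached; nodes already seen are skipped, so no deeper
    duplicate ever appears."""
    seen = {co}
    level = {}
    frontier = deque([(co, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                level[nxt] = d + 1
                frontier.append((nxt, d + 1))
    return level

# Company 9 is a direct (level-1) connection of company 1, so no
# deeper path to 9 is ever recorded.
print(levels_from(1))
```

The `seen` set plays the role of the growing -inlist()- macro in the Stata code: once a company appears at any level, every later path to it is discarded.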

On Wed, Sep 4, 2013 at 12:54 PM, Michael Goodwin
<[email protected]> wrote:
> I'm trying an approach to loop over observations within a given co and
> to make blank any values that match any previous connection values.
> For some reason, this block of code beginning at forv totalCompanies =
> 1/$maxSource { doesn't seem to work, despite the fact that when I
>
>
> clear
> input source target
> 1 2
> 1 5
> 1 6
> 1 9
> 2 3
> 2 5
> 2 7
> 3 8
> 3 6
> 4 9
> 4 1
> end
>
>
> * Create macros with total number of companies
> preserve
> duplicates drop source, force
> global maxSource = _N
> restore
> tempfile main
> save "`main'"
>
> * Check for duplicates
> sort source target
> isid source target
>
> * Create dataset containing all connections
> rename source co
> gen level0 = co
> local i 0
> local more 1
> local isdone co
> while `more' {
>     local ++i
>     rename target source
>     joinby source using `main', unmatched(master)
>     drop _merge
>     rename source level`i'
>     replace target = . if inlist(target,`isdone')
>     local isdone `isdone', level`i'
>     forv totalCompanies = 1/$maxSource {
>         preserve
>         keep if co==`totalCompanies'
>         local maxConnections = _N
>         forv num = 1/`i' {
>             local count = `i'-`num'
>             forv totalConnections = 1/`maxConnections' {
>                 set trace on
>                 replace level`i' = . if level`i'==level`count'[`totalConnections'] & level`i'!=.
>                 set trace off
>             }
>         }
>         restore
>     }
>     sort co level`i'
>     by co level`i': gen one = _n == 1 & !mi(level`i')
>     by co: egen connect`i' = total(one)
>     drop one
>     count if !mi(target)
>     local more = r(N)
>     qui compress
> }
>
> drop target
> sort co level*
> order co level* connect*
>
> On Wed, Sep 4, 2013 at 11:10 AM, Michael Goodwin
> <[email protected]> wrote:
>> This is very useful and more efficient than what I had put together.
>> The final piece of the puzzle is identifying which observations
>> contain the shortest path between the "co" and a given node. In the
>> example above, in the third observation, co 1 is connected to 9 in
>> level4. However, in the tenth observation, co 1 is connected to 9 in
>> level1. Is there an efficient way to replace level with a missing
>> value if that combination of co and level already exists and is a
>> shorter path? I'm working on this final piece now, but would love to
>> hear any thoughts. Thanks in advance for your help.
>>
>> On Wed, Sep 4, 2013 at 8:54 AM, Robert Picard <[email protected]> wrote:
>>> You can stop a list of connections from looping back onto
>>> itself by replacing target with a missing value if it
>>> identifies a company that is already part of the list for
>>> that observation. This can easily be done using -inlist()-.
>>> Since -inlist()- does not handle a regular varlist, you can
>>> use a local macro to build up the comma-separated variable
>>> list as you go.
>>>
>>> Note that it really helps to create a toy dataset to sort
>>> out these problems. Your proposed code does not stop looping
>>> when used with the test data I'm using.
>>>
>>> * --------------------- begin example ---------------------
>>> clear
>>> input source target
>>> 1 2
>>> 1 5
>>> 1 6
>>> 1 9
>>> 2 3
>>> 2 5
>>> 2 7
>>> 5 8
>>> 5 6
>>> 8 9
>>> 8 1
>>> end
>>> tempfile main
>>> save "`main'"
>>> * make sure the data has no duplicates
>>> isid source target
>>>
>>> rename source co
>>> local i 0
>>> local more 1
>>> local isdone co
>>> while `more' {
>>>     local ++i
>>>     rename target source
>>>     joinby source using "`main'", unmatched(master)
>>>     drop _merge
>>>     rename source level`i'
>>>     replace target = . if inlist(target,`isdone')
>>>     local isdone `isdone', level`i'
>>>     sort co level`i'
>>>     by co level`i': gen one = _n == 1 & !mi(level`i')
>>>     by co: egen connect`i' = total(one)
>>>     drop one
>>>     count if !mi(target)
>>>     local more = r(N)
>>> }
>>> drop target
>>> sort co level*
>>> order co level* connect*
>>> list, sepby(co) noobs
>>> * --------------------- end example -----------------------
>>>
>>> On Tue, Sep 3, 2013 at 6:15 PM, Michael Goodwin
>>> <[email protected]> wrote:
>>>> I think I've figured out the issue. Using this approach, when dropping
>>>> values that are equal, you need to ensure that you aren't dropping
>>>> null values. Below is the code to drop only matching non-null values.
>>>> Following the loop are the final few lines to calculate centrality
>>>> using the approach I mentioned in my initial post. Thanks for your
>>>> help!
>>>>
>>>> * Create dataset containing all connections
>>>> rename Source company
>>>> local i 0
>>>> local more 1
>>>> while `more' {
>>>>     local ++i
>>>>     rename Target Source
>>>>     joinby Source using `main', unmatched(master)
>>>>     drop _merge
>>>>     rename Source level`i'
>>>>     sort company level`i'
>>>>     forv num = 1/`i' {
>>>>         local count = `i'-`num'
>>>>         cap drop if level`i'==level`count' & level`i'!=""
>>>>         cap drop if company==level`count'
>>>>     }
>>>>     by company level`i': gen one = _n == 1 & !mi(level`i')
>>>>     by company: egen connect`i' = total(one)
>>>>     drop one
>>>>     count if !mi(Target)
>>>>     local more = r(N)
>>>>     qui compress
>>>> }
>>>>
>>>> * Centrality
>>>> egen totalPath = rownonmiss(connect*), strok
>>>> local maxPath = totalPath
>>>> forv num = 1/`maxPath' {
>>>>     gen connectCentrality`num' = connect`num'/`num'
>>>> }
>>>> egen centrality = rowtotal(connectCentrality*)
>>>> On Tue, Sep 3, 2013 at 5:10 PM, Michael Goodwin
>>>> <[email protected]> wrote:
>>>>> Robert, this is very helpful and actually not so far from what I had
>>>>> originally coded out.
>>>>>
>>>>> The only issue I am encountering now is that some of the networks
>>>>> actually double back on themselves, so that the loop you've written
>>>>> out continues on infinitely (or would if Stata didn't have observation
>>>>> limits).
>>>>>
>>>>> I think the solution is to write code that drops any observations in
>>>>> which the level`i' variable is equal to any of the preceding
>>>>> level[`i'-1], level[`i'-2], etc. variables. Do you have any thoughts
>>>>> on how to best accomplish that? My thinking was to create a loop that
>>>>> compares the current level with each previous level and drops the
>>>>> observation if any two values match. I have to use capture because
>>>>> there is no level0. This doesn't seem to be working using my dataset.
>>>>>
>>>>>
>>>>> * --------------------- begin example ---------------------
>>>>> clear
>>>>> input Source Target
>>>>> 1 2
>>>>> 1 5
>>>>> 1 6
>>>>> 1 9
>>>>> 2 3
>>>>> 2 5
>>>>> 2 7
>>>>> 5 8
>>>>> 5 6
>>>>> 8 9
>>>>> end
>>>>> tempfile main
>>>>> save "`main'"
>>>>> * make sure the data has no duplicates
>>>>> isid Source Target
>>>>>
>>>>> rename Source co
>>>>> local i 0
>>>>> local more 1
>>>>> while `more' {
>>>>>     local ++i
>>>>>     rename Target Source
>>>>>     joinby Source using `main', unmatched(master)
>>>>>     drop _merge
>>>>>     rename Source level`i'
>>>>>     sort co level`i'
>>>>>     forv num = 1/`i' {
>>>>>         local count = `i'-`num'
>>>>>         cap drop if level`i'==level`count'
>>>>>         cap drop if co==level`count'
>>>>>     }
>>>>>     by co level`i': gen one = _n == 1 & !mi(level`i')
>>>>>     by co: egen connect`i' = total(one)
>>>>>     drop one
>>>>>     count if !mi(Target)
>>>>>     local more = r(N)
>>>>>     qui compress
>>>>> }
>>>>> drop Target
>>>>> sort co level*
>>>>> order co level* connect*
>>>>> list, sepby(co) noobs
>>>>> * --------------------- end example -----------------------
>>>>>
>>>>>
>>>>> On Sun, Sep 1, 2013 at 11:44 AM, Robert Picard <[email protected]> wrote:
>>>>>> There is indeed a problem with merging the list with itself:
>>>>>> it leads to many-to-many merges, and I have yet to see a case
>>>>>> where an m:m merge is useful. You can, however, use -joinby-
>>>>>> to perform what you would intuitively want -merge- to do in
>>>>>> this case.
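For readers more at home with dataframe tools: what -joinby- does here is an ordinary relational join, forming all pairwise combinations of observations that share a key value. Below is a rough pandas sketch of one pass of the loop in the example that follows (illustrative only; the column names mirror the Stata example, and how="left" plays the role of unmatched(master)):

```python
import pandas as pd

# The toy edge list from the example.
edges = pd.DataFrame({
    "source": [1, 1, 1, 1, 2, 2, 2, 5, 5, 8],
    "target": [2, 5, 6, 9, 3, 5, 7, 8, 6, 9],
})

# One pass: the old target becomes the join key, and every match
# appends a further level of connection.
step1 = edges.rename(columns={"source": "co", "target": "level1"})
step2 = (step1.merge(edges, how="left", left_on="level1", right_on="source")
              .drop(columns="source")
              .rename(columns={"target": "level2"}))
print(step2)
```

Unlike Stata's -merge m:m-, a pandas merge is many-to-many within each key, which is exactly the behavior -joinby- provides.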
>>>>>>
>>>>>> * --------------------- begin example ---------------------
>>>>>> clear
>>>>>> input source target
>>>>>> 1 2
>>>>>> 1 5
>>>>>> 1 6
>>>>>> 1 9
>>>>>> 2 3
>>>>>> 2 5
>>>>>> 2 7
>>>>>> 5 8
>>>>>> 5 6
>>>>>> 8 9
>>>>>> end
>>>>>> tempfile main
>>>>>> save "`main'"
>>>>>> * make sure the data has no duplicates
>>>>>> isid source target
>>>>>>
>>>>>> rename source co
>>>>>> local i 0
>>>>>> local more 1
>>>>>> while `more' {
>>>>>>     local ++i
>>>>>>     rename target source
>>>>>>     joinby source using "`main'", unmatched(master)
>>>>>>     drop _merge
>>>>>>     rename source level`i'
>>>>>>     sort co level`i'
>>>>>>     by co level`i': gen one = _n == 1 & !mi(level`i')
>>>>>>     by co: egen connect`i' = total(one)
>>>>>>     drop one
>>>>>>     count if !mi(target)
>>>>>>     local more = r(N)
>>>>>> }
>>>>>> drop target
>>>>>> sort co level*
>>>>>> order co level* connect*
>>>>>> list, sepby(co) noobs
>>>>>> * --------------------- end example -----------------------
>>>>>>
>>>>>> Original message follows:
>>>>>>
>>>>>> st: Social Network Analysis shortest path centrality
>>>>>>
>>>>>> From  Michael Goodwin <[email protected]>
>>>>>> To  [email protected]
>>>>>> Subject  st: Social Network Analysis shortest path centrality
>>>>>> Date  Fri, 30 Aug 2013 13:53:16 -0400
>>>>>>  Hi,
>>>>>>
>>>>>> I am trying to do some light social network analysis on a dataset
>>>>>> containing a list of edges. I have the dataset organized such that
>>>>>> there are two variables, Source and Target. Both the Source and Target
>>>>>> are companies, and the connection between them indicates that an employee
>>>>>> from Source went on to found Target. The relationship between these
>>>>>> two variables is indeterminate (i.e. m:m) and although the variables
>>>>>> start as strings, I've converted them to numeric values using encode
>>>>>> (and ensured
>>>>>> that the numeric values in both Target and Source are equal to one another).
>>>>>>
>>>>>> I am attempting to determine the number of first, second, third,...,n
>>>>>> degree connections that each Source has. For example, if an employee
>>>>>> from Company A went on to found Company B and then employees from
>>>>>> Company B went on to found Companies C and D, Company A would have 1
>>>>>> first degree connection and 2 second degree connections.
>>>>>>
>>>>>> My goal is to create something similar to a shortest path measurement
>>>>>> whereby a first degree connection is equal to 1, a second degree
>>>>>> connection 1/2, a third degree connection 1/3, and so forth. In the
>>>>>> above example, Company A's score would be (1/1)+(2/2) or 2. I believe
>>>>>> this is a closeness/shortest path centrality approach, but I may be
>>>>>> mistaken (and would love to be corrected!).
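The score described here can be computed directly from each connection's degree of separation. A minimal Python sketch of the arithmetic for the Company A example (the names are illustrative, not Stata code):

```python
# Shortest-path level of each company reachable from Company A:
# one 1st-degree connection (B) and two 2nd-degree ones (C, D).
levels = {"B": 1, "C": 2, "D": 2}

def centrality(levels):
    # Each connection contributes 1 / (its degree of separation).
    return sum(1.0 / d for d in levels.values())

print(centrality(levels))  # (1/1) + (1/2) + (1/2) = 2.0
```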
>>>>>>
>>>>>> After making the connections symmetric (i.e. all pairs are present as
>>>>>> both inbound and outbound connections), I've attempted three
>>>>>> approaches, all without success:
>>>>>>
>>>>>> 1. Use netsis and netsummarize. Neither the adjacency nor closeness
>>>>>> calculations seems to get me to the right answer. I don't have
>>>>>> experience using mata, but it appears that the matrix generated by
>>>>>> netsis doesn't reflect the appropriate connections (i.e. a connection
>>>>>> in the original edge list is not represented by a 1 in the matrix).
>>>>>>
>>>>>> netsis Source Target, measure(adjacency) name(A, replace)
>>>>>> netsummarize A/(rows(A)-1), generate(degree) statistic(rowsum)
>>>>>>
>>>>>> netsis Source Target, measure(distance) name(D, replace)
>>>>>> netsummarize (rows(D)-1):/rowsum(D), generate(closeness) statistic(rowsum)
>>>>>>
>>>>>> 2. Create a matrix data structure in Stata and use centpow. I keep
>>>>>> receiving an error noting that the matrix is not symmetrical. I've
>>>>>> checked and made sure that the dataset is a perfect square (it has 707
>>>>>> observations and 707 variables) and that a connection between Company
>>>>>> A and Company B is also represented by a connection between Company B
>>>>>> and Company A. Does centpow require the data to actually be in a mata
>>>>>> matrix?
>>>>>>
>>>>>> use ".\dta\\${connection}_connectionIDSymmetric${typeInt}.dta";
>>>>>> contract targetID sourceID;
>>>>>> reshape wide _freq , i(targetID) j(sourceID);
>>>>>> qui foreach v of var _freq* {;
>>>>>> replace `v' = 0 if mi(`v');
>>>>>> };
>>>>>> drop targetID;
>>>>>> save ".\dta\\${connection}_adjacencyMatrix${typeInt}.dta", replace;
>>>>>> centpow ".\dta\\${connection}_adjacencyMatrix${typeInt}.dta";
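Before handing the reshaped data to -centpow-, it may help to verify symmetry on the matrix itself, since any asymmetric cell would trigger exactly this error. A small numpy sketch of building an adjacency matrix from an edge list and checking A == A' (toy data; nothing here is part of -centpow-):

```python
import numpy as np

edges = [(1, 2), (1, 5), (2, 3)]  # toy edge list
nodes = sorted({n for e in edges for n in e})
idx = {n: i for i, n in enumerate(nodes)}

A = np.zeros((len(nodes), len(nodes)), dtype=int)
for s, t in edges:
    A[idx[s], idx[t]] = 1
    A[idx[t], idx[s]] = 1  # mirror each edge to force symmetry

print(np.array_equal(A, A.T))  # prints True only if A is symmetric
```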
>>>>>>
>>>>>> 3. Start with the edgelist, and merge it with itself, changing the
>>>>>> Target and Source variable names such that Target becomes Source for
>>>>>> the second degree connection and so forth (I think this is
>>>>>> demonstrably not the solution, so I won't elaborate further).
>>>>>>
>>>>>> I think this either has a simple solution that I can't think of
>>>>>> involving the edge list, or will involve a more intensive solution
>>>>>> using mata. If anyone has experience or could point me in the
>>>>>> direction of content (Statalist has limited SNA resources), that would
>>>>>> be a huge help.
>>>>>>
>>>>>> Here are some of the resources I've already reviewed:
>>>>>> http://www.rensecorten.org/index.php/research/social-network-analysis-with-stata/
>>>>>> https://sites.google.com/site/statagraphlibrary/netgen111
>>>>>> http://www.ats.ucla.edu/stat/sna/sna_stata.htm
>>>>>>
>>>>>> Thanks in advance.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Mike
>>>>>> *
>>>>>> *   For searches and help try:
>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Mike Goodwin
>>>>> Senior Associate, Endeavor Insight
>>>>> 900 Broadway | Suite 301 | New York, NY 10003
>>>>> +1 646 368 6354 | Skype: michael.p.goodwin
>>>>
>>>>
>>>>
>>
>>
>>
>
>
>




