Re: st: Social Network Analysis shortest path centrality


From   Michael Goodwin <[email protected]>
To   [email protected]
Subject   Re: st: Social Network Analysis shortest path centrality
Date   Tue, 3 Sep 2013 18:15:48 -0400

I think I've figured out the issue. With this approach, when dropping
observations whose current level matches an earlier level, you need to
make sure you aren't also dropping observations where both values are
null (missing). Below is code that drops only matching non-null values.
Following the loop are the final few lines that calculate centrality
using the approach I mentioned in my initial post. Thanks for your
help!

* Create dataset containing all connections
* (assumes the edge list and tempfile `main' set up as in the earlier example below)
rename Source company
local i 0
local more 1
while `more' {
    local ++i
    rename Target Source
    joinby Source using "`main'", unmatched(master)
    drop _merge
    rename Source level`i'
    sort company level`i'
    * drop paths that double back: the current node equals an earlier level,
    * or the origin itself -- but only when the current level is non-missing
    forv num = 1/`i' {
        local count = `i'-`num'
        cap drop if level`i'==level`count' & !mi(level`i')
        cap drop if company==level`count'
    }
    * count distinct non-missing nodes reached at this level, per company
    by company level`i': gen one = _n == 1 & !mi(level`i')
    by company: egen connect`i' = total(one)
    drop one
    * stop once no path can be extended any further
    count if !mi(Target)
    local more = r(N)
    qui compress
}

* Centrality
* totalPath is the number of connect* variables (the same for every row)
egen totalPath = rownonmiss(connect*), strok
local maxPath = totalPath
forv num = 1/`maxPath' {
    gen connectCentrality`num' = connect`num'/`num'
}
egen centrality = rowtotal(connectCentrality*)
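
For inspection, the finishing lines from the earlier example (quoted below)
should carry over unchanged, with company in place of co; this is just a
sketch for viewing the result, not part of the calculation:

drop Target
sort company level*
order company level* connect* totalPath centrality
list, sepby(company) noobs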

On Tue, Sep 3, 2013 at 5:10 PM, Michael Goodwin
<[email protected]> wrote:
> Robert, this is very helpful and actually not far from what I had
> originally coded up.
>
> The only issue I am encountering now is that some of the networks
> double back on themselves, so the loop you've written continues
> indefinitely (or would, if Stata didn't have observation limits).
>
> I think the solution is to write code that drops any observation in
> which the level`i' variable is equal to any of the preceding
> level[`i'-1], level[`i'-2], etc. variables. Do you have any thoughts
> on how best to accomplish that? My thinking was to create a loop that
> compares the current level with each previous level and drops the
> observation if any two values match. I have to use -capture- because
> there is no level0. This doesn't seem to be working with my dataset.
>
>
> * --------------------- begin example ---------------------
> clear
> input Source Target
> 1 2
> 1 5
> 1 6
> 1 9
> 2 3
> 2 5
> 2 7
> 5 8
> 5 6
> 8 9
> end
> tempfile main
> save "`main'"
> * make sure the data has no duplicates
> isid Source Target
>
> rename Source co
> local i 0
> local more 1
> while `more' {
>     local ++i
>     rename Target Source
>     joinby Source using "`main'", unmatched(master)
>     drop _merge
>     rename Source level`i'
>     sort co level`i'
>     forv num = 1/`i' {
>         local count = `i'-`num'
>         cap drop if level`i'==level`count'
>         cap drop if co==level`count'
>     }
>     by co level`i': gen one = _n == 1 & !mi(level`i')
>     by co: egen connect`i' = total(one)
>     drop one
>     count if !mi(Target)
>     local more = r(N)
>     qui compress
> }
> drop Target
> sort co level*
> order co level* connect*
> list, sepby(co) noobs
> * --------------------- end example -----------------------
>
> On Sun, Sep 1, 2013 at 11:44 AM, Robert Picard <[email protected]> wrote:
>> There is indeed a problem with merging the list with itself,
>> as it leads to many-to-many merges, and I have yet to see
>> a case where an m:m merge is useful. You can, however, use
>> -joinby- to do what you would intuitively want -merge-
>> to do in this case.
>>
>> * --------------------- begin example ---------------------
>> clear
>> input source target
>> 1 2
>> 1 5
>> 1 6
>> 1 9
>> 2 3
>> 2 5
>> 2 7
>> 5 8
>> 5 6
>> 8 9
>> end
>> tempfile main
>> save "`main'"
>> * make sure the data has no duplicates
>> isid source target
>>
>> rename source co
>> local i 0
>> local more 1
>> while `more' {
>>     local ++i
>>     * the node reached last iteration becomes the new source to join on
>>     rename target source
>>     joinby source using "`main'", unmatched(master)
>>     drop _merge
>>     rename source level`i'
>>     sort co level`i'
>>     * count the distinct non-missing nodes reached at this level, per co
>>     by co level`i': gen one = _n == 1 & !mi(level`i')
>>     by co: egen connect`i' = total(one)
>>     drop one
>>     * stop once no path can be extended any further
>>     count if !mi(target)
>>     local more = r(N)
>> }
>> drop target
>> sort co level*
>> order co level* connect*
>> list, sepby(co) noobs
>> * --------------------- end example -----------------------
>>
>> Original message follows:
>>
>> st: Social Network Analysis shortest path centrality
>>
>> From  Michael Goodwin <[email protected]>
>> To  [email protected]
>> Subject  st: Social Network Analysis shortest path centrality
>> Date  Fri, 30 Aug 2013 13:53:16 -0400
>>
>> Hi,
>>
>> I am trying to do some light social network analysis on a dataset
>> containing a list of edges. I have the dataset organized so that
>> there are two variables, Source and Target. Both Source and Target
>> are companies, and a connection indicates that an employee from
>> Source went on to found Target. The relationship between these
>> two variables is indeterminate (i.e. m:m), and although the variables
>> start as strings, I've converted them to numeric values using -encode-
>> (and ensured that a given company receives the same numeric code in
>> both Target and Source).
>>
>> I am attempting to determine the number of first, second, third, ..., nth
>> degree connections that each Source has. For example, if an employee
>> from Company A went on to found Company B, and employees from
>> Company B then went on to found Companies C and D, Company A would have 1
>> first degree connection and 2 second degree connections.
>>
>> My goal is to create something similar to a shortest path measurement
>> whereby a first degree connection is equal to 1, a second degree
>> connection 1/2, a third degree connection 1/3, and so forth. In the
>> above example, Company A's score would be (1/1)+(2/2) or 2. I believe
>> this is a closeness/shortest path centrality approach, but I may be
>> mistaken (and would love to be corrected!).
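>>
>> Stated as a formula (just restating the above): a node's score is the sum
>> over degrees d of n_d/d, where n_d is its number of d-th degree
>> connections. For Company A in the example, that is 1/1 + 2/2 = 2.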
>>
>> After making the connections symmetric (i.e. all pairs are present as
>> both inbound and outbound connections), I've attempted three
>> approaches, all without success:
>>
>> 1. Use -netsis- and -netsummarize-. Neither the adjacency nor the closeness
>> calculation seems to get me to the right answer. I don't have
>> experience using Mata, but it appears that the matrix generated by
>> -netsis- doesn't reflect the appropriate connections (i.e. a connection
>> in the original edge list is not represented by a 1 in the matrix).
>>
>> netsis Source Target, measure(adjacency) name(A, replace)
>> netsummarize A/(rows(A)-1), generate(degree) statistic(rowsum)
>>
>> netsis Source Target, measure(distance) name(D, replace)
>> netsummarize (rows(D)-1):/rowsum(D), generate(closeness) statistic(rowsum)
>>
>> 2. Create a matrix data structure in Stata and use -centpow-. I keep
>> receiving an error noting that the matrix is not symmetrical. I've
>> checked and made sure that the dataset is a perfect square (it has 707
>> observations and 707 variables) and that a connection between Company
>> A and Company B is also represented by a connection between Company B
>> and Company A. Does -centpow- require the data to actually be in a Mata
>> matrix?
>>
>> use ".\dta\\${connection}_connectionIDSymmetric${typeInt}.dta"
>> contract targetID sourceID
>> reshape wide _freq, i(targetID) j(sourceID)
>> qui foreach v of var _freq* {
>>     replace `v' = 0 if mi(`v')
>> }
>> drop targetID
>> save ".\dta\\${connection}_adjacencyMatrix${typeInt}.dta", replace
>> centpow ".\dta\\${connection}_adjacencyMatrix${typeInt}.dta"
>>
>> 3. Start with the edge list and merge it with itself, changing the
>> Target and Source variable names so that Target becomes Source for
>> the second degree connection, and so forth (I think this is
>> demonstrably not the solution, so I won't elaborate further).
>>
>> I think this either has a simple solution involving the edge list that
>> I can't think of, or will require a more involved solution using
>> Mata. If anyone has experience with this, or could point me toward
>> relevant material (Statalist has limited SNA resources), that would
>> be a huge help.
>>
>> Here are some of the resources I've already reviewed:
>> http://www.rensecorten.org/index.php/research/social-network-analysis-with-stata/
>> https://sites.google.com/site/statagraphlibrary/netgen111
>> http://www.ats.ucla.edu/stat/sna/sna_stata.htm
>>
>> Thanks in advance.
>>
>> Best,
>>
>> Mike



-- 
Mike Goodwin
Senior Associate, Endeavor Insight
900 Broadway | Suite 301 | New York, NY 10003
+1 646 368 6354 | Skype: michael.p.goodwin
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

