Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
Date   Wed, 20 Feb 2013 11:03:43 +0000

I added some commentary below.

On Wed, Feb 20, 2013 at 8:20 AM, Erik Aadland <erikaadland@hotmail.com> wrote:
> Thank you so much, Nick.
> The code appears to work perfectly.
> I will compare this code to the previous code for the related measure and do my best to absorb what is going on.
> Kind regards,
> Erik.
>
> ----------------------------------------
>> Date: Tue, 19 Feb 2013 19:34:59 +0000
>> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
>> From: njcoxstata@gmail.com
>> To: statalist@hsphsun2.harvard.edu
>>
>> I don't know about smart, but this seems the same kind of problem.
>> Change the order of the loops and let the list of colleagues
>> accumulate from year to year for each actor.
>>
>> Have a look at this code

Comment #1.

The first step is just to create a small toy or sandpit dataset for
which results are easy to derive. Erik provided this dataset himself
and it's always a good idea. Naturally, the full dataset might expose
problems not in the toy dataset, but one problem at a time....

>> clear
>> input year project_id actor_id condition
>> 2000 1 1 1
>> 2000 2 1 1
>> 2000 1 2 0
>> 2000 2 2 0
>> 2000 1 3 0
>> 2000 2 3 0
>> 2000 3 4 1
>> 2000 3 5 0
>> 2000 3 6 0
>> 2000 4 7 0
>> 2001 5 1 1
>> 2001 5 2 0
>> 2001 6 2 0
>> 2001 5 3 0
>> 2001 6 3 0
>> 2001 5 4 1
>> 2001 6 4 1
>> 2001 7 5 0
>> 2001 7 6 0
>> 2001 7 7 0
>> 2001 8 8 0
>>
>> end

Comment #2.

I create variables using -egen-'s -group()- function, which by
construction run from 1 to the number of distinct values. Then I can pick up the
number of distinct values from -summarize, meanonly-. The maximum
group identifier is the number required. This is no more than
convenience, to make the loops to come very easy, but convenience
beats its opposite.

>> egen proj = group(project_id year), label
>> su proj, meanonly
>> local nproj = r(max)
>>
>> egen act = group(actor_id), label
>> su act, meanonly
>> local nact = r(max)
>>
>> egen yr = group(year), label
>> su yr, meanonly
>> local nyr = r(max)
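As a quick check of the claim that -egen, group()- numbers the distinct values 1, 2, ... with no gaps, here is a minimal sketch on a made-up five-observation dataset (the values 10, 25, 40 and the variable -tagged- are illustrative only). -egen, tag()- flags one observation per distinct value, so counting the tags should give the same number as the maximum group identifier.

```stata
clear
input actor_id
10
10
25
40
25
end

* group identifiers run 1, 2, 3 in sort order of actor_id
egen act = group(actor_id), label
su act, meanonly
local nact = r(max)

* tag() marks exactly one observation for each distinct actor_id
egen tagged = tag(actor_id)
count if tagged
assert r(N) == `nact'

* the lowest value gets group 1, the highest gets the maximum
assert act == 1 if actor_id == 10
assert act == `nact' if actor_id == 40
```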

Comment #3.

I initialise a counter variable. In essence we assume no co-actors,
unless we find some, in which case we will change the counter. Often
this command is inserted once you realise that the strategy is going
to be

Loop:
    Look at each possibility and work out the result.
    Put the result for that possibility in an existing variable.

The second implies a -replace-, but that in turn requires a previous
-generate- ahead of the loops.

>> gen mywanted2 = 0
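The same generate-then-replace pattern in miniature, as a sketch on a made-up dataset: the hypothetical variable -nprojects- is created once with a default of 0 and then filled in actor by actor inside the loop. Copying r(N) into a local before -replace- is just caution, so that the value used is clearly the one left behind by -count-.

```stata
clear
input actor_id project_id
1 1
1 2
2 1
3 3
end

egen act = group(actor_id)
su act, meanonly
local nact = r(max)

* -generate- once, ahead of the loop, with the default result
gen nprojects = 0

* then -replace- the result for each possibility in turn
forval a = 1/`nact' {
    quietly count if act == `a'
    local n = r(N)
    quietly replace nprojects = `n' if act == `a'
}
list actor_id project_id nprojects
```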

Comment #4.

Now the slope gets steeper! The main difficulty of the problem is the
need to look in a group of other observations for co-actors. I went
for list manipulation. -levelsof- gives you overall lists and the rest
is looping over possibilities. Stuff discussed at -help macrolists- is
invaluable.
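The three list operations the loop relies on can be tried out directly on made-up macros, with no data in memory. In this sketch -who1- and -who2- stand in for two project lists and -a- for the focal actor; the values are invented for illustration.

```stata
* two hypothetical project lists that overlap
local who1 2 3
local who2 2 3 5
* the focal actor
local a 2

* concatenate, then drop repeats (first occurrences are kept, in order)
local work `who1' `who2'
local work : list uniq work

* remove the focal actor from the list
local work : list work - a

* how many distinct others remain
local n : list sizeof work
display "`work' (`n')"
```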

There are yet other possibilities, e.g. it is an open question whether
you would be better off with a different data structure. If the number
of actors on a project is small and their identifiers are of simple
form, then all the identifiers could be stored as values of a string
variable such as "1 3 5 8" and you could then treat the identifiers
using -word()- and -wordcount()-. A wild guess is that this makes some
things easier and some more difficult.
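For what it is worth, here is a minimal sketch of that alternative structure, using only the 2000 observations of the toy data. It assumes, as holds in the toy data, that -project_id- alone identifies a project; otherwise group on -proj- instead. The variables -members-, -nmembers- and -first_member- are hypothetical names. This lists every member of a project; restricting the list to condition == 0 would need an extra step.

```stata
clear
input year project_id actor_id condition
2000 1 1 1
2000 2 1 1
2000 1 2 0
2000 2 2 0
2000 1 3 0
2000 2 3 0
2000 3 4 1
2000 3 5 0
2000 3 6 0
2000 4 7 0
end

* build a running space-separated list of actor identifiers within project
bysort project_id (actor_id): gen members = string(actor_id) if _n == 1
by project_id: replace members = members[_n-1] + " " + string(actor_id) if _n > 1
* the last observation holds the complete list: copy it to every member
by project_id: replace members = members[_N]

* identifiers can now be counted and extracted with string functions
gen nmembers = wordcount(members)
gen first_member = real(word(members, 1))
list project_id actor_id members nmembers, sepby(project_id)
```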

>> * lists of those in each project and year and condition == 0
>> qui forval p = 1/`nproj' {
>>     levelsof act if proj == `p' & condition == 0, local(who`p')
>> }
>>
>> macro list
>>
>> * now cycle over actors
>> qui forval a = 1/`nact' {
>>
>>     * blank out workspace
>>     local work
>>
>>     * cycle over years
>>     qui forval y = 1/`nyr' {
>>
>>         * if actor was included, we want to add that list to workspace
>>         forval p = 1/`nproj' {
>>             count if act == `a' & proj == `p' & yr == `y'
>>             if r(N) local work `work' `who`p''
>>         }
>>
>>         * remove duplicates
>>         local work : list uniq work
>>         * remove this actor
>>         local work : list work - a
>>         * see what we got for debugging
>>         noi di "`a' `work'"
>>
>>         replace mywanted2 = `: list sizeof work' if act == `a' & yr == `y'
>>     }
>> }
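A different route to the same cumulative count, sketched here as an independent cross-check rather than as a replacement for the loops, works with pairs instead of lists: -joinby- pairs each actor with every condition == 0 member of the same project, -collapse- finds the year each pair first met, and each actor-year then counts the partners already met by that year. Variable and file names such as -partner-, -firstyear-, -cumulative- and the tempfiles are hypothetical. Working through the toy data by hand, the loops above give, and this sketch should reproduce, for example 2 and then 4 for actor 4 in 2000 and 2001, and 0 and then 2 for actor 7.

```stata
clear
input year project_id actor_id condition
2000 1 1 1
2000 2 1 1
2000 1 2 0
2000 2 2 0
2000 1 3 0
2000 2 3 0
2000 3 4 1
2000 3 5 0
2000 3 6 0
2000 4 7 0
2001 5 1 1
2001 5 2 0
2001 6 2 0
2001 5 3 0
2001 6 3 0
2001 5 4 1
2001 6 4 1
2001 7 5 0
2001 7 6 0
2001 7 7 0
2001 8 8 0
end
egen proj = group(project_id year)

tempfile main partners pairs
save `main'

* every condition == 0 member of a project, renamed as a potential partner
keep if condition == 0
keep proj actor_id
rename actor_id partner
save `partners'

* pair each actor with every such member of the same project, not itself
use `main', clear
keep proj year actor_id
joinby proj using `partners'
drop if partner == actor_id

* year in which each actor first met each partner
collapse (min) firstyear = year, by(actor_id partner)
save `pairs'

* one observation per actor and year; count partners first met by then
use `main', clear
keep actor_id year
duplicates drop
joinby actor_id using `pairs', unmatched(master)
* unmatched actors have missing firstyear, which is never <= year
gen byte counted = firstyear <= year
collapse (sum) cumulative = counted, by(actor_id year)
list, sepby(actor_id)
```

The result has one observation per actor and year; -merge m:1 actor_id year- from the original data would attach it to every project observation. On real data with large projects the pairs file can get big, which is the price of avoiding the loops.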

On Tue, Feb 19, 2013 at 4:35 PM, Erik Aadland <erikaadland@hotmail.com> wrote:

>> > A while back I got help from the list with making a separate count, for each actor_id and year, of the number of distinct other actors meeting a certain condition with whom the actor_id had appeared together in projects.
>> > Nick Cox suggested the code below that worked wonderfully.
>> > This code generates a separate count for each actor_id and year.
>> > I now face a new challenge. I would like to generate a similar measure that makes a cumulative count over the years (rather than a count for each year). So, if actor_id == 1 collaborated with 2 other distinct actors in 2000, the score for actor_id == 1 would be 2 in 2000. If actor_id == 1 collaborated with one additional distinct actor that met the condition in 2001, the score would increase to 3 in 2001 (distinct actors already counted in the 2000 score who also worked on projects with actor_id == 1 in 2001 would not be counted again).
>> > Is there a smart way to change the code below to generate this new measure?
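The difference between the two measures comes down to where the workspace is blanked out. A minimal sketch with invented lists for a single actor (-met2000- and -met2001- are hypothetical names), mirroring Erik's example: two distinct co-actors in 2000, then one repeat and one new co-actor in 2001.

```stata
* hypothetical lists of co-actors meeting the condition, for one actor
local met2000 2 3
local met2001 3 5

* per-year measure: workspace reset inside the year loop
foreach yr in 2000 2001 {
    local work `met`yr''
    local work : list uniq work
    local peryear`yr' : list sizeof work
}

* cumulative measure: workspace reset once, before the year loop
local work
foreach yr in 2000 2001 {
    local work `work' `met`yr''
    local work : list uniq work
    local cum`yr' : list sizeof work
    display "`yr': per year `peryear`yr'', cumulative `cum`yr''"
}
```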
