# Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
Date: Tue, 16 Oct 2012 12:44:11 +0100

What you asked for, as I understood it, was the total number of distinct actors

1. that meet a specified condition (condition == 0)

and

2. with which any actor has shared one or more projects in the same year.

(I am ignoring the word "focal", which you haven't defined and I don't
understand.)

If you want something else, please give the definition. It's not
enough (for me) to say that the code gives the wrong answer in some
cases. Note that my code gives the same answer for each actor, as it
is a total over all (project, year) possibilities. If you want a
different count for each (project, year), you'll need to modify the
code accordingly.
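
One possible modification, offered only as a sketch: it assumes that what is wanted is a count per actor and year, i.e. the number of distinct other actors with condition == 0 sharing at least one project with that actor in that year. That definition is a guess and has not been confirmed. The variable name wanted_ay and the extra identifier yr are hypothetical; everything else follows the earlier code. The data are the expanded example quoted below.

```stata
* Sketch only: assumes the target is a count per (actor, year).
clear
input year project_id actor_id condition
2000 1 1 1
2000 2 1 1
2000 1 2 0
2000 2 2 0
2000 1 3 0
2000 2 3 0
2000 3 4 1
2000 3 5 0
2000 3 6 0
2000 4 7 0
2001 5 1 1
2001 5 2 0
2001 6 2 0
2001 5 3 0
2001 6 3 0
2001 5 4 1
2001 6 4 1
2001 7 5 0
2001 7 6 0
2001 7 7 0
2001 8 8 0
end

* integer identifiers running from 1 up, as before
egen proj = group(project_id year)
su proj, meanonly
local nproj = r(max)
egen act = group(actor_id)
su act, meanonly
local nact = r(max)
egen yr = group(year)
su yr, meanonly
local nyr = r(max)

* actors with condition == 0 in each project-year
qui forval p = 1/`nproj' {
    levelsof act if proj == `p' & condition == 0, local(who`p')
}

* wanted_ay is a hypothetical name for the per-(actor, year) count
gen wanted_ay = .

qui forval y = 1/`nyr' {
    forval a = 1/`nact' {
        local work
        * pool lists only from this actor's projects in this year
        forval p = 1/`nproj' {
            count if act == `a' & proj == `p' & yr == `y'
            if r(N) local work `work' `who`p''
        }
        * remove duplicates, then remove this actor
        local work : list uniq work
        local work : list work - a
        replace wanted_ay = `: list sizeof work' if act == `a' & yr == `y'
    }
}

list year project_id actor_id condition wanted_ay, sepby(year)
```

Under that assumed definition, actor_id 7 gets 0 in 2000 (alone in project 4), and actors 4, 5 and 6 in project 3 get 2, 1 and 1. The only change from the earlier code is the extra loop over years, which restricts both the pooling of lists and the -replace- to a single year.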

Nick

On Tue, Oct 16, 2012 at 12:25 PM, Erik Aadland <erikaadland@hotmail.com> wrote:
> The code works. Thank you Nick.
> However, I am experiencing a few problems that I suspect stem from details of my data structure that depart from the structure I previously specified in this thread.
> In particular, I might have some projects in which only one actor is present.
> Showing by example is perhaps easiest. Here is the result from Nick's code (based on my previously supplied data structure) on a slightly expanded dataset:
> year    project_id    actor_id    condition    proj    act    mywanted
> 2000    1             1           1            1 2000  1      2
> 2000    2             1           1            2 2000  1      2
> 2000    1             2           0            1 2000  2      1
> 2000    2             2           0            2 2000  2      1
> 2000    1             3           0            1 2000  3      1
> 2000    2             3           0            2 2000  3      1
> 2000    3             4           1            3 2000  4      4
> 2000    3             5           0            3 2000  5      2
> 2000    3             6           0            3 2000  6      2
> 2000    4             7           0            4 2000  7      2
> 2001    5             1           1            5 2001  1      2
> 2001    5             2           0            5 2001  2      1
> 2001    6             2           0            6 2001  2      1
> 2001    5             3           0            5 2001  3      1
> 2001    6             3           0            6 2001  3      1
> 2001    5             4           1            5 2001  4      4
> 2001    6             4           1            6 2001  4      4
> 2001    7             5           0            7 2001  5      2
> 2001    7             6           0            7 2001  6      2
> 2001    7             7           0            7 2001  7      2
> 2001    8             8           0            8 2001  8      0
>
> In this result (focusing on year==2000 only now), the mywanted score for actor_id==7 in project_id==4 is incorrect (the correct value is mywanted==0). The mywanted scores for the actor_ids in project_id==3 are also incorrect.
>
> In year==2001, on the other hand, the mywanted score of 0 for actor_id==8 in project_id==8 is correct.
> How can I get around this? I am sorry that I did not include these structural details in my initial post.
> Sincerely,
> Erik.
>
> ----------------------------------------
>> Date: Tue, 16 Oct 2012 11:49:22 +0100
>> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
>> From: njcoxstata@gmail.com
>> To: statalist@hsphsun2.harvard.edu
>>
>> I suspect that you didn't copy all the code. The last line of code
>> has a brace (curly bracket }) by itself.
>>
>> On Tue, Oct 16, 2012 at 11:44 AM, Erik Aadland <erikaadland@hotmail.com> wrote:
>> > Thank you Nick!
>> > You are quite right. I was imprecise; it is distinct actors I want to capture.
>> > When I run your suggested code, I get this error message after the following line of code:
>> >
>> > qui forval a = 1/`nact' {
>> > unexpected end of file
>> > r(612);
>> >
>> > What could possibly cause this error message? I am using Stata 10.
>> > Thanks again and kind regards,
>> > Erik.
>> >
>> >
>> > ----------------------------------------
>> >> Date: Tue, 16 Oct 2012 11:19:41 +0100
>> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
>> >> From: njcoxstata@gmail.com
>> >> To: statalist@hsphsun2.harvard.edu
>> >>
>> >> First off, on my list of hobby-horses is a prejudice that the word
>> >> "unique" is misused here, although you are in very good company:
>> >> StataCorp itself does it in various places, e.g. -codebook-, although
>> >> I am working on changing their habits if I can. The word "unique"
>> >> strictly means occurring once only; I recommend the word "distinct"
>> >> for what you want. There is a longer discussion of terminology in
>> >>
>> >> SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
>> >> (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
>> >> Q4/08 SJ 8(4):557--568
>> >> from first principles; provides a convenience command
>> >>
>> >> That said, when faced with a problem like yours, vague ideas of
>> >> possible solutions rise up. Is this a case for associative arrays as
>> >> implemented in Mata? Is there a cunning restructuring of the data from
>> >> which the answer would fall out easily? Precise inspiration was
>> >> lacking and what seemed crucial was that you need to consider each
>> >> actor in each combination of project and year. That pointed to
>> >> loops over actors _and_ over project-years. Once that idea was taken
>> >> up, life is usually easier if all identifiers run over the integers
>> >> from 1 up. Also, the flavour of compiling a list and eventually
>> >> counting distinct members of other actors suggested -levelsof- and the
>> >> list manipulation tools documented at -help macrolists-.
>> >>
>> >> So, here is my code. Absolutely nothing rules out other kinds of solutions.
>> >>
>> >> input year project_id actor_id condition wanted
>> >> 2000 1 1 1 2
>> >> 2000 1 2 0 1
>> >> 2000 1 3 0 1
>> >> 2000 1 7 1 2
>> >> 2000 2 1 1 2
>> >> 2000 2 2 0 1
>> >> 2000 2 3 0 1
>> >> 2000 3 4 1 2
>> >> 2000 3 5 0 1
>> >> 2000 3 6 0 1
>> >> 2000 3 . . .
>> >> 2001 4 1 1 2
>> >> 2001 4 2 0 1
>> >> 2001 4 3 0 1
>> >> end
>> >>
>> >> * identifiers guaranteed to run 1 up if the real ones don't!
>> >> * note that "same project, same year" defines a group
>> >> egen proj = group(project_id year), label
>> >> su proj, meanonly
>> >> local nproj = r(max)
>> >>
>> >> egen act = group(actor_id), label
>> >> su act, meanonly
>> >> local nact = r(max)
>> >>
>> >> gen mywanted = .
>> >>
>> >> * lists of those in each project and year and condition == 0
>> >> qui forval p = 1/`nproj' {
>> >> levelsof act if proj == `p' & condition == 0, local(who`p')
>> >> }
>> >>
>> >> macro list
>> >>
>> >> * now cycle over actors
>> >> qui forval a = 1/`nact' {
>> >>
>> >> * blank out workspace
>> >> local work
>> >>
>> >> * if actor was included, we want to add that list to workspace
>> >> * in practice -if r(N)- will be true if and only if -r(N)- is positive
>> >> forval p = 1/`nproj' {
>> >> count if act == `a' & proj == `p'
>> >> if r(N) local work `work' `who`p''
>> >> }
>> >>
>> >> * remove duplicates
>> >> local work : list uniq work
>> >> * remove this actor
>> >> local work : list work - a
>> >> * see what we got for debugging
>> >> noi di "`a' `work'"
>> >>
>> >> replace mywanted = `: list sizeof work' if act == `a'
>> >> }
>> >>
>> >> Nick
>> >>
>> >>
>> >> On Tue, Oct 16, 2012 at 9:52 AM, Erik Aadland <erikaadland@hotmail.com> wrote:
>> >>
>> >> > I am trying to generate a variable "wanted" that by each focal actor and year captures the total number of unique actors (excluding the focal actor) that meet a specified condition (condition == 0) and that the focal actor has occurred together with in one or more projects.
>> >> > This is my data structure:
>> >> > year project_id actor_id condition wanted
>> >> > 2000 1 1 1 2
>> >> > 2000 1 2 0 1
>> >> > 2000 1 3 0 1
>> >> > 2000 1 7 1 2
>> >> > 2000 2 1 1 2
>> >> > 2000 2 2 0 1
>> >> > 2000 2 3 0 1
>> >> > 2000 3 4 1 2
>> >> > 2000 3 5 0 1
>> >> > 2000 3 6 0 1
>> >> > 2000 3 . . .
>> >> > 2001 4 1 1 2
>> >> > 2001 4 2 0 1
>> >> > 2001 4 3 0 1
>> >> > .....and so on
>> >> > So in year == 2000, actor_id == 1 has occurred with 2 unique actor_id (namely 2 and 3) meeting condition == 0 in projects. Therefore, wanted == 2 for actor_id == 1 in year == 2000.
>> >> > My attempted code (which is quite wrong):
>> >> > sort actor_id year projects ;
>> >> > by actor_id year: gen nvals = _n == 1 ;
>> >> > sort actor_id year project_id ;
>> >> > egen wanted = total(nvals & condition == 0), by(agency_id year) ;
>> >> > replace wanted = wanted - (nvals & condition == 0) ;
>> >>
