Dear Statalist.

A while back I got assistance from the list to count, separately for each actor_id and year, the number of distinct other actors meeting a certain condition with which the actor_id had occurred together in projects. Nick Cox suggested the code below, which worked wonderfully. This code generates a separate count for each actor_id and year.

I now face a new challenge. I would like to generate a similar measure that accumulates the count across years (rather than counting separately within each year). So, if actor_id == 1 collaborated with 2 other distinct actors that met the condition in 2000, the score for actor_id == 1 would be 2 in 2000. If actor_id == 1 collaborated with one additional distinct actor that met the condition in 2001, the score would increase to 3 in 2001. (If the distinct actors already counted in the 2000 score were present in projects together with the actor_id in 2001 as well, they would not be counted again in 2001.)

Is there a smart way to change the code below to generate this new measure?

Sincerely and kind regards,
Erik.

----------------------------------------
> Date: Tue, 16 Oct 2012 14:48:36 +0100
> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> From: njcoxstata@gmail.com
> To: statalist@hsphsun2.harvard.edu
>
> It seems that you want a separate count for each year.
>
> If that's so, the code looks more like
>
> clear
>
> input year project_id actor_id condition
> 2000 1 1 1
> 2000 2 1 1
> 2000 1 2 0
> 2000 2 2 0
> 2000 1 3 0
> 2000 2 3 0
> 2000 3 4 1
> 2000 3 5 0
> 2000 3 6 0
> 2000 4 7 0
> 2001 5 1 1
> 2001 5 2 0
> 2001 6 2 0
> 2001 5 3 0
> 2001 6 3 0
> 2001 5 4 1
> 2001 6 4 1
> 2001 7 5 0
> 2001 7 6 0
> 2001 7 7 0
> 2001 8 8 0
> end
>
> egen proj = group(project_id year), label
> su proj, meanonly
> local nproj = r(max)
>
> egen act = group(actor_id), label
> su act, meanonly
> local nact = r(max)
>
> egen yr = group(year), label
> su yr, meanonly
> local nyr = r(max)
>
> gen mywanted = .
>
> * lists of those in each project and year and condition == 0
> qui forval p = 1/`nproj' {
>     levelsof act if proj == `p' & condition == 0, local(who`p')
> }
>
> macro list
>
> * cycle over years
> qui forval y = 1/`nyr' {
>
>     * now cycle over actors
>     qui forval a = 1/`nact' {
>
>         * blank out workspace
>         local work
>
>         * if actor was included, we want to add that list to workspace
>         forval p = 1/`nproj' {
>             count if act == `a' & proj == `p' & yr == `y'
>             if r(N) local work `work' `who`p''
>         }
>
>         * remove duplicates
>         local work : list uniq work
>         * remove this actor
>         local work : list work - a
>         * see what we got for debugging
>         noi di "`a' `work'"
>
>         replace mywanted = `: list sizeof work' if act == `a' & yr == `y'
>     }
> }
>
> On Tue, Oct 16, 2012 at 1:00 PM, Erik Aadland <erikaadland@hotmail.com> wrote:
> > This is correct.
> > So, referring to the results in my previous post.
> > In year==2000, actor_id == 4|5|6 occur only in project_id==3, and for actor_id == 5 and 6 condition==0. Actor_id==4 should have a mywanted score == 2, while actor_id==5 and 6 should each have a mywanted score == 1. Actor_id == 7 occurs only in project_id==4 this year and has shared projects with no other actor in this year (and therefore shares no project_id with any actor_id with condition==0) and should have a mywanted score == 0.
> > It puzzles me why the suggested code generates correct mywanted scores for the actor_ids in project_id==1 and 2, but not in project_id==3 and 4.
> > Kind regards,
> > Erik.
> >
> > ----------------------------------------
> >> Date: Tue, 16 Oct 2012 12:44:11 +0100
> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> >> From: njcoxstata@gmail.com
> >> To: statalist@hsphsun2.harvard.edu
> >>
> >> What you asked for, as I understood it, was the total number of distinct actors
> >>
> >> 1. that meet a specified condition (condition == 0)
> >>
> >> and
> >>
> >> 2. with which any actor has shared one or more projects in the same year.
> >>
> >> (I am ignoring the word "focal", which you haven't defined and I don't
> >> understand.)
> >>
> >> If you want something else, please give the definition. It's not
> >> enough (for me) to say that the code gives the wrong answer in some
> >> cases. Note that my code gives the same answer for each actor, as it
> >> is a total over all (project, year) possibilities. If you want a
> >> different count for each (project, year) you'll need to modify the
> >> code accordingly.
> >>
> >> Nick
> >>
> >> On Tue, Oct 16, 2012 at 12:25 PM, Erik Aadland <erikaadland@hotmail.com> wrote:
> >> > The code works. Thank you Nick.
> >> > However, I am experiencing a few problems that I suspect stem from more detailed differences in my data structure, differences that depart from the structure I previously specified in this post.
> >> > In particular, I might have some projects in which only one actor is present.
> >> > Showing by example is perhaps easiest. Here is the result from Nick's code (based on my previously supplied data structure) on a slightly expanded dataset:
> >> > year  project_id  actor_id  condition  proj    act  mywanted
> >> > 2000  1           1         1          1 2000  1    2
> >> > 2000  2           1         1          2 2000  1    2
> >> > 2000  1           2         0          1 2000  2    1
> >> > 2000  2           2         0          2 2000  2    1
> >> > 2000  1           3         0          1 2000  3    1
> >> > 2000  2           3         0          2 2000  3    1
> >> > 2000  3           4         1          3 2000  4    4
> >> > 2000  3           5         0          3 2000  5    2
> >> > 2000  3           6         0          3 2000  6    2
> >> > 2000  4           7         0          4 2000  7    2
> >> > 2001  5           1         1          5 2001  1    2
> >> > 2001  5           2         0          5 2001  2    1
> >> > 2001  6           2         0          6 2001  2    1
> >> > 2001  5           3         0          5 2001  3    1
> >> > 2001  6           3         0          6 2001  3    1
> >> > 2001  5           4         1          5 2001  4    4
> >> > 2001  6           4         1          6 2001  4    4
> >> > 2001  7           5         0          7 2001  5    2
> >> > 2001  7           6         0          7 2001  6    2
> >> > 2001  7           7         0          7 2001  7    2
> >> > 2001  8           8         0          8 2001  8    0
> >> >
> >> > In this result (focusing on year==2000 only now), the mywanted score for actor_id==7 in project_id==4 is incorrect (correct mywanted==0).
> >> > The mywanted scores for actor_ids in project_id==3 are also incorrect.
> >> >
> >> > In year==2001, the mywanted score==0 for actor_id==8 in project_id==8 is on the other hand correct.
> >> > How do I get around this? I am sorry that I did not include these structural details in my initial post.
> >> > Sincerely,
> >> > Erik.
> >> >
> >> > ----------------------------------------
> >> >> Date: Tue, 16 Oct 2012 11:49:22 +0100
> >> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> >> >> From: njcoxstata@gmail.com
> >> >> To: statalist@hsphsun2.harvard.edu
> >> >>
> >> >> I suspect that you didn't copy all the code. The last line of code
> >> >> has a brace (curly bracket }) by itself.
> >> >>
> >> >> On Tue, Oct 16, 2012 at 11:44 AM, Erik Aadland <erikaadland@hotmail.com> wrote:
> >> >> > Thank you Nick!
> >> >> > You are quite right. I was imprecise; it is distinct actors I want to capture.
> >> >> > When I run your suggested code, I get this error message after the following line of code:
> >> >> >
> >> >> > qui forval a = 1/`nact' {
> >> >> > unexpected end of file
> >> >> > r(612);
> >> >> >
> >> >> > What could possibly cause this error message? I am using Stata 10.
> >> >> > Thanks again and kind regards,
> >> >> > Erik.
> >> >> >
> >> >> > ----------------------------------------
> >> >> >> Date: Tue, 16 Oct 2012 11:19:41 +0100
> >> >> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
> >> >> >> From: njcoxstata@gmail.com
> >> >> >> To: statalist@hsphsun2.harvard.edu
> >> >> >>
> >> >> >> First off, on my list of hobby-horses is a prejudice that the word
> >> >> >> "unique" is misused here, although you are in very good company:
> >> >> >> StataCorp itself does it in various places, e.g. -codebook-, although
> >> >> >> I am working on changing their habits if I can.
> >> >> >> The word "unique"
> >> >> >> strictly means occurring once only; I recommend the word "distinct"
> >> >> >> for what you want. There is a longer discussion of terminology in
> >> >> >>
> >> >> >> SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
> >> >> >> (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
> >> >> >> Q4/08 SJ 8(4):557--568
> >> >> >> shows how to answer questions about distinct observations
> >> >> >> from first principles; provides a convenience command
> >> >> >>
> >> >> >> That said, when faced with a problem like yours, vague ideas of
> >> >> >> possible solutions rise up. Is this a case for associative arrays as
> >> >> >> implemented in Mata? Is there a cunning restructuring of the data from
> >> >> >> which the answer would fall out easily? Precise inspiration was
> >> >> >> lacking and what seemed crucial was that you need to consider each
> >> >> >> actor in each combination of project and year. That pointed to
> >> >> >> loops over actors _and_ over project-years. Once that idea was taken
> >> >> >> up, life is usually easier if all identifiers run over the integers
> >> >> >> from 1 up. Also, the flavour of compiling a list and eventually
> >> >> >> counting distinct members of other actors suggested -levelsof- and the
> >> >> >> list manipulation tools documented at -help macrolists-.
> >> >> >>
> >> >> >> So, here is my code. Absolutely nothing rules out other kinds of solutions.
> >> >> >>
> >> >> >> input year project_id actor_id condition wanted
> >> >> >> 2000 1 1 1 2
> >> >> >> 2000 1 2 0 1
> >> >> >> 2000 1 3 0 1
> >> >> >> 2000 1 7 1 2
> >> >> >> 2000 2 1 1 2
> >> >> >> 2000 2 2 0 1
> >> >> >> 2000 2 3 0 1
> >> >> >> 2000 3 4 1 2
> >> >> >> 2000 3 5 0 1
> >> >> >> 2000 3 6 0 1
> >> >> >> 2000 3 . . .
> >> >> >> 2001 4 1 1 2
> >> >> >> 2001 4 2 0 1
> >> >> >> 2001 4 3 0 1
> >> >> >> end
> >> >> >>
> >> >> >> * identifiers guaranteed to run 1 up if the real ones don't!
> >> >> >> * note that "same project, same year" defines a group
> >> >> >> egen proj = group(project_id year), label
> >> >> >> su proj, meanonly
> >> >> >> local nproj = r(max)
> >> >> >>
> >> >> >> egen act = group(actor_id), label
> >> >> >> su act, meanonly
> >> >> >> local nact = r(max)
> >> >> >>
> >> >> >> gen mywanted = .
> >> >> >>
> >> >> >> * lists of those in each project and year and condition == 0
> >> >> >> qui forval p = 1/`nproj' {
> >> >> >>     levelsof act if proj == `p' & condition == 0, local(who`p')
> >> >> >> }
> >> >> >>
> >> >> >> macro list
> >> >> >>
> >> >> >> * now cycle over actors
> >> >> >> qui forval a = 1/`nact' {
> >> >> >>
> >> >> >>     * blank out workspace
> >> >> >>     local work
> >> >> >>
> >> >> >>     * if actor was included, we want to add that list to workspace
> >> >> >>     * in practice -if r(N)- will be true if and only if -r(N)- is positive
> >> >> >>     forval p = 1/`nproj' {
> >> >> >>         count if act == `a' & proj == `p'
> >> >> >>         if r(N) local work `work' `who`p''
> >> >> >>     }
> >> >> >>
> >> >> >>     * remove duplicates
> >> >> >>     local work : list uniq work
> >> >> >>     * remove this actor
> >> >> >>     local work : list work - a
> >> >> >>     * see what we got for debugging
> >> >> >>     noi di "`a' `work'"
> >> >> >>
> >> >> >>     replace mywanted = `: list sizeof work' if act == `a'
> >> >> >> }
> >> >> >>
> >> >> >> Nick
> >> >> >>
> >> >> >> On Tue, Oct 16, 2012 at 9:52 AM, Erik Aadland <erikaadland@hotmail.com> wrote:
> >> >> >>
> >> >> >> > I am trying to generate a variable "wanted" that by each focal actor and year captures the total number of unique actors (excluding the focal actor) that meet a specified condition (condition == 0) and that the focal actor has occurred together with in one or more projects.
> >> >> >> > This is my data structure:
> >> >> >> > year  project_id  actor_id  condition  wanted
> >> >> >> > 2000  1           1         1          2
> >> >> >> > 2000  1           2         0          1
> >> >> >> > 2000  1           3         0          1
> >> >> >> > 2000  1           7         1          2
> >> >> >> > 2000  2           1         1          2
> >> >> >> > 2000  2           2         0          1
> >> >> >> > 2000  2           3         0          1
> >> >> >> > 2000  3           4         1          2
> >> >> >> > 2000  3           5         0          1
> >> >> >> > 2000  3           6         0          1
> >> >> >> > 2000  3           .         .          .
> >> >> >> > 2001  4           1         1          2
> >> >> >> > 2001  4           2         0          1
> >> >> >> > 2001  4           3         0          1
> >> >> >> > .....and so on
> >> >> >> > So in year == 2000, actor_id == 1 has occurred with 2 unique actor_id (namely 2 and 3) meeting condition == 0 in projects. Therefore, wanted == 2 for actor_id == 1 in year == 2000.
> >> >> >> > My attempted code (which is quite wrong):
> >> >> >> > sort actor_id year projects ;
> >> >> >> > by actor_id year: gen nvals = _n == 1 ;
> >> >> >> > sort actor_id year project_id ;
> >> >> >> > egen wanted = total(nvals & condition == 0), by(agency_id year) ;
> >> >> >> > replace wanted = wanted - (nvals & condition == 0) ;
> >> >> >>
> >> *
> >> * For searches and help try:
> >> * http://www.stata.com/help.cgi?search
> >> * http://www.stata.com/support/faqs/resources/statalist-faq/
> >> * http://www.ats.ucla.edu/stat/stata/

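One possible route to the cumulative count asked about at the top of the thread, offered as a sketch that has only been traced by hand and not as a definitive answer: swap the loop order in the quoted per-year code so that actors form the outer loop, and blank out the workspace once per actor instead of once per actor and year. The list of co-occurring actors then carries over from one year to the next. The variable name mycumul is hypothetical. The sketch assumes that proj, act, yr, the locals nproj, nact and nyr, and the who`p' lists have already been created by the quoted code in the same do-file run, that egen's group(year) numbers the years in ascending order, and that condition is judged in the year in which the shared project took place.

```stata
* Sketch only: cumulative variant of the quoted per-year loop.
* mycumul is a hypothetical name; proj, act, yr, nproj, nact, nyr
* and who`p' are assumed to exist from the quoted code.
gen mycumul = .

* cycle over actors first, so the workspace persists across years
qui forval a = 1/`nact' {

    * blank out workspace once per actor, not once per year
    local work

    * cycle over years in ascending order
    forval y = 1/`nyr' {

        * add the condition == 0 lists of this actor's projects in year y
        forval p = 1/`nproj' {
            count if act == `a' & proj == `p' & yr == `y'
            if r(N) local work `work' `who`p''
        }

        * remove duplicates
        local work : list uniq work
        * remove this actor
        local work : list work - a

        * everything met so far, up to and including year y
        replace mycumul = `: list sizeof work' if act == `a' & yr == `y'
    }
}
```

Tracing the expanded example data from the 14:48 message by hand, actor_id == 4 would get mycumul == 2 in 2000 (actors 5 and 6) and 4 in 2001 (actors 2 and 3 are added). An actor with no observations in a given year simply has no rows to fill for that year, while the accumulated list still carries forward to later years.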
