



Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
Date   Wed, 17 Oct 2012 10:16:09 +0100

Good. Thanks for the closure. The Stata stuff to understand includes

1. How -egen, group()- guarantees clean identifiers running from 1 up,
so that we can easily cycle over all possibilities with -forval-.

2. -levelsof-.

3. List manipulation commands documented at -help macrolists-.

4. -count- for counting (a neglected command).

Each time you find a "{", look for its pairing "}" to see the
structure of loops.
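
As a minimal sketch of those four tools working together, here is a toy example. The data and the names id_raw, cond, zeros, self and nwork are invented for illustration and are not from the thread.

```stata
* toy data: hypothetical, for illustration only
clear
input id_raw cond
101 0
101 0
205 1
340 0
end

* 1. -egen, group()- maps arbitrary identifiers to 1, 2, ...
egen id = group(id_raw)
su id, meanonly
local nid = r(max)

* 2. -levelsof- puts the distinct values meeting a condition in a local macro
quietly levelsof id if cond == 0, local(zeros)

* 3. macro list operations: drop duplicates, subtract a list, count elements
local work `zeros' `zeros'
local work : list uniq work
local self 1
local work : list work - self
local nwork : list sizeof work

* 4. -count- leaves the number of matching observations in r(N)
forval i = 1/`nid' {
    quietly count if id == `i'
    di "id `i': " r(N) " observation(s)"
}

di "zeros: `zeros'   work: `work'   size: `nwork'"
```

On this toy data, zeros should hold "1 3" and work should end up as "3", because the duplicates are removed first and then identifier 1 is subtracted.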

On Wed, Oct 17, 2012 at 8:51 AM, Erik Aadland <erikaadland@hotmail.com> wrote:
> Thank you so much Nick!
> This adjusted code generates exactly what I was looking for.
> Now, I need to spend some time examining your code to absorb its logic and what it actually does.
> The embedded explanations in the code will be of great assistance to me in this regard.
> Thanks again and kind regards,
> Erik.
>
> ----------------------------------------
>> Date: Tue, 16 Oct 2012 14:48:36 +0100
>> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
>> From: njcoxstata@gmail.com
>> To: statalist@hsphsun2.harvard.edu
>>
>> It seems that you want a separate count for each year.
>>
>> If that's so, the code looks more like
>>
>> clear
>>
>> input year project_id actor_id condition
>> 2000 1 1 1
>> 2000 2 1 1
>> 2000 1 2 0
>> 2000 2 2 0
>> 2000 1 3 0
>> 2000 2 3 0
>> 2000 3 4 1
>> 2000 3 5 0
>> 2000 3 6 0
>> 2000 4 7 0
>> 2001 5 1 1
>> 2001 5 2 0
>> 2001 6 2 0
>> 2001 5 3 0
>> 2001 6 3 0
>> 2001 5 4 1
>> 2001 6 4 1
>> 2001 7 5 0
>> 2001 7 6 0
>> 2001 7 7 0
>> 2001 8 8 0
>>
>> end
>>
>> egen proj = group(project_id year), label
>> su proj, meanonly
>> local nproj = r(max)
>>
>> egen act = group(actor_id), label
>> su act, meanonly
>> local nact = r(max)
>>
>> egen yr = group(year), label
>> su yr, meanonly
>> local nyr = r(max)
>>
>> gen mywanted = .
>>
>> * lists of those in each project and year and condition == 0
>> qui forval p = 1/`nproj' {
>>     levelsof act if proj == `p' & condition == 0, local(who`p')
>> }
>>
>> macro list
>>
>> * cycle over years
>>
>> qui forval y = 1/`nyr' {
>>
>>     * now cycle over actors
>>     qui forval a = 1/`nact' {
>>
>>         * blank out workspace
>>         local work
>>
>>         * if actor was included, we want to add that list to workspace
>>         forval p = 1/`nproj' {
>>             count if act == `a' & proj == `p' & yr == `y'
>>             if r(N) local work `work' `who`p''
>>         }
>>
>>         * remove duplicates
>>         local work : list uniq work
>>         * remove this actor
>>         local work : list work - a
>>         * see what we got for debugging
>>         noi di "`a' `work'"
>>
>>         replace mywanted = `: list sizeof work' if act == `a' & yr == `y'
>>     }
>> }
>>
>>
>>
>>
>> On Tue, Oct 16, 2012 at 1:00 PM, Erik Aadland <erikaadland@hotmail.com> wrote:
>> > This is correct.
>> > So, referring to the results in my previous post:
>> > In year==2000, actor_id==4, 5 and 6 occur only in project_id==3, and condition==0 for actor_id==5 and 6. So actor_id==4 should have mywanted==2, while actor_id==5 and 6 should each have mywanted==1. Actor_id==7 occurs only in project_id==4 in this year and shares a project with no other actor (and therefore with no actor that has condition==0), so it should have mywanted==0.
>> > It puzzles me that the suggested code generates correct mywanted scores for the actor_ids in project_id==1 and 2, but not for those in project_id==3 and 4.
>> > Kind regards,
>> > Erik.
>> >
>> >
>> > ----------------------------------------
>> >> Date: Tue, 16 Oct 2012 12:44:11 +0100
>> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
>> >> From: njcoxstata@gmail.com
>> >> To: statalist@hsphsun2.harvard.edu
>> >>
>> >> What you asked for, as I understood it, was the total number of distinct actors
>> >>
>> >> 1. that meet a specified condition (condition == 0)
>> >>
>> >> and
>> >>
>> >> 2. with which any actor has shared one or more projects in the same year.
>> >>
>> >> (I am ignoring the word "focal", which you haven't defined and I don't
>> >> understand.)
>> >>
>> >> If you want something else, please give the definition. It's not
>> >> enough (for me) to say that the code gives the wrong answer in some
>> >> cases. Note that my code gives the same answer for each actor, as it
>> >> is a total over all (project, year) possibilities. If you want a
>> >> different count for each (project, year) you'll need to modify the
>> >> code accordingly.
>> >>
>> >> Nick
>> >>
>> >> On Tue, Oct 16, 2012 at 12:25 PM, Erik Aadland <erikaadland@hotmail.com> wrote:
>> >> > The code works. Thank you Nick.
>> >> > However, I am experiencing a few problems that I suspect stem from details of my data structure that depart from the structure I specified earlier in this thread.
>> >> > In particular, I might have some projects in which only one actor is present.
>> >> > Showing by example is perhaps easiest. Here is the result from Nick's code (based on my previously supplied data structure) on a slightly expanded dataset:
>> >> > year  project_id  actor_id  condition  proj    act  mywanted
>> >> > 2000  1           1         1          1 2000  1    2
>> >> > 2000  2           1         1          2 2000  1    2
>> >> > 2000  1           2         0          1 2000  2    1
>> >> > 2000  2           2         0          2 2000  2    1
>> >> > 2000  1           3         0          1 2000  3    1
>> >> > 2000  2           3         0          2 2000  3    1
>> >> > 2000  3           4         1          3 2000  4    4
>> >> > 2000  3           5         0          3 2000  5    2
>> >> > 2000  3           6         0          3 2000  6    2
>> >> > 2000  4           7         0          4 2000  7    2
>> >> > 2001  5           1         1          5 2001  1    2
>> >> > 2001  5           2         0          5 2001  2    1
>> >> > 2001  6           2         0          6 2001  2    1
>> >> > 2001  5           3         0          5 2001  3    1
>> >> > 2001  6           3         0          6 2001  3    1
>> >> > 2001  5           4         1          5 2001  4    4
>> >> > 2001  6           4         1          6 2001  4    4
>> >> > 2001  7           5         0          7 2001  5    2
>> >> > 2001  7           6         0          7 2001  6    2
>> >> > 2001  7           7         0          7 2001  7    2
>> >> > 2001  8           8         0          8 2001  8    0
>> >> >
>> >> > In this result (focusing on year==2000 only for now), the mywanted score for actor_id==7 in project_id==4 is incorrect (the correct value is mywanted==0). The mywanted scores for the actor_ids in project_id==3 are also incorrect.
>> >> >
>> >> > In year==2001, on the other hand, the mywanted score of 0 for actor_id==8 in project_id==8 is correct.
>> >> > How can I get around this? I am sorry that I did not include these structural details in my initial post.
>> >> > Sincerely,
>> >> > Erik.
>> >> >
>> >> > ----------------------------------------
>> >> >> Date: Tue, 16 Oct 2012 11:49:22 +0100
>> >> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
>> >> >> From: njcoxstata@gmail.com
>> >> >> To: statalist@hsphsun2.harvard.edu
>> >> >>
>> >> >> I suspect that you didn't copy all the code. The last line of code
>> >> >> has a brace (curly bracket }) by itself.
>> >> >>
>> >> >> On Tue, Oct 16, 2012 at 11:44 AM, Erik Aadland <erikaadland@hotmail.com> wrote:
>> >> >> > Thank you Nick!
>> >> >> > You are quite right. I was imprecise; it is distinct actors I want to capture.
>> >> >> > When I run your suggested code, I get this error message after the following line of code:
>> >> >> >
>> >> >> > qui forval a = 1/`nact' {
>> >> >> > unexpected end of file
>> >> >> > r(612);
>> >> >> >
>> >> >> > What could possibly cause this error message? I am using Stata 10.
>> >> >> > Thanks again and kind regards,
>> >> >> > Erik.
>> >> >> >
>> >> >> >
>> >> >> > ----------------------------------------
>> >> >> >> Date: Tue, 16 Oct 2012 11:19:41 +0100
>> >> >> >> Subject: Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
>> >> >> >> From: njcoxstata@gmail.com
>> >> >> >> To: statalist@hsphsun2.harvard.edu
>> >> >> >>
>> >> >> >> First off, on my list of hobby-horses is a prejudice that the word
>> >> >> >> "unique" is misused here, although you are in very good company:
>> >> >> >> StataCorp itself does it in various places, e.g. -codebook-, although
>> >> >> >> I am working on changing their habits if I can. The word "unique"
>> >> >> >> strictly means occurring once only; I recommend the word "distinct"
>> >> >> >> for what you want. There is a longer discussion of terminology in
>> >> >> >>
>> >> >> >> SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
>> >> >> >> (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
>> >> >> >> Q4/08 SJ 8(4):557--568
>> >> >> >> shows how to answer questions about distinct observations
>> >> >> >> from first principles; provides a convenience command
>> >> >> >>
>> >> >> >> That said, when faced with a problem like yours, vague ideas of
>> >> >> >> possible solutions rise up. Is this a case for associative arrays as
>> >> >> >> implemented in Mata? Is there a cunning restructuring of the data from
>> >> >> >> which the answer would fall out easily? Precise inspiration was
>> >> >> >> lacking, and what seemed crucial was that you need to consider each
>> >> >> >> actor in each combination of project and year. That pointed to
>> >> >> >> loops over actors _and_ over project-years. Once that idea is taken
>> >> >> >> up, life is usually easier if all identifiers run over the integers
>> >> >> >> from 1 up. Also, the flavour of compiling a list and eventually
>> >> >> >> counting the distinct other actors in it suggested -levelsof- and the
>> >> >> >> list manipulation tools documented at -help macrolists-.
>> >> >> >>
>> >> >> >> So, here is my code. Absolutely nothing rules out other kinds of solutions.
>> >> >> >>
>> >> >> >> input year project_id actor_id condition wanted
>> >> >> >> 2000 1 1 1 2
>> >> >> >> 2000 1 2 0 1
>> >> >> >> 2000 1 3 0 1
>> >> >> >> 2000 1 7 1 2
>> >> >> >> 2000 2 1 1 2
>> >> >> >> 2000 2 2 0 1
>> >> >> >> 2000 2 3 0 1
>> >> >> >> 2000 3 4 1 2
>> >> >> >> 2000 3 5 0 1
>> >> >> >> 2000 3 6 0 1
>> >> >> >> 2000 3 . . .
>> >> >> >> 2001 4 1 1 2
>> >> >> >> 2001 4 2 0 1
>> >> >> >> 2001 4 3 0 1
>> >> >> >> end
>> >> >> >>
>> >> >> >> * identifiers guaranteed to run 1 up if the real ones don't!
>> >> >> >> * note that "same project, same year" defines a group
>> >> >> >> egen proj = group(project_id year), label
>> >> >> >> su proj, meanonly
>> >> >> >> local nproj = r(max)
>> >> >> >>
>> >> >> >> egen act = group(actor_id), label
>> >> >> >> su act, meanonly
>> >> >> >> local nact = r(max)
>> >> >> >>
>> >> >> >> gen mywanted = .
>> >> >> >>
>> >> >> >> * lists of those in each project and year and condition == 0
>> >> >> >> qui forval p = 1/`nproj' {
>> >> >> >>     levelsof act if proj == `p' & condition == 0, local(who`p')
>> >> >> >> }
>> >> >> >>
>> >> >> >> macro list
>> >> >> >>
>> >> >> >> * now cycle over actors
>> >> >> >> qui forval a = 1/`nact' {
>> >> >> >>
>> >> >> >>     * blank out workspace
>> >> >> >>     local work
>> >> >> >>
>> >> >> >>     * if actor was included, we want to add that list to workspace
>> >> >> >>     * in practice -if r(N)- will be true if and only if -r(N)- is positive
>> >> >> >>     forval p = 1/`nproj' {
>> >> >> >>         count if act == `a' & proj == `p'
>> >> >> >>         if r(N) local work `work' `who`p''
>> >> >> >>     }
>> >> >> >>
>> >> >> >>     * remove duplicates
>> >> >> >>     local work : list uniq work
>> >> >> >>     * remove this actor
>> >> >> >>     local work : list work - a
>> >> >> >>     * see what we got for debugging
>> >> >> >>     noi di "`a' `work'"
>> >> >> >>
>> >> >> >>     replace mywanted = `: list sizeof work' if act == `a'
>> >> >> >> }
>> >> >> >>
>> >> >> >> Nick
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, Oct 16, 2012 at 9:52 AM, Erik Aadland <erikaadland@hotmail.com> wrote:
>> >> >> >>
>> >> >> >> > I am trying to generate a variable "wanted" that, for each focal actor and year, captures the total number of unique actors (excluding the focal actor) that meet a specified condition (condition == 0) and with which the focal actor has occurred in one or more projects.
>> >> >> >> > This is my data structure:
>> >> >> >> > year project_id actor_id condition wanted
>> >> >> >> > 2000 1 1 1 2
>> >> >> >> > 2000 1 2 0 1
>> >> >> >> > 2000 1 3 0 1
>> >> >> >> > 2000 1 7 1 2
>> >> >> >> > 2000 2 1 1 2
>> >> >> >> > 2000 2 2 0 1
>> >> >> >> > 2000 2 3 0 1
>> >> >> >> > 2000 3 4 1 2
>> >> >> >> > 2000 3 5 0 1
>> >> >> >> > 2000 3 6 0 1
>> >> >> >> > 2000 3 . . .
>> >> >> >> > 2001 4 1 1 2
>> >> >> >> > 2001 4 2 0 1
>> >> >> >> > 2001 4 3 0 1
>> >> >> >> > .....and so on
>> >> >> >> > So in year == 2000, actor_id == 1 has occurred with 2 unique actor_id (namely 2 and 3) meeting condition == 0 in projects. Therefore, wanted == 2 for actor_id == 1 in year == 2000.
>> >> >> >> > My attempted code (which is quite wrong):
>> >> >> >> > sort actor_id year projects ;
>> >> >> >> > by actor_id year: gen nvals = _n == 1 ;
>> >> >> >> > sort actor_id year project_id ;
>> >> >> >> > egen wanted = total(nvals & condition == 0), by(agency_id year) ;
>> >> >> >> > replace wanted = wanted - (nvals & condition == 0) ;
>> >> >> >>
>> >>
>> >> *
>> >> * For searches and help try:
>> >> * http://www.stata.com/help.cgi?search
>> >> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> >> * http://www.ats.ucla.edu/stat/stata/

