



Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Capturing unique actors meeting conditions by focal actor and year (excluding focal actor)
Date   Tue, 16 Oct 2012 11:19:41 +0100

First off, on my list of hobby-horses is a prejudice that the word
"unique" is misused here, although you are in very good company:
StataCorp itself does it in various places, e.g. -codebook-, although
I am working on changing their habits if I can. The word "unique"
strictly means occurring once only; I recommend the word "distinct"
for what you want. There is a longer discussion of terminology in

SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
        (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
        Q4/08   SJ 8(4):557--568
        shows how to answer questions about distinct observations
        from first principles; provides a convenience command

That said, when faced with a problem like yours, vague ideas of
possible solutions rise up. Is this a case for associative arrays as
implemented in Mata? Is there a cunning restructuring of the data from
which the answer would fall out easily? Precise inspiration was
lacking and what seemed crucial was that you need to consider each
actor in each combination of project and year. That pointed to
loops over actors _and_ over project-years. Once that idea is taken
up, life is usually easier if all identifiers run over the integers
from 1 up. Also, the flavour of compiling a list and eventually
counting its distinct members, other than the focal actor, suggested
-levelsof- and the list manipulation tools documented at
-help macrolists-.
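
A minimal sketch of those list tools on a made-up list may help. The macro
names work and me and the values are assumptions for illustration only; run
it from a do-file or the do-file editor so that the locals persist from line
to line.

```stata
* made-up list with repeats; 5 plays the role of the focal actor
local work 2 3 2 3 5
local me 5

* drop repeated elements, keeping first occurrences: leaves 2 3 5
local work : list uniq work
* remove the elements of me from work: leaves 2 3
local work : list work - me
* number of elements left: displays 2
di `: list sizeof work'
```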

So, here is my code. Absolutely nothing rules out other kinds of solutions.

input year     project_id     actor_id     condition     wanted
2000     1              1            1             2
2000     1              2            0             1
2000     1              3            0             1
2000     1              7            1             2
2000     2              1            1             2
2000     2              2            0             1
2000     2              3            0             1
2000     3              4            1             2
2000     3              5            0             1
2000     3              6            0             1
2000     3              .            .             .
2001     4              1            1             2
2001     4              2            0             1
2001     4              3            0             1
end

* identifiers guaranteed to run 1 up if the real ones don't!
* note that "same project, same year" defines a group
egen proj = group(project_id year), label
su proj, meanonly
local nproj = r(max)

egen act = group(actor_id), label
su act, meanonly
local nact = r(max)

gen mywanted = .

* lists of those in each project and year and condition == 0
qui forval p = 1/`nproj' {
	levelsof act if proj == `p' & condition == 0, local(who`p')
}

* show the lists just built, as a check
macro list

* now cycle over actors
qui forval a = 1/`nact' {

* blank out workspace
	local work

* if actor was included, we want to add that list to workspace
* in practice -if r(N)- will be true if and only if -r(N)- is positive
	forval p = 1/`nproj' {
		count if act == `a' & proj == `p'
		if r(N) local work `work' `who`p''
	}

* remove duplicates
	local work : list uniq work
* remove this actor
	local work : list work - a
* see what we got for debugging
	noi di "`a'       `work'"

	replace mywanted = `: list sizeof work' if act == `a'
}
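
Note that the loop above pools each actor's projects across all years. That
reproduces wanted for the example data, but it is not the same as counting
within each year if an actor's collaborators change from year to year. What
follows is a sketch only, under the assumption that the count should be made
separately for each actor and year. It reuses proj, act and the who lists
created above, so it must run in the same do-file; the variable name
mywanted2 is made up for illustration.

```stata
* sketch: one pass per actor-year rather than per actor
gen mywanted2 = .
levelsof year, local(years)

qui foreach y of local years {
	* actors present in this year (-levelsof- skips missings)
	levelsof act if year == `y', local(actors)

	foreach a of local actors {
		local work
		* project-years in which this actor appears in this year
		levelsof proj if act == `a' & year == `y', local(projs)
		foreach p of local projs {
			local work `work' `who`p''
		}
		* remove duplicates, then the focal actor
		local work : list uniq work
		local work : list work - a
		replace mywanted2 = `: list sizeof work' if act == `a' & year == `y'
	}
}

* check against the values supplied with the example data
assert mywanted2 == wanted if !missing(act)
```

Looping over -levelsof- results rather than over 1 up to the maximum skips
actor-year combinations that do not occur in the data.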

Nick


On Tue, Oct 16, 2012 at 9:52 AM, Erik Aadland <erikaadland@hotmail.com> wrote:

> I am trying to generate a variable "wanted" that by each focal actor and year captures the total number of unique actors (excluding the focal actor) that meet a specified condition (condition == 0) and that the focal actor has occurred together with in one or more projects.
> This is my data structure:
> year     project_id     actor_id     condition     wanted
> 2000     1              1            1             2
> 2000     1              2            0             1
> 2000     1              3            0             1
> 2000     1              7            1             2
> 2000     2              1            1             2
> 2000     2              2            0             1
> 2000     2              3            0             1
> 2000     3              4            1             2
> 2000     3              5            0             1
> 2000     3              6            0             1
> 2000     3              .            .             .
> 2001     4              1            1             2
> 2001     4              2            0             1
> 2001     4              3            0             1
> .....and so on
> So in year == 2000, actor_id == 1 has occurred with 2 unique actor_id (namely 2 and 3) meeting condition == 0 in projects. Therefore, wanted == 2 for actor_id == 1 in year == 2000.
> My attempted code (which is quite wrong):
> sort actor_id year projects ;
> by actor_id year: gen nvals = _n == 1 ;
> sort  actor_id year project_id ;
> egen wanted = total(nvals & condition == 0), by(agency_id year) ;
> replace wanted = wanted - (nvals & condition == 0) ;
