Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: filling in existing ids and generating new ids for unique actors
From: Nick Cox <[email protected]>
To: [email protected]
Date: Fri, 7 Sep 2012 14:12:01 +0100
Comments embedded below.
Nick
On Fri, Sep 7, 2012 at 1:03 PM, Erik Aadland <[email protected]> wrote:
> Dear Statalist.
> I have an unbalanced panel dataset.
> The structure is as follows:
> year    actor_id    actor
> 2000    .           Paul
> 2001    .           Paul
> 2002    .           Paul
> 2000    .           Sarah
> 2001    1           Sarah
> 2002    1           Sarah
> 2000    .           Simon
> 2001    2           Simon
> 2002    2           Simon
> I have 2 problems:
> 1. I want to fill in the missing existing actor_id for those actors that already have an actor_id in some years but not others.
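That's
bysort actor (actor_id) : replace actor_id = actor_id[_n-1] if missing(actor_id)
Sorting on actor_id within actor puts missing values last, so each missing value is filled from the observation before it.
But follow that with a check that each actor ends up with just one id:
by actor : assert actor_id[1] == actor_id[_N]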
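Here is a minimal sketch of that fill-in step, run on the example data from the question. The -input- layout (numeric actor_id, string actor) is an assumption about how the real variables are stored, and -bysort- is repeated before -assert- only to be sure of the sort order.

```stata
clear
input int year float actor_id str5 actor
2000 . "Paul"
2001 . "Paul"
2002 . "Paul"
2000 . "Sarah"
2001 1 "Sarah"
2002 1 "Sarah"
2000 . "Simon"
2001 2 "Simon"
2002 2 "Simon"
end

* Numeric missing sorts after every nonmissing value, so within each
* actor any known id comes first and is copied down to the missings.
bysort actor (actor_id) : replace actor_id = actor_id[_n-1] if missing(actor_id)

* Stops with an error if any actor carries two different ids.
bysort actor (actor_id) : assert actor_id[1] == actor_id[_N]

sort actor year
list, sepby(actor)
```

Sarah and Simon now have ids 1 and 2 in every year. Paul's ids stay missing, as there is nothing to copy from.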
For the principles, see
SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q1/02   SJ 2(1):86--102                                  (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N
> 2. I want to generate a new unique actor_id for those actors that have no actor_id in the dataset. This actor_id needs to be different from those already existing for other actors in the dataset.
> The variable -actor- lists the unique name for each actor and this unique name could be used as a basis for assigning the actor_id.
su actor_id, meanonly
local max = r(max)
egen new_actor_id = group(actor) if missing(actor_id)
replace actor_id = new_actor_id + `max' if missing(actor_id)
What this does:
1. Find the largest actor_id in use, so that it will be safe to use higher numbers.
2. Use -egen-'s -group()- to generate new ids for those without them. These will run 1, 2, 3, ...
3. New actor_id = new id + maximum, for those without an actor_id.
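As a sketch, the steps above can be tried on the example data as it would look once existing ids have been filled in. The -input- layout and the two closing checks are assumptions added for illustration, not part of the original question.

```stata
clear
input int year float actor_id str5 actor
2000 . "Paul"
2001 . "Paul"
2002 . "Paul"
2000 1 "Sarah"
2001 1 "Sarah"
2002 1 "Sarah"
2000 2 "Simon"
2001 2 "Simon"
2002 2 "Simon"
end

* Largest id already in use (2 here)
su actor_id, meanonly
local max = r(max)

* Number the actors without an id 1, 2, 3, ... and shift past the maximum
egen new_actor_id = group(actor) if missing(actor_id)
replace actor_id = new_actor_id + `max' if missing(actor_id)
drop new_actor_id

* Check: every actor has exactly one nonmissing id ...
bysort actor (actor_id) : assert (actor_id[1] == actor_id[_N]) & !missing(actor_id)
* ... and no id is shared by two actors
bysort actor_id (actor) : assert actor[1] == actor[_N]

sort actor year
list, sepby(actor)
```

Here Paul ends up with actor_id 3. If no actor had an id to begin with, r(max) would be missing and so would the sum, so that case would need separate handling.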