



Re: st: filling in existing ids and generating new ids for unique actors


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: filling in existing ids and generating new ids for unique actors
Date   Fri, 7 Sep 2012 14:12:01 +0100

Comments embedded below.

Nick

On Fri, Sep 7, 2012 at 1:03 PM, Erik Aadland <erikaadland@hotmail.com> wrote:
> Dear Statalist.
> I have an unbalanced panel dataset.
> The structure is as follows:
> year    actor_id    actor
> 2000    .           Paul
> 2001    .           Paul
> 2002    .           Paul
> 2000    .           Sarah
> 2001    1           Sarah
> 2002    1           Sarah
> 2000    .           Simon
> 2001    2           Simon
> 2002    2           Simon
> I have 2 problems:
> 1. I want to fill in the missing existing actor_id for those actors that already have an actor_id in some years but not others.

That's

bysort actor (actor_id) : replace actor_id = actor_id[_n-1] if missing(actor_id)

But follow with a check:

by actor : assert actor_id[1] == actor_id[_N]
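
As a minimal sketch, here is the same fill-in run on the example data from the question. The -input- block and the str5 storage type are assumptions made only to reproduce that example; the final -sort- and -list- are just for inspection.

```stata
clear
input year actor_id str5 actor
2000 . "Paul"
2001 . "Paul"
2002 . "Paul"
2000 . "Sarah"
2001 1 "Sarah"
2002 1 "Sarah"
2000 . "Simon"
2001 2 "Simon"
2002 2 "Simon"
end

* Within each actor, sort on actor_id. Numeric missing (.) sorts after
* every nonmissing value, so known ids come first and are copied downward.
bysort actor (actor_id) : replace actor_id = actor_id[_n-1] if missing(actor_id)

* Each actor should now have one id throughout (or be missing throughout).
by actor : assert actor_id[1] == actor_id[_N]

sort actor year
list, sepby(actor)
```

Sarah and Simon end up with 1 and 2 in every year. Paul stays missing, because within his group there is no nonmissing value to copy.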

For the principles, see

SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q1/02   SJ 2(1):86--102                                  (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N

> 2. I want to generate a new unique actor_id for those actors that have no actor_id in the dataset. This actor_id needs to be different from those already existing for other actors in the dataset.
> The variable -actor- lists the unique name for each actor and this unique name could be used as a basis for assigning the actor_id.

su actor_id, meanonly
local max = r(max)
egen new_actor_id = group(actor) if missing(actor_id)
replace actor_id = new_actor_id + `max' if missing(actor_id)

What this does:

1. Find the largest actor_id in use. So, it will be safe to use higher numbers.

2. Use -egen-'s -group()- to generate new ids for those without them.
These will run 1, 2, 3, ...

3. New actor_id = new id + maximum for those without them.
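
The three steps above can be sketched on the example data as it would look after the fill-in from problem 1. The -input- block, the str5 storage type, and the closing -drop-, -assert- and -list- lines are assumptions added for illustration only.

```stata
clear
input year actor_id str5 actor
2000 . "Paul"
2001 . "Paul"
2002 . "Paul"
2000 1 "Sarah"
2001 1 "Sarah"
2002 1 "Sarah"
2000 2 "Simon"
2001 2 "Simon"
2002 2 "Simon"
end

* Step 1: largest id already in use (2 here).
summarize actor_id, meanonly
local max = r(max)

* Step 2: number the actors without an id 1, 2, 3, ...
egen new_actor_id = group(actor) if missing(actor_id)

* Step 3: shift the new numbers above the existing maximum.
replace actor_id = new_actor_id + `max' if missing(actor_id)
drop new_actor_id

* Paul is the only actor without an id, so he gets 1 + 2 = 3.
assert actor_id == 3 if actor == "Paul"
list, sepby(actor)
```

Note that the local macro must be defined and used in the same run of the do-file (or the same interactive session). If no actor had an id at all, r(max) would be missing and the sum would be missing too, so that case would need separate handling.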