Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Erik Aadland <erikaadland@hotmail.com> |

To |
"statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |

Subject |
RE: st: generate new distinct individual ids by organizational id and year |

Date |
Wed, 8 May 2013 07:22:51 +0000 |

Dear Jeph and Statalist. Thank you for your suggested code. This is on the right track, but I still have some problems. I tweaked the code at little bit, as max(ind_id) is for the total dataset and not by org_id: egen last_id = max(ind_id) // get the max id in the dataset egen tagged = tag(org_id year) // tag for expanding expand missing_employees if tagged, gen(isnew) // create new records for missing clonevar new_ind_id = ind_id bys org_id year isnew : replace new_ind_id = last_id+_n if isnew I run this code with the following small example dataset: year org_id ind_id missing_employees 2000 1 1 1 2000 1 2 1 2000 2 3 2 2000 2 3 2 2000 2 4 2 2000 3 5 0 2001 1 2 2 2001 1 7 2 2002 1 2 1 The desired result (in terms of new_ind_id) should be something like: year org_id ind_id missing_employees new_ind_id 2000 1 1 1 1 2000 1 2 1 2 2000 1 8 2000 2 3 2 3 2000 2 3 2 3 2000 2 4 2 3 2000 2 9 2000 2 10 2000 3 5 0 5 2001 1 2 2 2 2001 1 7 2 2 2001 1 8 2001 1 11 2002 1 2 1 8 When running the code I get the following result: year org_id ind_id missing_employees last_id tagged isnew new_ind_id 2000 1 1 1 7 1 0 1 2000 1 2 1 7 0 0 2 2000 2 3 2 7 1 1 8 2000 2 3 2 7 0 0 3 2000 2 3 2 7 1 0 3 2000 2 4 2 7 0 0 4 2000 3 5 0 7 1 0 5 2001 1 2 2 7 1 1 8 2001 1 2 2 7 1 0 2 2001 1 7 2 7 0 0 7 2002 1 2 1 7 1 0 2 One problem seems to be that the code does not appear to recognize missing_employees == 1. Another problem seems to be that the codes uses the same distinct new_ind_ind==8 for both org_id=1 and org_id ==2. Any input on this problem would be greatly appreciated. Sincerely, Erik > Date: Tue, 7 May 2013 09:39:13 -0400 > From: stata@spandrel.net > To: statalist@hsphsun2.harvard.edu > Subject: Re: st: generate new distinct individual ids by organizational id and year > > If I understand correctly, > > bys org_id : egen last_id = max(ind_id) // get the max id for each org > egen tagged = tag(org_id year) // tag for expanding > expand missing_employees if tagged, gen(isnew) // create new records for missing > clonevar new_ind_id = ind_id > bys org_id year isnew : replace new_ind_id = last_id+_n if isnew > > hth, > J > > > > > > On 5/7/2013 4:18 AM, Erik Aadland wrote: > > Dear Statalist. > > > > I have an unbalanced panel dataset containing organizations with distinct org_id, and employees within these organizations with distinct ind_id. Some employees are missing. I have calculated the total number of missing employees per org_id per year, called "missing_employees". > > > > Now, by each org_id and year I need to generate a set of distinct ind_ids corresponding to the number of missing employees. The new_ind_ids generated for a distinct org_id and year should preferably be reused in later years. > > > > In laymans terms, I need to generate new ind_id observations for each org_id and year so that I have an ind_id for all employees in each org_id and year. > > > > Example: > > > > year org_id ind_id missing_employees > > 2000 1 1 1 > > 2000 1 2 1 > > 2000 2 3 2 > > 2000 2 3 2 > > 2000 2 4 2 > > 2000 3 5 0 > > 2001 1 2 2 > > 2001 1 7 2 > > > > Given such a structure, I need to generate distinct new_ind_ids e.g. starting after the last distinct ind_id I have in my dataset (e.g. if last ind_id==100, new_ind_id starts at 101): > > > > year org_id ind_id missing_employees new_ind_ids > > 2000 1 1 1 > > 2000 1 2 1 > > 2000 1 101 > > 2000 2 3 2 > > 2000 2 3 2 > > 2000 2 4 2 > > 2000 2 102 > > 2000 2 103 > > 2000 3 5 0 > > 2001 1 2 2 > > 2001 1 7 2 > > 2001 1 101 > > 2001 1 104 > > > > Note that for org_id==1 in 2000, I generate 1 new_ind_id == 101 since one employee was missing. In 2001 I have 2 employees missing for org_id == 1. Since I have already generated new_ind_id == 101 for org_id==1, this can be used again in 2001 and only one new_ind_id needs to be generated; new_ind_id == 104. > > > > If the number of missing employees for a distinct org_id goes down from one year to the next, I would like to drop the corresponding number of new_ind_ids for that year (i.e. remove some of the reused new_ind_ids). > > > > Any input on these problems would be greatly appreciated. > > > > Kind regards, > > > > Erik. > > * > > * For searches and help try: > > * http://www.stata.com/help.cgi?search > > * http://www.stata.com/support/faqs/resources/statalist-faq/ > > * http://www.ats.ucla.edu/stat/stata/ > > > > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/faqs/resources/statalist-faq/ > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: generate new distinct individual ids by organizational id and year***From:*Jeph Herrin <stata@spandrel.net>

**References**:**st: generate new distinct individual ids by organizational id and year***From:*Erik Aadland <erikaadland@hotmail.com>

**Re: st: generate new distinct individual ids by organizational id and year***From:*Jeph Herrin <stata@spandrel.net>

- Prev by Date:
**st: Stata journal paper on interaction terms in count models** - Next by Date:
**Re: st: Stata journal paper on interaction terms in count models** - Previous by thread:
**Re: st: generate new distinct individual ids by organizational id and year** - Next by thread:
**Re: st: generate new distinct individual ids by organizational id and year** - Index(es):