Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: generate new distinct individual ids by organizational id and year


From   Jeph Herrin <[email protected]>
To   [email protected]
Subject   Re: st: generate new distinct individual ids by organizational id and year
Date   Tue, 07 May 2013 09:39:13 -0400

If I understand correctly,

 bys org_id : egen last_id = max(ind_id)          // get the max id for each org
 egen tagged = tag(org_id year)	                  // tag for expanding
 expand missing_employees if tagged, gen(isnew)   // create new records for missing
 clonevar new_ind_id = ind_id
 bys org_id year isnew : replace new_ind_id = last_id+_n if isnew

hth,
J





On 5/7/2013 4:18 AM, Erik Aadland wrote:
Dear Statalist.

I have an unbalanced panel dataset containing organizations with distinct org_id, and employees within these organizations with distinct ind_id. Some employees are missing. I have calculated the total number of missing employees per org_id per year, called "missing_employees".

Now, by each org_id and year I need to generate a set of distinct ind_ids corresponding to the number of missing employees. The new_ind_ids generated for a distinct org_id and year should preferably be reused in later years.

In laymans terms, I need to generate new ind_id observations for each org_id and year so that I have an ind_id for all employees in each org_id and year.

Example:

year     org_id     ind_id     missing_employees
2000     1              1               1
2000     1              2               1
2000     2              3               2
2000     2              3               2
2000     2              4               2
2000     3              5               0
2001     1              2               2
2001     1              7               2

Given such a structure, I need to generate distinct new_ind_ids e.g. starting after the last distinct ind_id I have in my dataset (e.g. if last ind_id==100, new_ind_id starts at 101):

year     org_id     ind_id     missing_employees     new_ind_ids
2000     1              1               1
2000     1              2               1
2000     1                                                                            101
2000     2              3               2
2000     2              3               2
2000     2              4               2
2000     2                                                                            102
2000     2                                                                            103
2000     3              5               0
2001     1              2               2
2001     1              7               2
2001     1                                                                            101
2001     1                                                                            104

Note that for org_id==1 in 2000, I generate 1 new_ind_id == 101 since one employee was missing. In 2001 I have 2 employees missing for org_id == 1. Since I have already generated new_ind_id == 101 for org_id==1, this can be used again in 2001 and only one new_ind_id needs to be generated; new_ind_id == 104.

If the number of missing employees for a distinct org_id goes down from one year to the next, I would like to drop the corresponding number of new_ind_ids for that year (i.e. remove some of the reused new_ind_ids).

Any input on these problems would be greatly appreciated.

Kind regards,

Erik. 		 	   		
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index