Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: generate new distinct individual ids by organizational id and year


From   Erik Aadland <erikaadland@hotmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: generate new distinct individual ids by organizational id and year
Date   Fri, 10 May 2013 07:32:46 +0000

Dear Jeph and Statalist.

Thank you so much.
I replaced "max_id" with "ind_id" in the third line of the code to get the max "ind_id" as "max_id" is not a variable name in my dataset:

sum missing_employees, meanonly
local maxmiss=r(max) // get the max # missing
sum ind_id, meanonly
local maxid = r(max) // get the max id in the dataset
local order=ceil(log10(`maxid'+`maxmiss'))+1 // next power of 10, add 1

The code now appears to work wonderfully on my test dataset.
I will do some more testing, but this seems right.

Thanks again.

Sincerely,

Erik.





> Date: Wed, 8 May 2013 11:36:16 -0400
> From: stata@spandrel.net
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: generate new distinct individual ids by organizational id and year
> 
> 
> Sorry, I didn't observe that ind_ids were unique across the sample (though it was true in your example); so we'll need 
> to incorporate the org_id into the new_id.
> 
> I also used -expand- incorrectly (hence the error for missing_employees=1). Try this:
> 
> sum missing_employees, meanonly
> local maxmiss=r(max) // get the max # missing
> sum max_id, meanonly
> local maxid = r(max) // get the max id in the dataset
> local order=ceil(log10(`maxid'+`maxmiss'))+1 // next power of 10, add 1
> 
> egen tagged = tag(org_id year)	 // tag one record for expanding
> expand missing_employees+1 if tagged, gen(isnew) // create new records for missing (note +1!)
> clonevar new_ind_id = ind_id
> bys org_id year isnew : replace new_ind_id = org_id*`order'+`maxid'+_n if isnew
> 
> 
> hth,
> Jeph
> 
> 
> On 5/8/2013 3:22 AM, Erik Aadland wrote:
> > Dear Jeph and Statalist.
> >
> > Thank you for your suggested code. This is on the right track, but I still have some problems.
> >
> > I tweaked the code at little bit, as max(ind_id) is for the total dataset and not by org_id:
> >
> > egen last_id = max(ind_id) // get the max id in the dataset
> > egen tagged = tag(org_id year)	 // tag for expanding
> > expand missing_employees if tagged, gen(isnew) // create new records for missing
> > clonevar new_ind_id = ind_id
> > bys org_id year isnew : replace new_ind_id = last_id+_n if isnew
> >
> > I run this code with the following small example dataset:
> >
> > year	org_id	ind_id	missing_employees
> > 2000	1	1	1
> > 2000	1	2	1
> > 2000	2	3	2
> > 2000	2	3	2
> > 2000	2	4	2
> > 2000	3	5	0
> > 2001	1	2	2
> > 2001	1	7	2
> > 2002	1	2	1
> >
> > The desired result (in terms of new_ind_id) should be something like:
> >
> > year	org_id	ind_id	missing_employees new_ind_id
> > 2000	1	1	1	 1
> > 2000	1	2	1	 2
> > 2000 1 8
> > 2000	2	3	2	 3
> > 2000	2	3	2	 3
> > 2000	2	4	2	 3
> > 2000	2	 9
> > 2000	2	 10
> > 2000	3	5	0	 5
> > 2001	1	2	2	 2
> > 2001	1	7	2	 2
> > 2001	1	 8
> > 2001	1	 11
> > 2002	1	2	1	 8
> >
> > When running the code I get the following result:
> >
> > year	org_id	 ind_id	 missing_employees	last_id	tagged	isnew	new_ind_id
> > 2000	1	 1	 1	 7	1	0	1
> > 2000	1	 2	 1	 7	0	0	2
> > 2000	2	 3	 2	 7	1	1	8
> > 2000	2	 3	 2	 7	0	0	3
> > 2000	2	 3	 2	 7	1	0	3
> > 2000	2	 4	 2	 7	0	0	4
> > 2000	3	 5	 0	 7	1	0	5
> > 2001	1	 2	 2	 7	1	1	8
> > 2001	1	 2	 2	 7	1	0	2
> > 2001	1	 7	 2	 7	0	0	7
> > 2002	1	 2	 1	 7	1	0	2
> >
> >
> > One problem seems to be that the code does not appear to recognize missing_employees == 1.
> > Another problem seems to be that the codes uses the same distinct new_ind_ind==8 for both org_id=1 and org_id ==2.
> >
> > Any input on this problem would be greatly appreciated.
> >
> > Sincerely,
> >
> > Erik
> >
> >
> >
> >
> >
> >
> >> Date: Tue, 7 May 2013 09:39:13 -0400
> >> From: stata@spandrel.net
> >> To: statalist@hsphsun2.harvard.edu
> >> Subject: Re: st: generate new distinct individual ids by organizational id and year
> >>
> >> If I understand correctly,
> >>
> >> bys org_id : egen last_id = max(ind_id) // get the max id for each org
> >> egen tagged = tag(org_id year)	 // tag for expanding
> >> expand missing_employees if tagged, gen(isnew) // create new records for missing
> >> clonevar new_ind_id = ind_id
> >> bys org_id year isnew : replace new_ind_id = last_id+_n if isnew
> >>
> >> hth,
> >> J
> >>
> >>
> >>
> >>
> >>
> >> On 5/7/2013 4:18 AM, Erik Aadland wrote:
> >>> Dear Statalist.
> >>>
> >>> I have an unbalanced panel dataset containing organizations with distinct org_id, and employees within these organizations with distinct ind_id. Some employees are missing. I have calculated the total number of missing employees per org_id per year, called "missing_employees".
> >>>
> >>> Now, by each org_id and year I need to generate a set of distinct ind_ids corresponding to the number of missing employees. The new_ind_ids generated for a distinct org_id and year should preferably be reused in later years.
> >>>
> >>> In laymans terms, I need to generate new ind_id observations for each org_id and year so that I have an ind_id for all employees in each org_id and year.
> >>>
> >>> Example:
> >>>
> >>> year org_id ind_id missing_employees
> >>> 2000 1 1 1
> >>> 2000 1 2 1
> >>> 2000 2 3 2
> >>> 2000 2 3 2
> >>> 2000 2 4 2
> >>> 2000 3 5 0
> >>> 2001 1 2 2
> >>> 2001 1 7 2
> >>>
> >>> Given such a structure, I need to generate distinct new_ind_ids e.g. starting after the last distinct ind_id I have in my dataset (e.g. if last ind_id==100, new_ind_id starts at 101):
> >>>
> >>> year org_id ind_id missing_employees new_ind_ids
> >>> 2000 1 1 1
> >>> 2000 1 2 1
> >>> 2000 1 101
> >>> 2000 2 3 2
> >>> 2000 2 3 2
> >>> 2000 2 4 2
> >>> 2000 2 102
> >>> 2000 2 103
> >>> 2000 3 5 0
> >>> 2001 1 2 2
> >>> 2001 1 7 2
> >>> 2001 1 101
> >>> 2001 1 104
> >>>
> >>> Note that for org_id==1 in 2000, I generate 1 new_ind_id == 101 since one employee was missing. In 2001 I have 2 employees missing for org_id == 1. Since I have already generated new_ind_id == 101 for org_id==1, this can be used again in 2001 and only one new_ind_id needs to be generated; new_ind_id == 104.
> >>>
> >>> If the number of missing employees for a distinct org_id goes down from one year to the next, I would like to drop the corresponding number of new_ind_ids for that year (i.e. remove some of the reused new_ind_ids).
> >>>
> >>> Any input on these problems would be greatly appreciated.
> >>>
> >>> Kind regards,
> >>>
> >>> Erik.
> >>> *
> >>> * For searches and help try:
> >>> * http://www.stata.com/help.cgi?search
> >>> * http://www.stata.com/support/faqs/resources/statalist-faq/
> >>> * http://www.ats.ucla.edu/stat/stata/
> >>>
> >>>
> >> *
> >> * For searches and help try:
> >> * http://www.stata.com/help.cgi?search
> >> * http://www.stata.com/support/faqs/resources/statalist-faq/
> >> * http://www.ats.ucla.edu/stat/stata/ 
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/faqs/resources/statalist-faq/
> > * http://www.ats.ucla.edu/stat/stata/
> >
> >
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/ 		 	   		  
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index