# st: egen and spontaneously changing numbers

From: Matt Rutledge
To: statalist@hsphsun2.harvard.edu
Subject: st: egen and spontaneously changing numbers
Date: Wed, 20 May 2009 12:41:54 -0400

Hi,
Using Stata 10, I'm attempting to assign one person's identifier (DUPERSID from the MEPS dataset) to every person in the sample, repeating for each of the N people in my sample. The code I'm using seems to work, except that it spontaneously changes one digit of the identifier.
To illustrate, I've created this dummy dataset and called it test.txt:
```
dupersid	date	x
40002015	19990101	1
40002015	19990201	0
40002015	19990301	0
40010010	19990101	0
40010010	19990201	1
40010010	19990301	1
41011144	19990101	1
41011144	19990201	0
41011144	19990301	1
```
I then read in this dataset, and attempt to assign each observation the dupersid 40002015. In turn, I'll also want to assign all of them the identifier 40010010, and finally 41011144. So I do a forvalues loop:
```
set more off
insheet using test.txt, names
tostring dupersid, replace
rename dupersid dupersidsave
bysort dupersidsave: gen first = 1 if _n==1
replace first = 0 if first==.
summ first
local N = r(N)*r(mean)
forvalues j = 1/`N' {
    preserve
    gsort -first dupersidsave
    gen dupers = dupersidsave if _n==`j' & first==1
    destring dupers, replace
    egen dupersid = max(dupers)
    tostring dupersid, replace
    gsort dupersidsave -first
    list dupers*
    des
    restore
}

****
```
Here's the output. Please note that on the first pass through the loop, the identifier changes from 40002015 to 40002016. On the second pass, the identifier changes from 40010010 to 40010008. The third pass is fine. Any ideas why this might be? Using "egen, total" or "egen, mean" doesn't seem to help, nor does destringing the identifier at different points along the way. Also, I get the same error running it without a loop (replace `j' with 1, for instance, and the identifier still spontaneously changes).
```
(8 real changes made, 8 to missing)
dupersid was float now str8

     +--------------------------------+
     | dupers~e     dupers   dupersid |
     |--------------------------------|
  1. | 40002015   4.00e+07   40002016 |
  2. | 40002015          .   40002016 |
  3. | 40002015          .   40002016 |
  4. | 40010010          .   40002016 |
  5. | 40010010          .   40002016 |
     |--------------------------------|
  6. | 40010010          .   40002016 |
  7. | 41011144          .   40002016 |
  8. | 41011144          .   40002016 |
  9. | 41011144          .   40002016 |
     +--------------------------------+

Contains data
  obs:             9
 vars:             6
 size:           261 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
dupersidsave    long   %12.0g
date            long   %12.0g
x               byte   %8.0g
first           float  %9.0g
dupers          float  %9.0g
dupersid        str8   %9s
-------------------------------------------------------------------------------
Sorted by:  dupersidsave
     Note:  dataset has changed since last saved
(8 real changes made, 8 to missing)
dupersid was float now str8

     +--------------------------------+
     | dupers~e     dupers   dupersid |
     |--------------------------------|
  1. | 40002015          .   40010008 |
  2. | 40002015          .   40010008 |
  3. | 40002015          .   40010008 |
  4. | 40010010   4.00e+07   40010008 |
  5. | 40010010          .   40010008 |
     |--------------------------------|
  6. | 40010010          .   40010008 |
  7. | 41011144          .   40010008 |
  8. | 41011144          .   40010008 |
  9. | 41011144          .   40010008 |
     +--------------------------------+

Contains data
  obs:             9
 vars:             6
 size:           261 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
dupersidsave    long   %12.0g
date            long   %12.0g
x               byte   %8.0g
first           float  %9.0g
dupers          float  %9.0g
dupersid        str8   %9s
-------------------------------------------------------------------------------
Sorted by:  dupersidsave
     Note:  dataset has changed since last saved
(8 real changes made, 8 to missing)
dupersid was float now str8

     +--------------------------------+
     | dupers~e     dupers   dupersid |
     |--------------------------------|
  1. | 40002015          .   41011144 |
  2. | 40002015          .   41011144 |
  3. | 40002015          .   41011144 |
  4. | 40010010          .   41011144 |
  5. | 40010010          .   41011144 |
     |--------------------------------|
  6. | 40010010          .   41011144 |
  7. | 41011144   4.10e+07   41011144 |
  8. | 41011144          .   41011144 |
  9. | 41011144          .   41011144 |
     +--------------------------------+

Contains data
  obs:             9
 vars:             6
 size:           261 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
dupersidsave    long   %12.0g
date            long   %12.0g
x               byte   %8.0g
first           float  %9.0g
dupers          float  %9.0g
dupersid        str8   %9s
-------------------------------------------------------------------------------
Sorted by:  dupersidsave
     Note:  dataset has changed since last saved

*****
```

Thanks,
Matt
rutledma@umich.edu
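
One possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: the `describe` listings show `dupers` stored as `float`, and the message "dupersid was float now str8" indicates that `egen` also produced a `float`. A float has a 24-bit significand, so between 2^25 and 2^26 it can hold only multiples of 4. That would be consistent with 40002015 becoming 40002016 and 40010010 becoming 40010008, while 41011144 (already a multiple of 4) comes through unchanged. The variable names `id`, `pick_float`, `pick_long`, and `spread_long` below are made up for illustration and are not part of the original code.

```stata
* Sketch only: reproduce the rounding with Stata's float() function,
* which rounds its argument to float precision.
* Expected values: 40002016, 40010008, 41011144
display %12.0f float(40002015)
display %12.0f float(40010010)
display %12.0f float(41011144)

* Assumed workaround: request long (or double) storage explicitly
* instead of relying on the default storage type.
clear
input long id
40002015
40010010
41011144
end

* Default storage type (float unless -set type- was changed): value is rounded
generate pick_float = id if _n == 1

* Explicit long: the 8-digit identifier is stored exactly
generate long pick_long = id if _n == 1

* egen accepts a storage type too, so the copied value stays exact
egen long spread_long = max(pick_long)

format pick_float pick_long spread_long %12.0f
list
```

If this reading is right, declaring `long` (or `double`) on the `gen` and `egen` lines inside the loop, or keeping the identifier as a string throughout, should avoid the changed digit.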