Re: st: a variable is decreased by 1 ?

From: n j cox <[email protected]>
To: [email protected]
Subject: Re: st: a variable is decreased by 1 ?
Date: Mon, 08 Jan 2007 13:19:14 +0000

This is a standard precision problem. By default
your new variable is created with storage type -float-. Remember
that computers use binary arithmetic, and thus use
binary approximations to decimal values. Even with integers
those approximations are not necessarily exact: a -float-
holds only about 7 significant digits, so integers
above 16,777,216 (2^24) cannot all be stored exactly.
In your example values appear to decrease by 1,
but other small decreases or increases would be observed
with other values. The best solution in your case is
to use -long- for the new variable type or to hold identifiers
as strings.
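
A minimal sketch of the problem and the fix, assuming Stata's default
-set type float- and a scratch dataset holding only the one value from
the question. The names pid_float and pid_long are illustrative and do
not appear in the original data.

```stata
* Build a one-observation example with the identifier from the question
clear
set obs 1
generate long pid = 17025101

* No type given: the copy is stored as float (assuming -set type float-)
generate pid_float = pid

* Explicit long: the integer is stored exactly
generate long pid_long = pid

format pid pid_float pid_long %14.0f
list pid pid_float pid_long

* The float copy no longer equals the original; the long copy does
assert pid_float != pid
assert pid_long == pid
```

If the aim is simply an exact copy, -clonevar pidold = pid- is another
option, as it copies the storage type and display format along with the values.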

Various recent pieces in the Stata Journal offer full discussions,
and sections in [U] and various FAQs add details.

Note that this is nothing to do with the format, except insofar
as the default format may obscure the problem. Even if
you could set -format- in advance of a -generate-, that would not
help. Changing the format affects presentation of values, but
not the values themselves.
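
As a small illustration of that last point, here is a sketch using a
hypothetical variable x: the stored value is the same under either
format, and only its display changes.

```stata
* The value is rounded at creation, when it is stored as float
clear
set obs 1
generate float x = 17025101

* Under the default float format the rounding is hidden by scientific notation
format x %9.0g
list x
assert x == 17025100

* A wider format reveals the stored value but does not alter it
format x %14.0f
list x
assert x == 17025100
```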

Nick
[email protected]

Gijs Dekkers

I have a panel dataset, with the individual number 'pid' and the year variable 'jaar'. The variable 'pidnew' is an alternative individual identification number. This variable and 'jaar' don't really do anything, but I include them to show that the individual does not change in what follows.

So, the starting dataset is:

. list pid pidnew jaar if (pid==17025101 & jaar==2002)

       +--------------------------+
       |      pid   pidnew   jaar |
       |--------------------------|
10597. | 17025101       62   2002 |
       +--------------------------+

So this is the data line of individual 17025101 in the year 2002. Now I create a copy ('pidold') of this individual identification number,

. gen pidold = pid

. list pid pidold pidnew jaar if (pid==17025101 & jaar==2002)

       +-------------------------------------+
       |      pid     pidold   pidnew   jaar |
       |-------------------------------------|
10502. | 17025101   1.70e+07       62   2002 |
       +-------------------------------------+

The resulting variable is difficult to read, because it is displayed in scientific notation. So I change the format...

. format pid pidold epid %14.0f
. list pid pidold pidnew jaar if (pid==17025101 & jaar==2002)

       +-------------------------------------+
       |      pid     pidold   pidnew   jaar |
       |-------------------------------------|
10597. | 17025101   17025100       62   2002 |
       +-------------------------------------+

And all of a sudden, the variable 'pidold', which is supposed to be equal to 'pid', has decreased by one! Or, more likely, 'gen pidold = pid' has not done what it should have done, because the format of pidold was set such that information was lost. However, I cannot set the format of a variable before it exists.

I have also tried juggling with 'recast' and different formats, but nothing seems to help.