The Stata listserver

st: RE: transform string to numeric vars


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: transform string to numeric vars
Date   Tue, 3 Jun 2003 19:07:40 +0100

Harald Seider

> I came across a weird problem with Stata's destring command
> (I am using Stata 8 in an XP environment).
>
> My intention was to convert a string variable "hospid" into a numeric
> variable (regardless of the usefulness of such an action).
> As far as I understand, there are two ways to proceed: real(var) and
> destring var. I would have expected both commands to give
> identical results. Unfortunately they don't.
>
> So I compared both commands:
>
>   'generate hospidold = real(hospid)' and
>   'destring hospid, generate(hospidnew)'
>
> When comparing hospidold and hospidnew I get slightly
> different results.
>
> Example:
>
> variable:    hospid (string)    hospidold (float)    hospidnew (long)
> content:     106010735          106010736            106010735
>
> Therefore   'generate hospidold = real(hospid)' gives wrong results.
>
> Has anybody faced this problem before? Is it a well-known issue?
> (I haven't found anything on the list so far.)

You say that the problem is with -destring- but then say that
-real()- gives wrong results.

The real issue lies elsewhere.

The default -float- data type used for -hospidold- does not
have enough bits to hold every last digit correctly: a -float-
stores integers exactly only up to 16,777,216 (2^24), and your
value has nine digits. You are seeing a -float- variable's
closest representable value, which happens to be 1 off (although,
for an identifier, that's enough to be quite wrong). Try

. generate long ...

or

. generate double ...

and I think all will be well.
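
As a minimal sketch of that fix: the one-observation dataset below and
the names -hospidlong- and -hospiddbl- are made up for illustration;
only -hospid-, -hospidold- and -hospidnew- come from the question.
Something along these lines should show the difference between the
storage types.

```stata
* Illustrative one-observation dataset (hypothetical, not the poster's data)
clear
set obs 1
generate str9 hospid = "106010735"

* Default storage type is float: the value is rounded to the nearest float
generate hospidold = real(hospid)

* Asking for long or double keeps all nine digits
generate long hospidlong = real(hospid)
generate double hospiddbl = real(hospid)

* destring chooses a storage type that holds the value exactly
destring hospid, generate(hospidnew)

format hospidold hospidlong hospiddbl hospidnew %12.0f
list

* The float copy differs from the others; long and double agree with destring
assert hospidold != hospidnew
assert hospidlong == hospidnew
assert hospiddbl == hospidnew

* float() rounds a number to float precision, reproducing the reported value
display %12.0f float(106010735)
```

The last line should display 106010736, the value reported for
-hospidold-: near 106 million, adjacent -float- values are 8 apart,
and 106010736 is the closest one to 106010735.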

By the way, there is a fairly detailed general discussion of
numbers and strings at

Speaking Stata: On numbers and strings. Stata Journal
2(3):314--329 (2002)

which explains the use of numeric and string data types
in Stata and how to convert from one kind to another.
-destring- and -real()- do not necessarily give identical
results, although for your problem they should once you
specify an appropriate data type.
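
One case where the two can differ, as far as I know, is a string
containing non-numeric characters. The sketch below uses a made-up
variable -amount- (not from the question) and the -ignore()- option
of -destring-; treat it as an illustration rather than a full account
of the differences.

```stata
* Hypothetical example: a number stored with a thousands separator
clear
set obs 1
generate str5 amount = "1,234"

* real() returns missing when the string is not entirely a number
generate double viareal = real(amount)

* destring can be told to strip listed characters before converting
destring amount, generate(viadestring) ignore(",")

list
assert missing(viareal)
assert viadestring == 1234
```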

Many people keep identifiers as string for this kind of
reason, even if all characters are numeric.

Nick
[email protected]



