Statalist



st: RE: Splitting numeric values


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Splitting numeric values
Date   Mon, 23 Jul 2007 19:47:01 +0100

There were several answers to this, including two by Michael Hanson
and Maarten Buis that gave good advice. 

The main question remains: Why didn't -tostring- work here to 
do what Susan wants? 

I can't see her data, but one likely explanation stands out. 

-tostring- is a lot of wrapping, including several safety features, 
around a central command of the form 

gen <strvar> = string(<numvar>, "<format>") 

where <strvar> means string variable, <numvar> means numeric 
variable, and <format> is a numeric format. 

By default, the format used is %12.0g. This is clearly and
explicitly documented at [D] destring. That default is not an 
appropriate format for 18 digit integers. See what happens: 

. di string(123456789012345678, "%12.0g")
1.23457e+17

Hence -tostring- is telling you, somewhat gnomically, that 
the default format would lead to loss of information. 
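A format wide enough for the identifier avoids the problem. What follows is a minimal sketch, not tested on Susan's data: it assumes -id- is held as a -double- with exactly 18 digits, and the names -idstr- and -newvar- are only illustrative. Even with %18.0f, the last two or three digits shown may differ from what was typed, because of how the number is stored.

```stata
* Sketch only: assumes id is a double holding 18 digit integers
* A fixed format shows all 18 digits instead of scientific notation
display string(123456789012345678, "%18.0f")

* The same format passed to -tostring- (idstr is an illustrative name)
tostring id, generate(idstr) format(%18.0f)

* Digits 10 to 13 as a string, so leading zeros such as "0010" survive
generate newvar = substr(idstr, 10, 4)
```

Using -generate()- rather than -replace- keeps the original numeric variable available for checking.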

The default format reflects a compromise between conflicting 
desiderata, but one consideration was the very frequent 
use of 9 digit identifiers. 18 digit identifiers are 
a different deal altogether. 

Importantly: unless your identifier was originally read into 
Stata as a -double-, it will have been degraded even before
you tried string manipulations. A -float- holds only about 
7 significant decimal digits, so for an 18 digit integer 
everything after roughly the 7th digit is approximation, 
and that includes digits 10 to 13, the ones you want. 

Even a -double- holds integers exactly only up to 2^53 = 
9,007,199,254,740,992, about 16 digits, so the last two or 
three digits of an 18 digit integer may still be off. That 
would not normally affect digits 10 to 13, but it would 
definitely matter for many other purposes. 

That is, a practice of reading in variable(s) that are 
18 digit integers as -float- will damage your data. 
If that is what you did, I advise you to reread the data 
with the identifier as -double- or, safer still for 
identifiers, as a string variable, and redo any calculations. 
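To see the size of the damage, the two storage types can be compared directly. This is only a sketch using the identifier from Susan's post; the variable names -id_d- and -id_f- are made up for illustration.

```stata
* Sketch only: store the same 18 digit value as double and as float
clear
set obs 1
generate double id_d = 610102001001010300
generate float  id_f = 610102001001010300

* Show every digit rather than the default general format
format id_d id_f %18.0f
list id_d id_f
```

Comparing the two listed values digit by digit shows where each storage type starts to depart from the number that was typed.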

Incidentally, a posting a short while ago tried
to bring together a zeroth tutorial on string-numeric
conversions: 

<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0707/Author/article-469.html>

Nick 
n.j.cox@durham.ac.uk 

Susan Olivia
 
> I have a numeric variable (call it id) that comes with 18 digits
> and I would like to create a new variable that extracts from the
> variable 'id' starting from the 10th digits and get 4 digits from
> here.
> 
> E.g. my id is given as: 610102001001010300 and I want to create
> 'newvar' which has value of 0010. 
> 
> I know this can be easily done using the 'substr' command,
> however, I am having a problem in converting the 'id' into string
> variable. It gives me the following command: 
> 
> *** 
> tostring id, replace
> id cannot be converted reversibly; no replace
> 
> ***
