Cruces,GA
>
> Working with large datasets, I've found a problem regarding observations
> id: my originals are way too long (say, strings 20 of the form
> "PROVINCE-CITY-HOUSEHOLD..."). The id variable only is sometimes half of
> my file. Generating numerical ids (as explained in a very useful FAQ by
> N. Cox) is useful, but then I sometimes have problems with the rounding
> of numbers (since I have ids from 1 to, say, 16 millions).
>
> I thought about a solution which uses strings but is more compact than
> my original, which is storing numerical ids as strings in hexadecimal
> notation. I've found a discussion by W. Gould on this list, but this
> referred basically as hex as a form of displaying numbers (from a FAQ:
> "Stata also provides a special %21x format that shows the exact value in
> a special hexadecimal format").
>
> I was wondering how I can go from a float (numerical id) to a compact
> string showing the hexadecimal value (perhaps even more compact than the
> %21x format since I only have positive integers). There might also be
> the problem of loss of precision in the conversion, and of course I need
> to avoid that.
>
> I guess my question boils down to converting a variable from its value
> to a string with its displayed value.

The FAQ to which Guillermo refers is presumably

How do I create individual identifiers numbered from 1 upwards?
http://www.stata.com/support/faqs/data/group.html

which is by William Gould and myself.

One key point not stressed in that FAQ is that it is often useful --
indeed sometimes essential -- to specify -long- for a numeric id.
There should be absolutely no problem in holding distinct ids
for this number of observations, so long as they are held
as integers. I guess that this is the main answer to the
underlying problem here.

I don't follow precisely what would be gained by what
Guillermo is suggesting, as it is difficult to improve on
the efficiency of mapping to integers.
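To make the point about -long- concrete, here is a minimal sketch. It assumes the original string identifier is held in a variable called longid, which is a hypothetical name and not one from the original question. A -float- holds consecutive integers exactly only up to 16,777,216 (2^24), which would explain rounding trouble with ids near 16 million; a -long- holds integers exactly up to roughly 2.1 billion.

```stata
* longid is a hypothetical name for the original long string identifier
egen long id = group(longid)

* sanity checks: each longid maps to one id, and each id to one longid
bysort longid (id): assert id[1] == id[_N]
bysort id (longid): assert longid[1] == longid[_N]

* why -float- is risky at this scale: 2^24 + 1 cannot be held as a float,
* so this comparison displays 0 (false)
display float(16777217) == 16777217
```

Once the checks pass, the string variable can be kept in a separate lookup file and dropped from the working dataset.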
But you can use %21x as an argument to -string()-, just
like any other legal numeric display format. However,
it is special and is likely to produce _longer_ strings.

-inbase- (Stata 8, undocumented) works on individual
numbers only.

Alternatively, this is a variant on -base()- in -egenmore-
on SSC.

e.g.

egen hexid = hex(id)

where id contains integers _only_.

*! 1.0.0 NJC 20 July 2003
program define _ghex
        version 6.0
        gettoken type 0 : 0
        gettoken g 0 : 0
        gettoken eqs 0 : 0
        syntax varname(numeric) [if] [in]
        marksample touse

        * ignores type passed from -egen-
        local type "str1"
        local base = 16

        capture assert `varlist' == int(`varlist') if `touse'
        if _rc {
                di in r "`varlist' invalid: not integer"
                exit 459
        }

        capture assert `varlist' >= 0 if `touse'
        local sign = _rc != 0

        quietly {
                tempvar work digit
                gen `type' `g' = ""
                gen long `work' = `varlist' if `touse'
                gen int `digit' = .
                su `work', meanonly
                local max = max(`r(max)',-`r(min)')
                local power = 0
                while `max' >= (`base'^(`power' + 1)) {
                        local power = `power' + 1
                }

                if `sign' {
                        replace `g' = `g' + cond(`work' < 0, "-","+") if `touse'
                        replace `work' = abs(`work')
                }

                while `power' >= 0 {
                        replace `digit' = int(`work' / `base'^`power')
                        replace `work' = mod(`work', `base'^`power')
                        replace `g' = `g' + /*
                        */ string(`digit') if `touse' & `digit' <= 9
                        replace `g' = `g' + /*
                        */ substr("abcdef", `digit' - 9, 1) if `touse' & `digit' >= 10
                        local power = `power' - 1
                }

                replace `g' = substr(`g',2,.) if substr(`g',1,1) == "0"
        }
end

Nick
n.j.cox@durham.ac.uk

