The Stata listserver

st: RE: Conversion to Hexadecimal


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Conversion to Hexadecimal
Date   Sun, 20 Jul 2003 18:31:30 +0100

Cruces,GA
>
> Working with large datasets, I've found a problem regarding observation
> ids: my originals are way too long (say, str20 strings of the form
> "PROVINCE-CITY-HOUSEHOLD..."). The id variable alone is sometimes half
> of my file. Generating numerical ids (as explained in a very useful FAQ
> by N. Cox) is useful, but then I sometimes have problems with the
> rounding of numbers (since I have ids from 1 to, say, 16 million).
>
> I thought about a solution which uses strings but is more compact than
> my original: storing numerical ids as strings in hexadecimal notation.
> I've found a discussion by W. Gould on this list, but it referred
> basically to hex as a way of displaying numbers (from a FAQ: "Stata
> also provides a special %21x format that shows the exact value in a
> special hexadecimal format").
>
> I was wondering how I can go from a float (numerical id) to a compact
> string showing the hexadecimal value (perhaps even more compact than
> the %21x format, since I only have positive integers). There might also
> be the problem of loss of precision in the conversion, and of course I
> need to avoid that.
>
> I guess my question boils down to converting a variable from its value
> to a string with its displayed value.

The FAQ to which Guillermo refers is presumably

How do I create individual identifiers numbered from 1 upwards?
http://www.stata.com/support/faqs/data/group.html

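which is by William Gould and myself. One key point not
stressed in that FAQ is that it is often useful -- indeed
sometimes essential -- to specify -long- for a numeric id.
A -float- has only 24 bits of precision, so it holds integers
exactly only up to 16,777,216 (2^24); beyond that, distinct ids
can be rounded to the same value, which is just the rounding
problem described. A -long- holds integers exactly up to
2,147,483,620, so there should be absolutely no problem in
holding distinct ids for this number of observations. I guess
that this is the main answer to the underlying problem here.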
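A minimal do-file sketch of the point about -long-, assuming nothing beyond official Stata. The value 16777217 (2^24 + 1) and the variables -province- and -city- are made up purely for illustration; they are not from Guillermo's data.

```stata
* Sketch: a float cannot hold 2^24 + 1 exactly, a long can.
clear
set obs 1
generate float fid = 16777217
generate long  lid = 16777217
* the float is rounded to 16777216; the long keeps the exact value
assert fid == 16777216
assert lid == 16777217

* Sketch: create ids with -egen, group()- and store them as -long-.
* province and city are hypothetical variables for illustration.
clear
input str8 province str8 city
"A" "x"
"A" "y"
"B" "x"
end
egen long id = group(province city)
* data are already sorted, so the group numbers run 1, 2, 3
assert id == _n
```

Each -long- id occupies 4 bytes, whatever the length of the original string id.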

I don't follow precisely what would be gained by what Guillermo
is suggesting, as it is difficult to improve on the efficiency
of mapping to integers: a -long- occupies 4 bytes, whereas ids
up to 16 million need 6 hexadecimal digits, and so a str6
variable of 6 bytes. But you can use %21x as an argument to
-string()-, just like any other legal numeric display format.
However, %21x is a special format and is likely to produce
_longer_ strings, not shorter ones.
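A small sketch comparing the two kinds of string, assuming only the official -string(n, format)- function; the id value 16000000 is chosen for illustration. No exact %21x output is claimed here; the sketch just displays both strings and checks that the %21x one is the longer.

```stata
* Sketch: %21x via string() versus the plain decimal string.
local n = 16000000
local hex21 = string(`n', "%21x")
local dec   = string(`n')
display "`hex21'"
display "`dec'"
* %21x spells out the full binary mantissa and exponent, so it is longer
assert length("`hex21'") > length("`dec'")
```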

-inbase- (Stata 8, undocumented) works on individual numbers
only.

Alternatively, here is a variant on -base()- from -egenmore-
on SSC. Save it as _ghex.ado somewhere along your adopath;
then, e.g., egen hexid = hex(id), where id contains integers
_only_.

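*! 1.0.0 NJC 20 July 2003
program define _ghex
	version 6.0

	* parse what -egen- passes: type, new variable name, "="
	gettoken type 0 : 0
	gettoken g    0 : 0
	gettoken eqs  0 : 0

	syntax varname(numeric) [if] [in]

	marksample touse
	* ignores type passed from -egen-
	local type "str1"
	local base = 16

	* only integer values can be converted
	capture assert `varlist' == int(`varlist') if `touse'
	if _rc {
		di in r "`varlist' invalid: not integer"
		exit 459
	}
	* sign = 1 if any value is negative: then every result gets "+" or "-"
	capture assert `varlist' >= 0 if `touse'
	local sign = _rc != 0

	quietly {
		tempvar work digit neg
		gen `type' `g' = ""
		gen long `work' = `varlist' if `touse'
		gen byte `neg' = `work' < 0 if `touse'
		replace `work' = abs(`work')
		gen int `digit' = .

		* highest power of 16 needed for the largest absolute value
		su `work', meanonly
		local max = r(max)
		local power = 0
		while `max' >= (`base'^(`power' + 1)) {
			local power = `power' + 1
		}

		* peel off one hex digit per power, most significant first
		while `power' >= 0 {
			replace `digit' = int(`work' / `base'^`power')
			replace `work' = mod(`work', `base'^`power')
			replace `g' = `g' + /*
		*/ string(`digit') if `touse' & `digit' <= 9
			replace `g' = `g' + /*
		*/ substr("abcdef", `digit' - 9, 1) if `touse' & `digit' >= 10 & `digit' < .
			local power = `power' - 1
		}

		* strip all leading zeros (not just one), keeping "0" for zero itself
		count if `touse' & substr(`g',1,1) == "0" & length(`g') > 1
		while r(N) > 0 {
			replace `g' = substr(`g',2,.) /*
		*/ if `touse' & substr(`g',1,1) == "0" & length(`g') > 1
			count if `touse' & substr(`g',1,1) == "0" & length(`g') > 1
		}

		* add the sign last, so that zeros after it are stripped too
		if `sign' {
			replace `g' = cond(`neg', "-", "+") + `g' if `touse'
		}
	}
end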
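A usage sketch, assuming the _ghex program has been saved as _ghex.ado on the adopath. The three ids are made-up small values, chosen so that the expected hex strings are easy to check by hand (255 = ff, 16 = 10, 5 = 5).

```stata
* Usage sketch: assumes _ghex.ado (the program above) is on the adopath.
clear
set obs 3
generate long id = 255 in 1
replace id = 16 in 2
replace id = 5 in 3
egen hexid = hex(id)
list id hexid
assert hexid == "ff" in 1
assert hexid == "10" in 2
assert hexid == "5" in 3
```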


Nick
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


