Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: RE: AW: problem with destring


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: AW: problem with destring
Date   Wed, 23 Sep 2009 17:18:16 +0100

First off, Lindsay's obvious need is to concatenate string variables. That can and should be done directly using + or -egen, concat()-. There is never any need to convert to numeric and then do the equivalent arithmetic. 

gen hhidpn = hhid + pn 

should suffice, and if not, there will be small work-arounds. (If Lindsay absolutely needs a numeric version of that result, then -destring- is the way to go, subject to the point below.)  

Second, as Martin has hinted, the problem is not in -destring-, an excellent command, but in the way it is being used. 

Lindsay insisted on -float- results, but needs extra precision to carry each digit precisely. Don't use -float- therefore. 

Third, it seems that the -float- option could be documented in more detail to warn of this problem (or undocumented so that only more experienced users discover it!). Otherwise users could be punished, like Lindsay, in being given what they ask for. 

Nick 
[email protected] 

Martin Weiss

Also note http://www.ats.ucla.edu/stat/stata/faq/longid.htm
and
http://www.stata.com/support/faqs/data/prec.html

Lindsay

I am using Stata/SE 10.1 and having problems executing what should be
a really simple operation.  My dataset has household id (hhid) and
person number (pn) variables in string format.  I need to combine them
into one numeric unique identifer of the form hhidpn = hhid*1000+pn in
order to merge them with other data.  After I destring the original
IDs (which appears to work fine) and perform this operation, some of
the identifiers are duplicates.  It looks like Stata is somehow adding
some of the numbers incorrectly (mostly they are +/-1 from what they
should be).  I have copied some of the output below.

I've also tried adding the two string variables first and then
destringing and I get the same problem.  The string variable with both
IDs combined looks right, but after I destring some are wrong.  Any
suggestions what might be going on?

Thanks, Lindsay

/**** FIRST METHOD (DESTRING THEN ADD) ****/
. use "${dir}Geographic Identifiers\RGEO.dta", clear;

. destring HHID, gen(hhid) float;
HHID has all characters numeric; hhid generated as float

. destring PN, gen(pn) float;
PN has all characters numeric; pn generated as byte

. replace hhid = hhid*1000;
(30712 real changes made)

. format hhid %9.0f;

. gen hhidpn = hhid + pn;

. format hhidpn %9.0f;

. sort hhidpn

. list  HHID hhid PN pn hhidpn if hhidpn==hhidpn[_n-1]hhidpn==hhidpn[_n+1]

+-------------------------------------------+
               HHID        hhid    PN   pn      hhidpn
-------------------------------------------
1501.  016973    16973000   031   31    16973032
1502.  016973    16973000   032   32    16973032
1641.  017530    17530000   040   40    17530040
1642.  017530    17530000   041   41    17530040
1661.  017641    17641000   011   11    17641012
-------------------------------------------
1662.  017641    17641000   012   12    17641012
1666.  017646    17646000   040   40    17646040
1667.  017646    17646000   041   41    17646040
1679.  017707    17707000   040   40    17707040
1680.  017707    17707000   041   41    17707040
-------------------------------------------
1832.  018435    18435000   040   40    18435040
1833.  018435    18435000   041   41    18435040
1849.  018482    18482000   040   40    18482040
1850.  018482    18482000   041   41    18482040
1854.  018494    18494000   020   20    18494020
....

/**** SECOND METHOD (ADD THEN DESTRING) ****/
. use "${dir}Geographic Identifiers\RGEO.dta", clear;

. gen HHIDPN = HHID + PN;

. destring HHIDPN, gen(hhidpn) float;
HHIDPN has all characters numeric; hhidpn generated as float

. format hhidpn %9.0f;

. sort hhidpn

. list  HHID PN HHIDPN hhidpn if hhidpn==hhidpn[_n-1]hhidpn==hhidpn[_n+1]

+--------------------------------------+
             HHID    PN      HHIDPN      hhidpn
--------------------------------------
1501.  016973   031   016973031    16973032
1502.  016973   032   016973032    16973032
1641.  017530   040   017530040    17530040
1642.  017530   041   017530041    17530040
1661.  017641   011   017641011    17641012
--------------------------------------
1662.  017641   012   017641012    17641012
1666.  017646   040   017646040    17646040
1667.  017646   041   017646041    17646040
1679.  017707   040   017707040    17707040
1680.  017707   041   017707041    17707040
--------------------------------------
1832.  018435   040   018435040    18435040
1833.  018435   041   018435041    18435040
1849.  018482   040   018482040    18482040
1850.  018482   041   018482041    18482040
1854.  018494   020   018494020    18494020
....


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index