Statalist



Re: st: problem with destring


From   Tim Wade <wadetj@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: problem with destring
Date   Fri, 25 Sep 2009 07:01:42 -0400

Lindsay,
try:

gen double hhidpn = hhid + pn

format hhidpn %15.0f
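
A likely explanation, offered as a sketch rather than anything confirmed in the thread: a float holds integers exactly only up to 16,777,216 (2^24), and these 8-digit IDs are larger than that, so odd values get rounded to a neighbouring even number. Both gen without a storage type and destring with the float option put the result in a float. The lines below assume the string variables HHID and PN from the original post; the values in the comments are what float rounding should give and match the listing quoted below, but they are not output from an actual run.

```stata
* float() rounds its argument to float precision.
* Both lines should display 16973032, the duplicated value in the listing.
display %12.0f float(16973031)
display %12.0f float(16973032)

* Build the ID in double precision straight from the string variables.
* real() converts a numeric string to a number (stored here as a double).
gen double hhidpn = real(HHID)*1000 + real(PN)
format hhidpn %12.0f

* isid exits with an error if hhidpn does not uniquely identify
* the observations, so it doubles as a check that the fix worked.
isid hhidpn
```

If isid still complains after this, the duplicates are in the data themselves rather than a precision artifact.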

Tim



On Wed, Sep 23, 2009 at 10:00 AM, Lindsay <lindsaystata@gmail.com> wrote:
> I am using Stata/SE 10.1 and having problems executing what should be
> a really simple operation.  My dataset has household id (hhid) and
> person number (pn) variables in string format.  I need to combine them
> into one numeric unique identifier of the form hhidpn = hhid*1000+pn in
> order to merge them with other data.  After I destring the original
> IDs (which appears to work fine) and perform this operation, some of
> the identifiers are duplicates.  It looks like Stata is somehow adding
> some of the numbers incorrectly (mostly they are +/-1 from what they
> should be).  I have copied some of the output below.
>
> I've also tried adding the two string variables first and then
> destringing and I get the same problem.  The string variable with both
> IDs combined looks right, but after I destring some are wrong.  Any
> suggestions what might be going on?
>
> Thanks, Lindsay
>
> /**** FIRST METHOD (DESTRING THEN ADD) ****/
> . use "${dir}Geographic Identifiers\RGEO.dta", clear;
>
> . destring HHID, gen(hhid) float;
> HHID has all characters numeric; hhid generated as float
>
> . destring PN, gen(pn) float;
> PN has all characters numeric; pn generated as byte
>
> . replace hhid = hhid*1000;
> (30712 real changes made)
>
> . format hhid %9.0f;
>
> . gen hhidpn = hhid + pn;
>
> . format hhidpn %9.0f;
>
> . sort hhidpn
>
> . list  HHID hhid PN pn hhidpn if hhidpn==hhidpn[_n-1] | hhidpn==hhidpn[_n+1]
>
>       +-----------------------------------------+
>       |   HHID       hhid    PN   pn     hhidpn |
>       |-----------------------------------------|
> 1501. | 016973   16973000   031   31   16973032 |
> 1502. | 016973   16973000   032   32   16973032 |
> 1641. | 017530   17530000   040   40   17530040 |
> 1642. | 017530   17530000   041   41   17530040 |
> 1661. | 017641   17641000   011   11   17641012 |
>       |-----------------------------------------|
> 1662. | 017641   17641000   012   12   17641012 |
> 1666. | 017646   17646000   040   40   17646040 |
> 1667. | 017646   17646000   041   41   17646040 |
> 1679. | 017707   17707000   040   40   17707040 |
> 1680. | 017707   17707000   041   41   17707040 |
>       |-----------------------------------------|
> 1832. | 018435   18435000   040   40   18435040 |
> 1833. | 018435   18435000   041   41   18435040 |
> 1849. | 018482   18482000   040   40   18482040 |
> 1850. | 018482   18482000   041   41   18482040 |
> 1854. | 018494   18494000   020   20   18494020 |
> ....
>
> /**** SECOND METHOD (ADD THEN DESTRING) ****/
> . use "${dir}Geographic Identifiers\RGEO.dta", clear;
>
> . gen HHIDPN = HHID + PN;
>
> . destring HHIDPN, gen(hhidpn) float;
> HHIDPN has all characters numeric; hhidpn generated as float
>
> . format hhidpn %9.0f;
>
> . sort hhidpn
>
> . list  HHID PN HHIDPN hhidpn if hhidpn==hhidpn[_n-1] | hhidpn==hhidpn[_n+1]
>
>       +-------------------------------------+
>       |   HHID    PN      HHIDPN     hhidpn |
>       |-------------------------------------|
> 1501. | 016973   031   016973031   16973032 |
> 1502. | 016973   032   016973032   16973032 |
> 1641. | 017530   040   017530040   17530040 |
> 1642. | 017530   041   017530041   17530040 |
> 1661. | 017641   011   017641011   17641012 |
>       |-------------------------------------|
> 1662. | 017641   012   017641012   17641012 |
> 1666. | 017646   040   017646040   17646040 |
> 1667. | 017646   041   017646041   17646040 |
> 1679. | 017707   040   017707040   17707040 |
> 1680. | 017707   041   017707041   17707040 |
>       |-------------------------------------|
> 1832. | 018435   040   018435040   18435040 |
> 1833. | 018435   041   018435041   18435040 |
> 1849. | 018482   040   018482040   18482040 |
> 1850. | 018482   041   018482041   18482040 |
> 1854. | 018494   020   018494020   18494020 |
> ....
>
>
>
> --
> Lindsay Sabik
> Doctoral Candidate in Health Policy
> Harvard University
> lsabik@fas.harvard.edu
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



