Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: Re: Identifier values change after Merge


From   Anjanette Chan Tack <[email protected]>
To   [email protected]
Subject   st: Re: Identifier values change after Merge
Date   Tue, 16 Nov 2010 11:48:47 -0600 (CST)

Just wanted to say thanks to Phil Clayton for responding to my query. The problem was in fact the display format for the identifier. -- Many thanks!
-------------------------------
Anjanette M. Chan Tack
PhD student 
University of Chicago Department of Sociology
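
A minimal sketch of the display-format fix mentioned above, assuming the identifier is the variable tractno from the original message below and is already stored as a double. The 11-digit values are not changed by the merge; Stata's default display format for a double (%10.0g) is too narrow to show 11 digits, so they appear in scientific notation. The %11.0f width is an illustrative choice, not necessarily the exact format suggested in the thread.

```stata
* Assumes the merged dataset is in memory and tractno is stored as double.

* Show the storage type and the current display format of the identifier.
describe tractno

* Switch to a fixed format wide enough for 11 digits, with no decimals.
format tractno %11.0f

* The identifiers should now display in full (assumes at least 6 observations).
list tractno in 1/6

* Optional check: stops with an error if tractno does not uniquely identify observations.
isid tractno
```

Changing the format only affects how values are displayed; the stored values are untouched.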


---- Original message ----
>Date: Wed, 10 Nov 2010 16:12:27 -0600 (CST)
>From: Anjanette Chan Tack <[email protected]>  
>Subject: Identifier values change after Merge  
>To: [email protected]
>
>Hi
>
>
>I am using Intercooled Stata 9.1 to do a 1:1 merge on an 11-digit identifier that uniquely designates a census tract. As background, I got the census tract data from the Geolytics Neighborhood Change Database, and these 11-digit numbers are the unique identifiers that come with the data.
>
>The identifier is stored as a double. In executing the merge, I ask Stata to keep only the matched observations and drop the unmatched ones. Since the master file's list of identifiers is a subset of the using file's, I was hoping this would let me extract that subset of observations and their attendant information easily. To do so, I use this command:
>
>
>merge 1:1 tractno using "C:\Program Files\Stata9\Filename", assert(match master) keep(match)
>
>In some ways the merge proceeds well. The resulting number of observations is the N I expect. The problem is that after the merge, the values of the identifiers change. Where previously the census tracts had unique 11-digit identifiers, these identifiers are all rounded to the same number in the new merged dataset.
>
>
>Thus I have a BEFORE and AFTER that look like this:
>
>Before:
>
>17031020500
>17031020600
>17031020700
>17031130100
>17031090100
>17031090200
>
>After
>1.70E+10
>1.70E+10
>1.70E+10
>1.70E+10
>1.70E+10
>1.70E+10
>
>Where 1.70E+10  = 17030000000 in all cases. 
>
>I thought that this might be due to the way that Stata is storing the information, so I googled "help stata is approximating numeric values". I found an archived response to a problem that seems similar here: http://www.stata.com/statalist/archive/2010-06/msg01017.html
>
>The help answer says that the double storage type can sustain up to 15 digits. Since my identifier is only 11 digits long, I can't understand what the problem might be.
>
>I am quite unfamiliar with Stata (it's the first time I'm using it in 3 years, and the first time outside a classroom setting for basic training in statistics), so I would be grateful for any suggestions and advice.
>
>Many thanks in advance!
>
>Anjie.
>-------------------------------
>Anjanette M. Chan Tack
>PhD student 
>University of Chicago Department of Sociology
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

