Re: st: Identifier values change after Merge

From   Phil Clayton <>
To   <>
Subject   Re: st: Identifier values change after Merge
Date   Thu, 11 Nov 2010 11:42:35 +1030

That's strange. Are you sure it's not just that the display format for the 
identifier has changed? You could check this using: 

    format %11.0g tractno 

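To tell a display problem apart from a real loss of precision, it may help to look at the storage type as well as the format. The following is only a sketch: it assumes the merged dataset is in memory and has at least six observations, and it uses one of the tract numbers from the post below as an example value.

```stata
* Show the storage type (float, double, long, ...) and current display format
describe tractno

* Widen the display format so all 11 digits are shown, then inspect a few rows
format tractno %12.0f
list tractno in 1/6

* Illustration: an 11-digit value survives as a double but not as a float
display %12.0f 17031020500
display %12.0f float(17031020500)
```

If the listed values come back with all 11 digits intact, only the display format had changed. If neighbouring tracts still show identical values, the identifier was probably stored as float at some point, and the double version would need to be recovered from the original file.
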
If the merge is indeed changing the data, a quick and dirty solution would be to 
create a copy of the identifier using -clonevar- before performing the merge. 
Then you can use the new variable as the identifier. 
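
A minimal sketch of that workaround follows. The dataset names masterfile and usingfile and the variable name tractno_orig are placeholders, not names from the original post, and the merge 1:1 syntax is assumed to be available (it follows the command quoted below).

```stata
* Load the master data and keep an exact copy of the identifier
use masterfile, clear
clonevar tractno_orig = tractno

* Merge on the original identifier, keeping matched observations only
merge 1:1 tractno using usingfile, keep(match)

* Compare the identifier with its copy after the merge
format tractno tractno_orig %12.0f
count if tractno != tractno_orig
list tractno tractno_orig in 1/6
```

A count of zero would suggest the merge left the values alone and that only the display format differs.
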
It may also be worth looking at the user-written command -mmerge- which I find to 
be a more user-friendly command for merging datasets. 
On Thu 11/11/10  8:42 AM , Anjanette Chan Tack sent: 
> Hi 
> I am using intercooled stata 9.1 to do a 1 to 1 merge using an 11 digit 
> long identifier that uniquely designates a census tract. As background, I 
> got the census tract data from the geolytics Neighborhood Change database, 
> and these 11 digit numbers are the unique identifiers that come with them. 
> The identifier is being stored as double. In executing the merge, I ask 
> stata to keep the matched observations only and drop the unmatched 
> observations. Since the master file's list of identifiers is a subset of 
> the using file, I was hoping that it would allow me to extract this subset 
> of observations and their attendant information easily. To do so, I use 
> this command: 
> merge 1:1 tractno using "C:\Program Files\Stata9\Filename", 
> assert(match master) keep(match) 
> In some ways the merge proceeds well. The resulting list of N observations 
> is the N I expect. The problem is that after the merge, the values of the 
> identifiers change. Where previously census tracts had unique 11-digit 
> identifiers, these identifiers are all rounded to the same number in 
> the new merged dataset. 
> Thus I have a BEFORE and AFTER that look like this: 
> Before: 
> 17031020500 
> 17031020600 
> 17031020700 
> 17031130100 
> 17031090100 
> 17031090200 
> After 
> 1.70E+10 
> 1.70E+10 
> 1.70E+10 
> 1.70E+10 
> 1.70E+10 
> 1.70E+10 
> Where 1.70E+10  = 17030000000 in all cases.  
> I thought that this might be due to the way that stata is storing the 
> information, so I googled "help stata is approximating numeric 
> values". I found an archived response to a problem that seems similar 
> here: 
> The help answer says that the double storage type can sustain up to 15 
> digits. Since my identifier is only 11 digits long, I can't understand what 
> the problem might be. 
> I am quite unfamiliar with Stata (it's the first time I'm using it in 3 
> years, and the first time outside a classroom setting for basic training in 
> statistics), so I would be grateful for any suggestions and advice. 
> Many thanks in advance! 
> Anjie. 
> ------------------------------- 
> Anjanette M. Chan Tack 
> PhD student  
> University of Chicago Department of Sociology 
