Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: AW: Merging database


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: AW: Merging database
Date   Wed, 29 Apr 2009 18:03:38 +0100

I have various comments on this code. 

1. -foreach x in V*- won't work. Eric is probably thinking of 
-foreach x of var V*- but in this case -foreach v in V1 V2 V3- takes no more thought. 

2. Eric wants to -recode- missings and also concatenate the identifiers. If so, it is easier to go 

egen V_combined = concat(V1 V2 V3), p(_) 
replace V_combined = subinstr(V_combined, ".", "x", .) 

That does what Eric's lines 

foreach x in V* {
	recode `x' (.=99)  // <-- So that -regexr- isn't tripped up later
	tostring `x', replace
}
gen str10 v_combined = V1+"_"+V2+"_"+V3
gen v_combined2 = regexr(v_combined, "99", "x")

seem intended to do. 

Note that using 99, even temporarily, is dangerous unless one can be sure that 99 is not a legitimate identifier. In any case, why recode? A variable with values like "1 . ." is a satisfactory composite -- if that is what is needed. 

Nick 
n.j.cox@durham.ac.uk 

Eric A. Booth

To add to Jochen's comment:

If you were hoping to have a new 'ID' variable that keeps the  
information from all the ID variables V1, V2, and V3, you could create  
a string variable...here are some examples:

******************

clear
input V1  V2  V3
1     .       1
2     .       2
3    3       3
4    4        .   .     5       5
6     .       6
end
//
foreach x in V* {
	recode `x' (.=99)  // <-- So that -regexr- isn't tripped up later
	tostring `x', replace
	}
	gen str10 v_combined = V1+"_"+V2+"_"+V3
	gen v_combined2 = regexr(v_combined, "99", "x")
	sencode v_combined2, gene(uniqueID) gsort(+v_combined2) label(id)	
list


On Apr 29, 2009, at 11:36 AM, Jochen Späth wrote:

> Hello Sergio,
>
> I'm not quite sure of what your problem is, maybe it would help if  
> you were a little more precise.
>
> Below, I assumed that the example you gave is AFTER your three data  
> sets have been merged, with v1 coming from the first, v2 from the  
> second and v3 from the third and with v1, v2 and v3 denoting all the  
> same ID. If this is the case you could
>
> -replace v1 = v2 if v1 == . & v2 != .-
> -replace v1 = v3 if v1 == . & v2 ==. & v3 != .-
> -count if v1 == .- /* should return 0, otherwise there are  
> observations in your data that are not uniquely determined by either  
> of your three ID variables.*/
> -drop v2 v3- /* of course, only if you got all IDs caught in v1 */
>
> HTH,
> Jochen
>
> -----Ursprüngliche Nachricht-----
> Von: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu 
> ] Im Auftrag von "SERGIO M. AFCHA CHÁVEZ"
> Gesendet: Mittwoch, 29. April 2009 17:55
> An: statalist@hsphsun2.harvard.edu
> Betreff: st: Merging database
>
> Dear Statlisters,
>
> I have a little problem merging a data base. I have variables for 3
> years showing an ID:
>
>
> V1  V2  V3
> 1     .       1
> 2     .       2
> 3    3       3
> 4    4        .
> .     5       5
> 6     .       6
>
>
> I need only one ID variable. How can I obtain one column with  all the
> ID numbers?
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index