[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: AW: Merging database

From	"Eric A. Booth" <[email protected]>
To	[email protected]
Subject	Re: st: AW: Merging database
Date	Wed, 29 Apr 2009 13:14:00 -0500


On Apr 29, 2009, at 12:03 PM, Nick Cox wrote:

I have various comments on this code.

1. -foreach x in V*- won't work. Eric is probably thinking of
-foreach x of var V*- but in this case -foreach v in V1 V2 V3- takesno more thought.

-foreach x in V*- works for fine for me. It loops over all thevariables V1-V3.

2. Eric wants to -recode- missings and also concatenate theidentifiers. If so, it is easier to go
egen V_combined = concat(V1 V2 V3), p(_)
replace V_combined = subinstr(V_combined, ".", "x", .)

...

Note that using 99, even temporarily, is dangerous unless one can besure that 99 is not a legitimate identifier. In any case, whyrecode? A variable with values like "1 . ." is a satisfactorycomposite -- if that is what is needed.


Thanks, Nick...this is very helpful.

I wasn't aware of subinstr() and I couldn't get regexr() to work withthe missing value "." (hence the use of 99...but you're right aboutthat too, using 99 is dangerous, and in the past I've used a muchlarger number that was clearly out of the range of the dataset, butusing substr() without the recode is a far better approach).




EAB

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
Fax: +979.845.0249


Eric A. Booth

To add to Jochen's comment:

If you were hoping to have a new 'ID' variable that keeps the
information from all the ID variables V1, V2, and V3, you could create
a string variable...here are some examples:

******************

clear
input V1  V2  V3
1     .       1
2     .       2
3    3       3
4    4        .   .     5       5
6     .       6
end
//
foreach x in V* {
	recode `x' (.=99)  // <-- So that -regexr- isn't tripped up later
	tostring `x', replace
	}
	gen str10 v_combined = V1+"_"+V2+"_"+V3
	gen v_combined2 = regexr(v_combined, "99", "x")
	sencode v_combined2, gene(uniqueID) gsort(+v_combined2) label(id)	
list


On Apr 29, 2009, at 11:36 AM, Jochen Späth wrote:

Hello Sergio,

I'm not quite sure of what your problem is, maybe it would help if
you were a little more precise.

Below, I assumed that the example you gave is AFTER your three data
sets have been merged, with v1 coming from the first, v2 from the
second and v3 from the third and with v1, v2 and v3 denoting all the
same ID. If this is the case you could

-replace v1 = v2 if v1 == . & v2 != .-
-replace v1 = v3 if v1 == . & v2 ==. & v3 != .-
-count if v1 == .- /* should return 0, otherwise there are
observations in your data that are not uniquely determined by either
of your three ID variables.*/
-drop v2 v3- /* of course, only if you got all IDs caught in v1 */

HTH,
Jochen

-----Ursprüngliche Nachricht-----
Von: [email protected] [mailto:[email protected]
] Im Auftrag von "SERGIO M. AFCHA CHÁVEZ"
Gesendet: Mittwoch, 29. April 2009 17:55
An: [email protected]
Betreff: st: Merging database

Dear Statlisters,

I have a little problem merging a data base. I have variables for 3
years showing an ID:


V1  V2  V3
1     .       1
2     .       2
3    3       3
4    4        .
.     5       5
6     .       6

I need only one ID variable. How can I obtain one column with allthe

ID numbers?


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- RE: st: AW: Merging database
  - From: "Nick Cox" <[email protected]>

References:
- st: Merging database
  - From: "SERGIO M. AFCHA CHÁVEZ" <[email protected]>
- st: AW: Merging database
  - From: Jochen Späth <[email protected]>
- Re: st: AW: Merging database
  - From: "Eric A. Booth" <[email protected]>
- RE: st: AW: Merging database
  - From: "Nick Cox" <[email protected]>

Prev by Date: Re: st: RE: svy:tabulate: what to report - design-based test or the Uncorrected chi-square
Next by Date: RE: st: AW: Merging database
Previous by thread: RE: st: AW: Merging database
Next by thread: RE: st: AW: Merging database
Index(es):
- Date
- Thread