Statalist The Stata Listserver



AW: st: "Wrong" result with encode / merge ?


From   "Thomas Erdmann" <tom.erdmann@stud.unibas.ch>
To   <statalist@hsphsun2.harvard.edu>
Subject   AW: st: "Wrong" result with encode / merge ?
Date   Thu, 23 Nov 2006 14:17:47 +0100

Philipp,

thanks for your explanation.

I think I overestimated what -encode- actually does; I find the explanation
under -help encode- a bit short.

- Tom





-----Original Message-----
From: statalist-owner@hsphsun2.harvard.edu
[mailto:statalist-owner@hsphsun2.harvard.edu] On Behalf Of Philipp Rehm
Sent: Thursday, 23 November 2006 13:48
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: "Wrong" result with encode / merge ?

Assume you have these id7temp codes in your master data-set:
AT18679
AT18680
AT18681

Assume you have these id7temp codes in your using data-set:
AT18679
AT18681
AT18682

If you -encode- the id7temp variable in each data-set, you will get the
following id7 variables:

In the master data-set:
1 = AT18679
2 = AT18680
3 = AT18681

In the using data-set:
1 = AT18679
2 = AT18681
3 = AT18682

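That is, -encode- simply assigns consecutive numbers (1, 2, 3, ...) to the
distinct strings of each data-set, in alphabetical order. It knows nothing
about the numbers used for the same strings in the other data-set.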
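For readers who want to see the mechanism outside Stata, here is a minimal Python model of the behaviour described above. It assumes, as a simplification, that -encode- numbers the distinct strings of a single data-set 1, 2, ... in alphabetical order. The function name `encode` is purely illustrative and is not Stata's implementation.

```python
# Simplified model of Stata's -encode- (an assumption for illustration only):
# take the distinct strings of ONE data-set, sort them alphabetically,
# and number them 1, 2, ... independently of any other data-set.
def encode(values):
    """Return (codes, labels) for a list of strings."""
    labels = {code: s for code, s in enumerate(sorted(set(values)), start=1)}
    lookup = {s: code for code, s in labels.items()}
    codes = [lookup[s] for s in values]
    return codes, labels

master = ["AT18679", "AT18680", "AT18681"]
using = ["AT18679", "AT18681", "AT18682"]

master_codes, master_labels = encode(master)
using_codes, using_labels = encode(using)

print(master_labels)  # {1: 'AT18679', 2: 'AT18680', 3: 'AT18681'}
print(using_labels)   # {1: 'AT18679', 2: 'AT18681', 3: 'AT18682'}
```

Note that AT18681 gets code 3 in the master data-set but code 2 in the using data-set, even though it is the same id.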

If you -merge- the using data-set with the master data-set on id7, you
will get the following (wrong) match:
1 "AT18679" = 1 "AT18679"
2 "AT18680" = 2 "AT18681"
3 "AT18681" = 3 "AT18682"
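A rough Python sketch of a one-to-one match on the numeric code shows how these wrong pairs arise. `encode` and `merge_on` are hypothetical helpers that model -encode- (alphabetical numbering within one data-set) and a 1:1 -merge-, assuming the keys are unique within each file.

```python
def encode(values):
    """Number the distinct strings 1..k in alphabetical order
    (simplified model of -encode-, applied to one data-set at a time)."""
    lookup = {s: code for code, s in enumerate(sorted(set(values)), start=1)}
    return [lookup[s] for s in values]

def merge_on(master_keys, using_keys, master_ids, using_ids):
    """Match rows whose keys are equal (keys assumed unique in each file);
    return (master_id, using_id) pairs for the matched rows only."""
    using_by_key = dict(zip(using_keys, using_ids))
    return [(mid, using_by_key[k])
            for k, mid in zip(master_keys, master_ids)
            if k in using_by_key]

master = ["AT18679", "AT18680", "AT18681"]
using = ["AT18679", "AT18681", "AT18682"]

# Matching on the encoded numbers pairs rows with equal codes, not equal ids.
for pair in merge_on(encode(master), encode(using), master, using):
    print(pair)
# ('AT18679', 'AT18679')
# ('AT18680', 'AT18681')
# ('AT18681', 'AT18682')
```

Every master row finds a "match", so nothing looks wrong until the ids themselves are compared.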

That's not what you want. I don't think that you should -encode- your
values. Simply -merge- on id7temp, not id7.
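A minimal Python sketch of matching on the string key instead, including the substr(id,1,7) truncation. The full ids AT18679U and AT18681U are hypothetical, the function names are illustrative, and keys are assumed unique within each file. The 1/2/3 codes mirror Stata's _merge convention (1 = master only, 2 = using only, 3 = matched). In Stata itself the corresponding step would be to -sort- both files on id7temp and then -merge id7temp using <filename>-.

```python
def key7(s):
    # Counterpart of Stata's substr(id, 1, 7): the first 7 characters.
    return s[:7]

def merge_on_string(master_ids, using_ids):
    """Classify each 7-character key like Stata's _merge:
    3 = in both files, 1 = master only, 2 = using only."""
    m = {key7(s) for s in master_ids}
    u = {key7(s) for s in using_ids}
    result = {k: 3 for k in m & u}
    result.update({k: 1 for k in m - u})
    result.update({k: 2 for k in u - m})
    return dict(sorted(result.items()))

master = ["AT18679U", "AT18680", "AT18681"]   # hypothetical full ids
using = ["AT18679", "AT18681U", "AT18682"]

print(merge_on_string(master, using))
# {'AT18679': 3, 'AT18680': 1, 'AT18681': 3, 'AT18682': 2}
```

AT18680 and AT18682 now show up as unmatched, which is the correct outcome, instead of being paired with a neighbouring id.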

HTH,
Philipp


Thomas Erdmann wrote:
> Hi,
>
> I have a dataset with ids that look like: AT18679U (two letters followed by
> 5 digits, optionally followed by another letter)
>
> I want to merge the two datasets whenever the first 7 characters of the id
> are equal, therefore I generated
>
> generate id7temp=substr(id,1,7)
> encode id7temp, gen(id7)
> sort id7
>
> and merged the two datasets by id7. When I quality-checked the results, there
> were several mismatches, which don't seem to happen if I use the string id
> and not the encoded one. Why is that?
>
> Thanks in advance
> -Tom
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>








© Copyright 1996–2014 StataCorp LP