
st: weirdness in merging


From   "Huiber Gabi (nat1gxh)" <[email protected]>
To   [email protected]
Subject   st: weirdness in merging
Date   Thu, 27 Feb 2003 12:22:56 -0500

Hello all,

My problem should be simple. I have a file A, and two temporary files "`B'"
and "`C'".

A has two key variables, call them a and b, on which I merge A first with
"`B'" and then with "`C'". Both "`B'" and "`C'" contain four variables: a,
b, c and d, and I want to get c and d into A. "`B'" and "`C'" match separate
subsets of the (a, b) values in A, and together the matches cover A exactly,
so after these two simple merges A should be wider by two variables, c and
d, with no missing values.
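
For concreteness, the sequence described above can be sketched on made-up
data. This is only an illustration under stated assumptions: the values are
invented, c is numeric here although job_cls_cd is a string in the real
data, and the sketch uses the same "merge varlist using" syntax as the log,
which needs both datasets sorted on the key variables. It is not the code
that produced the log; it only sets up the same situation and lists A after
both merges so that c and d can be inspected.

```stata
* Toy data only: names A, B, C and a, b, c, d follow the description above.

* Build tempfile B: covers the (a, b) pairs with a == 1
clear
input a b c d
1 1 10 100
1 2 11 101
end
sort a b
tempfile B
save "`B'"

* Build tempfile C: covers the remaining (a, b) pairs, a == 2
clear
input a b c d
2 1 20 200
2 2 21 201
end
sort a b
tempfile C
save "`C'"

* Build A with the key variables only
clear
input a b
1 1
1 2
2 1
2 2
end
sort a b

* First merge: bring in c and d from B
merge a b using "`B'"
tab _merge
drop _merge

* Second merge: c and d already exist in A at this point
sort a b
merge a b using "`C'"
tab _merge
list
```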

It doesn't work, and I don't see why. My code goes like this:

use A

preserve
 (do stuff)

The real screen shot follows, but first, some definitions:
the variables a and b by which I am merging are emp_nr and mo_dt;
the variables c and d that I want are job_cls_cd and job_cls_stt_dt;
tempfile "`B'" is "`hurts'";
tempfile "`C'" is "`classes_x'".

. restore

. 
. drop job_cls_cd job_cls_stt_dt

. merge emp_nr mo_dt using "`hurts'"

. tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     746397       97.64       97.64
          3 |      18060        2.36      100.00
------------+-----------------------------------
      Total |     764457      100.00

. 
. preserve

. keep if _merge==1
(18060 observations deleted)

. drop _merge

. keep emp_nr mo_dt

. egen u=tag(emp_nr mo_dt)

. keep if u==1
(97582 observations deleted)

. drop u

. sort emp_nr mo_dt

. tempfile classes_x

. merge emp_nr mo_dt using "`classes'"
mo_dt was byte now float

. tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |    2184339       77.10       77.10
          3 |     648815       22.90      100.00
------------+-----------------------------------
      Total |    2833154      100.00

. keep if _merge==3
(2184339 observations deleted)

. drop _merge

. sort emp_nr mo_dt

. save "`classes_x'", replace
(note: file D:\TEMP\ST_0p003x.tmp not found)
file D:\TEMP\ST_0p003x.tmp saved

. describe

Contains data from D:\TEMP\ST_0p003x.tmp
  obs:       648,815                          
 vars:             4                          27 Feb 2003 11:44
 size:    11,678,670 (92.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
emp_nr          long   %12.0g                 EMP_NR
mo_dt           float  %9.0g                  
job_cls_cd      str4   %9s                    JOB_CLS_CD
job_cls_stt_dt  int    %9.0g                  
-------------------------------------------------------------------------------
Sorted by:  emp_nr  mo_dt  

. count if job_cls_stt_dt==.
    0

. count if job_cls_cd==""
    0

*** Notice that this tempfile has everything I want, with no missing values.
*** Back to the screen shot:

. restore

. 
. drop _merge

. sort emp_nr mo_dt

. merge emp_nr mo_dt using "`classes_x'"
mo_dt was byte now float

. tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      18060        2.36        2.36
          3 |     746397       97.64      100.00
------------+-----------------------------------
      Total |     764457      100.00

. count if job_cls_stt_dt==.
746397

. count if job_cls_cd==""
746397

Does anybody have an idea why my _merge==3 matches don't bring in the
job_cls_stt_dt and job_cls_cd values that are clearly present in the using
dataset?

Thanks,
Gabi


