Statalist



st: re: Comparing variable values with a predefined list in other dataset


From   Maarten Buis <maartenbuis@yahoo.co.uk>
To   stata list <statalist@hsphsun2.harvard.edu>
Subject   st: re: Comparing variable values with a predefined list in other dataset
Date   Sat, 3 Oct 2009 01:33:50 -0700 (PDT)

A repost with a more intelligible subject heading: 

--- On Sat, 3/10/09, Andres Gonzalez Rangel wrote:
> I think merge is not a solution for my problem, because I
> need to keep both datasets separate. The datasets with
> diagnostic or procedure codes are for reference, while the
> working dataset contains the values that need to be
> checked. It is more like a validation process: checking
> whether a variable's value corresponds to any value in the
> other dataset.

I think -merge- is a solution to your problem. It does not 
matter if the same code occurs in multiple observations of 
your data (as long as each code occurs only once in the list 
of valid codes): all observations in your dataset will still 
exist after merging. The only thing added is one extra 
observation for every code that is in the list of valid codes 
but not in your data. These extra observations are easily 
identified and dropped using the variable _merge that -merge- 
creates (they have _merge == 2). After that you have your 
original dataset back, together with _merge indicating whether 
each observation was matched. If it was matched (_merge == 3), 
the code in your data corresponds to a code in your list of 
valid codes. If it was not matched (_merge == 1), the code in 
your data does not correspond to any code in that list. So, 
extending the example I gave earlier:

*------------- begin example ------------
// this is just creating some datasets to illustrate the example
tempfile data list
clear
input code other_var
      1    1
      3    0
      4    1
end
save `data'

clear
input code
      1
      2
      3
end
save `list'

// the real example begins:
use `list', clear
sort code
save `list', replace
use `data', clear
sort code
merge code using `list'

// drop observations that were only in the list of valid codes,
// so the dataset contains only observations from your data
drop if _merge == 2

// create _valid: 1 if the code is in the list of valid codes, 0 otherwise
gen byte _valid = _merge == 3
drop _merge

list
*---------- end example --------------
(For more on examples I sent to the Statalist see: 
http://www.maartenbuis.nl/example_faq )
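The example above uses the -merge- syntax of Stata 10 and earlier 
(which Stata 11 still accepts). Stata 11 introduced a new -merge- 
syntax, and a minimal sketch of the same validation under that 
syntax, assuming Stata 11 or later and the same toy datasets, 
might look like the following. With -m:1- Stata checks that each 
code occurs only once in the list of valid codes, and the option 
-keep(master match)- drops the codes that occur only in the list 
during the merge itself, so neither the separate -drop if 
_merge == 2- step nor any prior sorting is needed:

```stata
*------------- begin example ------------
// sketch assuming Stata 11 or later (new -merge- syntax)
// same toy datasets as in the example above
tempfile data list
clear
input code other_var
      1    1
      3    0
      4    1
end
save `data'

clear
input code
      1
      2
      3
end
save `list'

use `data', clear

// m:1 : several observations in the data may share a code, but
// -merge- stops with an error if a code occurs more than once
// in the list of valid codes
// keep(master match) keeps only observations from your data
merge m:1 code using `list', keep(master match)

// _merge == 3 means the code was found in the list of valid codes
gen byte _valid = _merge == 3
drop _merge

// show the observations with a code that is not in the list
list if !_valid
*---------- end example --------------
```

Here only the observation with code 4 should be flagged as 
invalid, because 4 does not appear in the list of valid codes.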

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
