Re: st: Comparing two data set


From   Dirk Enzmann <[email protected]>
To   [email protected]
Subject   Re: st: Comparing two data set
Date   Fri, 04 Mar 2011 14:30:12 +0100

In reply to

http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1103/date/article-80.html

http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1103/date/article-89.html

and

http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1103/date/article-97.html

-----------------------------------------------------
What about using a combination of the official commands -duplicates-, -merge-, and -foreach-, as in the following example?

* ========== START OF EXAMPLE =======================
* Example of comparing two datasets by ID. The run will
* stop if the files already exist (remove the "//" from
* "//, replace" to overwrite them). Take care that no
* lines of this example syntax are broken!

* --------------------------------------------------
* Modify dataset "bpwide" and save it as "bpwide_2":

sysuse bpwide, clear

replace sex=abs(sex-1) if mod(patient,13)==0
replace agegrp=2 if agegrp != 2 & mod(patient,11)==0
replace bp_before=bp_after if patient==100
replace bp_after=145 if patient==100
input
121 1 1 120 119
end
replace patient=2 if _n==1
replace patient=1 if _n==2

save bpwide_2 //, replace

* --------------------------------------------------
* Create log-file:
cap log close
log using example_cf //, replace

* --------------------------------------------------
* Open dataset 1:
sysuse bpwide, clear

* Create a duplicate case:
set obs 121
replace patient=patient[60] if _n==121
replace sex=sex[60] if _n==121
replace agegrp=agegrp[60] if _n==121
replace bp_before=bp_before[60] if _n==121
replace bp_after=bp_after[60] if _n==121

* Describe data:
describe, short

* List duplicates of data 1:
duplicates tag patient sex-bp_after, generate(doubl)
* (doubl counts the surplus copies, so >0 also catches triplicates etc.)
list patient if doubl>0
* drop variable "doubl":
drop doubl
* drop duplicates of data 1:
duplicates drop patient sex-bp_after, force

* Add "_orig" to all varnames except ID (patient):
foreach var of varlist sex-bp_after {
  rename `var' `var'_orig
}

* Sort according to (list of) ID variable(s):
gsort patient

* Save dataset for comparison:
save bpwide_orig //, replace

* --------------------------------------------------
* Open dataset 2 and describe:
use bpwide_2, clear
describe, short

* Sort according to (list of) ID variable(s):
gsort patient

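* Merge datasets according to ID (patient). This is the
* old-style merge syntax, which needs both files sorted by ID;
* in Stata 11 or later, "merge 1:1 patient using bpwide_orig"
* is the equivalent if patient is unique in both files:
qui merge patient using bpwide_orig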
tab _merge

* List problems of non-identical IDs:
list patient _merge if _merge <3, abbreviate(15)

* List duplicates of data 2:
duplicates tag patient sex-bp_after, generate(doubl)
list patient if doubl>0

* List cases and values that differ using values:
foreach var of varlist sex-bp_after {
  list patient `var'_orig `var'     ///
       if `var' != `var'_orig & _merge==3, ///
       abbreviate(15) sep(20) noobs nol
}
* --------------------------------------------------
* List cases and values that differ using labels:
foreach var of varlist sex-bp_after {
  list patient `var'_orig `var'     ///
       if `var' != `var'_orig & _merge==3, ///
       abbreviate(15) sep(20) noobs
}

log close
* ============ END OF EXAMPLE =======================

Maybe it's a bit clumsy, but it should work.
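
For Stata 11 or later, the merge-and-compare step can also be sketched with the newer -merge 1:1- syntax. This is only a minimal sketch and not part of the example above: it assumes that patient uniquely identifies the observations in both files (checked with -isid-), it uses a temporary file instead of saved datasets, and the single change to patient 100 is made up purely for illustration.

```stata
* Sketch only: compare two versions of "bpwide" by ID using the
* Stata 11+ merge syntax. Assumes patient is unique in both files.

* Version 1: add "_orig" to all varnames except the ID
sysuse bpwide, clear
foreach var of varlist sex-bp_after {
    rename `var' `var'_orig
}
isid patient              // stops with an error if patient is not unique
tempfile orig
save `orig'

* Version 2: the same data with one artificial change (illustration only)
sysuse bpwide, clear
replace bp_after = bp_after + 1 if patient == 100
isid patient

* Merge by ID; no prior sorting is needed with the new syntax
merge 1:1 patient using `orig'
tab _merge

* List, then count, the matched cases that differ on bp_after
list patient bp_after_orig bp_after ///
     if bp_after != bp_after_orig & _merge == 3, noobs
count if bp_after != bp_after_orig & _merge == 3
```

Unlike the old syntax, -merge 1:1- stops with an error if the ID is duplicated in either file, so duplicates have to be dealt with (for example with -duplicates-) before merging.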

Dirk