Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Comparing two data set
From
Nick Cox <[email protected]>
To
"'[email protected]'" <[email protected]>
Subject
RE: st: Comparing two data set
Date
Fri, 4 Mar 2011 13:50:49 +0000
I am not clear about your take-home message here.
In very broad terms, it seems to me that you should be able to compare two datasets by
1. -append-ing observations, and then checking for match or mismatch (and my bias is to reach for -duplicates-)
or
2. -merge-ing observations, and then checking.
I am not clear that anyone needs both approaches.
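
A minimal sketch of the two routes, for concreteness. It assumes the shipped dataset bpwide, that patient identifies observations uniquely in both files, and Stata 11 or later (for -merge 1:1- and the generate() option of -append-). The tempfile names copy2 and orig and the variables source and ncopies are hypothetical names chosen for illustration; run it as one do-file so the temporary files persist.

```stata
* Sketch only: two ways to compare a dataset with a modified copy.
* copy2, orig, source and ncopies are illustrative names.

* Make a modified copy that differs in one observation
sysuse bpwide, clear
replace bp_after = 145 if patient == 100
tempfile copy2
save `copy2'

* --- Route 1: -append-, then -duplicates- ---------------------
sysuse bpwide, clear
append using `copy2', generate(source)   // source: 0 = master, 1 = appended
* Observations identical in both files now occur twice;
* ncopies == 0 flags those with no identical partner
duplicates tag patient sex agegrp bp_before bp_after, generate(ncopies)
list source patient sex agegrp bp_before bp_after if ncopies == 0, sepby(patient)

* --- Route 2: -merge-, then compare variable by variable ------
sysuse bpwide, clear
foreach var of varlist sex-bp_after {
    rename `var' `var'_orig
}
tempfile orig
save `orig'

use `copy2', clear
merge 1:1 patient using `orig'
tabulate _merge                          // unmatched IDs have _merge < 3
foreach var of varlist sex-bp_after {
    list patient `var'_orig `var' if `var' != `var'_orig & _merge == 3
}
```

Either route should flag the altered observation. The first needs no renaming but shows only that an observation has no exact twin; the second shows which variable differs.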
Nick
[email protected]
On Behalf Of Dirk Enzmann
In reply to
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1103/date/article-80.html
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1103/date/article-89.html
and
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1103/date/article-97.html
-----------------------------------------------------
How about using a combination of the official commands -duplicates-,
-merge-, and -foreach-, as in the following example?
* ========== START OF EXAMPLE =======================
* Example of comparing two datasets by ID. The run will
* stop if the files already exist (remove the "//" in
* "//, replace" to overwrite them). Take care that no
* lines of this example syntax are broken!
* --------------------------------------------------
* Modify dataset "bpwide" and save it as "bpwide_2":
sysuse bpwide, clear
replace sex=abs(sex-1) if mod(patient,13)==0
replace agegrp=2 if agegrp != 2 & mod(patient,11)==0
replace bp_before=bp_after if patient==100
replace bp_after=145 if patient==100
input
121 1 1 120 119
end
replace patient=2 if _n==1
replace patient=1 if _n==2
save bpwide_2 //, replace
* --------------------------------------------------
* Create log-file:
cap log close
log using example_cf //, replace
* --------------------------------------------------
* Open dataset 1:
sysuse bpwide, clear
* Create a duplicate case:
set obs 121
replace patient=patient[60] if _n==121
replace sex=sex[60] if _n==121
replace agegrp=agegrp[60] if _n==121
replace bp_before=bp_before[60] if _n==121
replace bp_after=bp_after[60] if _n==121
* Describe data:
describe, short
* List duplicates of data 1:
duplicates tag patient sex-bp_after, generate(doubl)
list patient if doubl==1
* drop variable "doubl":
drop doubl
* drop duplicates of data 1:
duplicates drop patient sex-bp_after, force
* Add "_orig" to all varnames except ID (patient):
foreach var of varlist sex-bp_after{
rename `var' `var'_orig
}
* Sort according to (list of) ID variable(s):
gsort patient
* Save dataset for comparison:
save bpwide_orig //, replace
* --------------------------------------------------
* Open dataset 2 and describe:
use bpwide_2, clear
describe, short
* Sort according to (list of) ID variable(s):
gsort patient
* Merge datasets according to ID (patient):
qui merge patient using bpwide_orig
tab _merge
* List problems of non-identical IDs:
list patient _merge if _merge <3, abbreviate(15)
* List duplicates of data 2:
duplicates tag patient sex-bp_after, generate(doubl)
list patient if doubl==1
* List cases and values that differ using values:
foreach var of varlist sex-bp_after{
list patient `var'_orig `var' ///
if `var' != `var'_orig & _merge==3, ///
abbreviate(15) sep(20) noobs nol
}
* --------------------------------------------------
* List cases and values that differ using labels:
foreach var of varlist sex-bp_after{
list patient `var'_orig `var' ///
if `var' != `var'_orig & _merge==3, ///
abbreviate(15) sep(20) noobs
}
log close
* ============ END OF EXAMPLE =======================
Maybe it's a bit clumsy, but it should work.