Statalist



st: comparing two .dta files


From   "Gabi Huiber" <[email protected]>
To   [email protected]
Subject   st: comparing two .dta files
Date   Tue, 12 Feb 2008 14:53:01 -0500

Dear all,

Maybe I'm going about this the wrong way, but here's what I'm trying to do:

I was recently given a do-file that insheets a bunch of files, works
on them, then produces a .dta file 37 variables wide and several
thousand observations long. Let's call this do-file the prototype
version.

I had to rewrite this do-file to make it more portable (e.g. by using
global macros to define file paths, names, etc.) and more expandable
(e.g. various parameters that are now typed in could be accessed from
a separate .dta file; the length of this file would change as new
values for those parameters were added over time). So I did. Let's
call the new file the production version.

Now I am trying to make sure that the prototype and the production
versions produce the exact same 37-variable .dta file.

I gathered the variable names in a local macro as follows:

unab mergeby: _all

I then sorted and merged the two files by the variables in `mergeby'
and found that _merge was not equal to 3 for every observation, so I
had to investigate some more. That's where I hit trouble.
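
For reference, the sort-and-merge step described above might look roughly like the sketch below. The file names prototype.dta, production.dta and production_sorted.dta are hypothetical, and the old-style merge syntax (Stata 10 and earlier) is assumed; this is an illustration, not the exact code that was run.

```stata
* Sketch only: all file names here are hypothetical placeholders.
* Assumes the old-style merge syntax (Stata 10 and earlier).
use production, clear
unab mergeby: _all              // every variable name, before _merge exists
sort `mergeby'
save production_sorted, replace

use prototype, clear
sort `mergeby'
merge `mergeby' using production_sorted
tabulate _merge                 // 1 = prototype only, 2 = production only, 3 = both
```

Because every variable serves as a merge key, any observation that differs in even one value shows up with _merge equal to 1 or 2 rather than 3.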

I thought I would do this:

drop if _merge==3
sort _merge `mergeby'
rename _merge j
bysort j: gen id=_n
reshape wide `mergeby', i(id) j(j)

At this point, the file is 37*2+1 variables wide. I would find it
useful to declare two new macros, one with the `mergeby' variables
with the suffix 1, the other with the `mergeby' variables with the
suffix 2.

Since the local `mergeby' has 37 words in it, you would think that
`mergeby1' and `mergeby2' would also have 37 words each, based on the
code below:

local mycount: word count `mergeby'
di `mycount'
forvalues k=1/2 {
    local mergeby`k' ""
}
forvalues i=1/`mycount' {
    local var: word `i' of `mergeby'
    forvalues k=1/2 {
        local mergeby`k'="`mergeby`k'' `var'`k'"
    }
}
forvalues k=1/2 {
    local x: word count `mergeby`k''
    di `x'
}

However, they are only 22 words long. I have no idea why.

Are there any word count restrictions when trying to gather local
macros word-by-word like I am trying to do?

Is there some more elegant way to compare two files and isolate any differences?
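
One possible starting point for such a comparison is Stata's built-in cf command, which compares the data in memory with a file on disk. The sketch below assumes the hypothetical file names prototype.dta and production.dta, and that both files hold the same number of observations in the same order; whether it suits this case is untested.

```stata
* Sketch only: prototype.dta and production.dta are hypothetical file names.
* cf compares observation by observation, so both files are assumed to have
* the same number of observations, sorted the same way.
use production, clear
cf _all using prototype, verbose
```

With the verbose option, cf lists the observations at which each variable differs, rather than only naming the mismatched variables.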

Thank you,
Gabi