Statalist


Re: st: comparing two .dta files


From   Nick Winter <nwinter@virginia.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: comparing two .dta files
Date   Tue, 12 Feb 2008 15:07:29 -0500

Without considering the larger logic of what you are doing, this line is probably the problem, or at least one of them:

local mergeby`k'="`mergeby`k'' `var'`k'"

When you use the equal sign, you are telling Stata to evaluate what follows as a string expression, and the result gets truncated to 244 characters (which is the limit on the length of a string expression; see help limits).

You can simply assign, without evaluation, thusly:

local mergeby`k' "`mergeby`k'' `var'`k'"

or even

local mergeby`k' `mergeby`k'' `var'`k'

In both these cases, everything after the macro name is simply stored in the macro, with a much larger limit on the length (165,200 characters in Stata/IC, or 1,081,511 in SE or MP, either of which is enough to allow a local to contain the complete list of variables in a dataset).
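
As a minimal sketch of the loop rewritten with plain assignment: the built-in auto dataset is only a stand-in so the example runs on its own, and the counters n0, n1 and n2 are names made up for this illustration. With no equal sign, each suffixed list should end up with as many words as the original list.

```stata
* Sketch: build suffixed name lists by plain macro assignment (no "=").
* sysuse auto is only a stand-in dataset; n0, n1, n2 are illustrative names.
sysuse auto, clear
unab mergeby : _all

forvalues k = 1/2 {
    * start each list from an empty macro
    local mergeby`k'
    foreach var of local mergeby {
        * no equal sign: the text is copied, not evaluated as a string expression
        local mergeby`k' `mergeby`k'' `var'`k'
    }
}

* count the words in each list for comparison
local n0 : word count `mergeby'
local n1 : word count `mergeby1'
local n2 : word count `mergeby2'
display "original: `n0'   suffix 1: `n1'   suffix 2: `n2'"
```

The three counts displayed at the end should agree, whatever the number of variables in the dataset.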

Nick Winter

Gabi Huiber wrote:

Dear all,

Maybe I'm going about this the wrong way, but here's what I'm trying to do:

I was recently given a do-file that insheets a bunch of files, works
on them, then produces a .dta file 37 variables wide and several
thousand observations long. Let's call this do-file the prototype
version.

I had to rewrite this do-file to make it more portable (e.g. by using
global macros to define file paths, names, etc.) and more expandable
(e.g. various parameters that are now typed in could be accessed from
a separate .dta file; the length of this file would change as new
values for those parameters were added over time). So I did. Let's
call the new file the production version.

Now I am trying to make sure that the prototype and the production
versions produce the exact same 37-variable .dta file.

I gathered the variable names in a local macro as follows:

unab mergeby: _all

I then sorted and merged the two files by the variables in `mergeby',
and found that _merge wasn't at all equal to 3 everywhere, so I had to
investigate some more. That's where I hit trouble.

I thought I would do this:

drop if _merge==3
sort _merge `mergeby'
rename _merge j
bysort j: gen id=_n
reshape wide `mergeby', i(id) j(j)

At this point, the file is 37*2+1 variables wide. I would find it
useful to declare two new macros, one with the `mergeby' variables
with the suffix 1, the other with the `mergeby' variables with the
suffix 2.

Since the local `mergeby' has 37 words in it, you would think that
`mergeby1' and `mergeby2' would also have 37 words each, based on the
code below:

local mycount: word count `mergeby'
di `mycount'
forvalues k=1/2 {
    local mergeby`k' ""
}
forvalues i=1/`mycount' {
    local var: word `i' of `mergeby'
    forvalues k=1/2 {
        local mergeby`k'="`mergeby`k'' `var'`k'"
    }
}
forvalues k=1/2 {
    local x: word count `mergeby`k''
    di `x'
}

However, they are only 22 words long. I have no idea why.

Are there any word count restrictions when trying to gather local
macros word-by-word like I am trying to do?

Is there some more elegant way to compare two files and isolate any differences?
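
One possibility, not raised in this thread, is Stata's cf command, which compares the data in memory with a file on disk variable by variable. The sketch below is only an illustration under assumptions: auto.dta and the tempfile stand in for the two real .dta files, the changed price is a simulated discrepancy, and the macro names prototype and rc are made up here. cf expects both datasets to have the same number of observations in the same order, so the real files would need to be sorted identically first.

```stata
* Sketch: compare the data in memory with a saved file using cf.
* auto.dta and the tempfile are stand-ins for the two real .dta files.
sysuse auto, clear
tempfile prototype
save `prototype'

* simulate a discrepancy in the "production" copy
replace price = price + 1 in 1

* cf compares variable by variable, observation by observation;
* capture keeps the do-file running when differences are found
capture noisily cf _all using `prototype', verbose
local rc = _rc
display "cf return code: `rc'"
```

A nonzero return code signals that at least one variable differs; with the verbose option, cf also lists the observations where the mismatches occur.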

Thank you,
Gabi
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
--
--------------------------------------------------------------
Nicholas Winter                                 434.924.6994 t
Assistant Professor                             434.924.3359 f
Department of Politics                  nwinter@virginia.edu e
University of Virginia          faculty.virginia.edu/nwinter w
PO Box 400787, 100 Cabell Hall
Charlottesville, VA 22904
