Statalist



Re: st: Re: Comparing datasets


From   "Eva Poen" <eva.poen@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Re: Comparing datasets
Date   Mon, 15 Sep 2008 22:41:04 +0100

Considering that Raphael has ~300 variables, -cf _all- without the
-verbose- option would still give him ~300 lines of unwanted output (in
the style of "varx: 315890 mismatches"); that's what I meant by
clutter.
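
One way to sidestep the -cf- output entirely might be to read only the variable names from each file and compare the two lists. The following is a minimal sketch, not a tested recipe: it assumes a Stata version in which -describe using- accepts the -varlist- option, and that myauto1.dta and myauto2.dta (the example file names from the message quoted below) exist in the working directory. The macro names onlyfirst and onlysecond are made up for illustration.

```stata
* Sketch only: assumes myauto1.dta and myauto2.dta exist in the working directory
* and that -describe using ..., varlist- is available in your Stata version.

* Read the variable names of each file without loading its data
quietly describe using myauto1, varlist
local first `r(varlist)'
quietly describe using myauto2, varlist
local second `r(varlist)'

* Whole-name set differences between the two variable lists
* (onlyfirst and onlysecond are illustrative macro names)
local onlyfirst : list first - second
local onlysecond : list second - first

display as text "Only in first file:  " as result "`onlyfirst'"
display as text "Only in second file: " as result "`onlysecond'"
```

This prints two lines however many variables the files have in common, and no values are compared.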

Eva


2008/9/15 Martin Weiss <mw.stlist@googlemail.com>:
> -cf- w/o options is quite parsimonious in its output, actually. It is only
> when you add the options -verbose- and -all- that the output gets messy...
>
> Martin Weiss
> _______________________
> ----- Original Message ----- From: "Eva Poen" <eva.poen@gmail.com>
> To: <statalist@hsphsun2.harvard.edu>
> Sent: Monday, September 15, 2008 11:18 PM
> Subject: Re: st: Re: Comparing datasets
>
>
>> If you use -cf- you have to use it in both directions, to be on the
>> safe side. However, -cf- compares all values as well, which will
>> clutter the output considerably if these are actually two different
>> datasets with the same variable names.
>>
>> Here is a way to avoid this problem. The example uses the auto data.
>>
>> ****
>> sysuse auto, clear
>> drop foreign
>> save myauto1, replace
>> sysuse auto, clear
>> drop price headroom
>> save myauto2, replace
>>
>> qui ds
>> local second `r(varlist)'
>>
>> use myauto1, clear
>> qui ds
>> local first `r(varlist)'
>>
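>> * -strpos()- would also match partial names (e.g. "rep" inside "rep78"),
>> * so compare whole variable names with macro list operators instead.
>> local onlyfirst : list first - second
>> local onlysecond : list second - first
>>
>> foreach x of local onlyfirst {
>>  di in yellow "`x' is not present in the second file."
>> }
>>
>> foreach x of local onlysecond {
>>  di in yellow "`x' is not present in the first file."
>> }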
>>
>> erase myauto1.dta
>> erase myauto2.dta
>> ************************
>>
>> Eva
>>
>> 2008/9/15 Martin Weiss <mw.stlist@googlemail.com>:
>>>
>>> -h cf-
>>>
>>> Martin Weiss
>>> _______________________
>>> ----- Original Message ----- From: "Raphael Fraser"
>>> <raphael.fraser@gmail.com>
>>> To: <statalist@hsphsun2.harvard.edu>
>>> Sent: Monday, September 15, 2008 10:42 PM
>>> Subject: st: Comparing datasets
>>>
>>>
>>>> I have 300 variables in one dataset, 298 in another dataset. Both
>>>> datasets should have the same variable names. How can I identify which
>>>> variables do not match up?
>>>>
>>>> Raphael
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


