Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Comparing two data set


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Comparing two data set
Date   Wed, 2 Mar 2011 09:14:12 +0000

One way is to check that the .dta or other data files are identical
using your operating system.

Also, check out -cf- and -dta_equal-.
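As a minimal sketch of the -cf- route, the following uses the auto data as a stand-in for two data-entry passes. The scratch file auto1.dta and the deliberately altered price are assumptions made purely for illustration. -cf- compares the data in memory with the file on disk observation by observation, so this assumes that both files hold the same observations in the same order.

```stata
* Sketch only: auto1.dta is a scratch copy written to the current directory.
sysuse auto, clear
save auto1, replace

* Introduce one deliberate discrepancy so that there is something to find.
replace price = price + 1 in 5

* Compare every variable in memory with its namesake in auto1.dta.
* -verbose- lists the observations that differ; -capture noisily- shows the
* output but lets a do-file carry on if -cf- exits with an error on a mismatch.
capture noisily cf _all using auto1, verbose
```

With real data, the two -use- and -cf- file names would be those of the two entry files, sorted on a common identifier first.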

Another way to approach this is to -append- the datasets and look for
-duplicates-. However, -duplicates- just looks for duplicate
observations. In principle, the variable names, variable labels, value
labels, formats and characteristics must also be shown to be
identical.

For the -append- approach, you will need to create a dataset identifier
before appending so that you can work out which file any anomalies
come from.

Here is an example where by construction the interesting part of the
data is identical. So, -duplicates- confirms that everything occurs
twice. Conversely, mismatches would imply singletons, triplicates,
etc.

. sysuse auto
(1978 Automobile Data)

. gen ds = 1

. save auto1
file auto1.dta saved

. sysuse auto, clear
(1978 Automobile Data)

. gen ds = 2

. append using auto1
(label origin already defined)


. tab ds

         ds |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         74       50.00       50.00
          2 |         74       50.00      100.00
------------+-----------------------------------
      Total |        148      100.00

. duplicates report make-foreign

Duplicates in terms of make price mpg rep78 headroom trunk weight length turn displacement
    gear_ratio foreign

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        2 |          148            74
--------------------------------------
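When the report does show singletons, the next step is to find them. The following is a sketch under stated assumptions, not a definitive recipe: it repeats the construction above with a temporary file, plants one deliberate mismatch in mpg, and uses -duplicates tag- to flag observations that lack exactly one twin. The variable name twins and the altered value are hypothetical, and the listing assumes that make identifies each record uniquely, as it does in the auto data.

```stata
* First "entry pass": the auto data, flagged ds = 1, held in a temporary file.
sysuse auto, clear
gen ds = 1
tempfile first
save `first'

* Second "entry pass": the same data, flagged ds = 2, with one planted error.
sysuse auto, clear
gen ds = 2
replace mpg = 99 in 3      // deliberate mismatch, for illustration only
append using `first'

* twins = number of other observations with identical values of make-foreign.
* A record entered identically in both files has twins == 1.
duplicates tag make-foreign, generate(twins)

* Anything else is a candidate data-entry error; ds shows which file it is in.
sort make ds
list ds make price mpg if twins != 1, sepby(make)
```

Sorting on make and ds puts the two versions of a mismatched record next to each other, so the differing values can be read off directly.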

Nick

On Wed, Mar 2, 2011 at 9:01 AM, Rajaram Subramanian Potty
<[email protected]> wrote:

> We carried out a survey, and the data from the survey were entered
> twice. Now we want to compare these two data files for possible
> data entry errors. Please advise how to compare the two data files
> and identify the data entry errors using Stata.