
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Comparing two data set


From   Rajaram Subramanian Potty <[email protected]>
To   [email protected]
Subject   Re: st: Comparing two data set
Date   Wed, 2 Mar 2011 14:55:34 +0530

Dear Nick,

Thanks for the information. Two or three times I have used the -cf-
command to identify the errors in two data files. But I want the errors
to be displayed according to the ID variable. At present, the
-cf- command reports errors by observation number in the Stata dataset
and not by the ID variable. If I could list the errors by the ID
variable, it would be easy for us to trace the questionnaire and find
the error in the data entry. So I just want to know whether it is
possible to get the errors listed by the ID variable.
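One possible way to get discrepancies listed by ID rather than by observation number is sketched below. It does not use -cf- at all: it renames the variables of the second entry with a suffix, matches the two entries with -merge 1:1- on the ID (Stata 11 or later), and then lists, variable by variable, the IDs whose two entries disagree. This is a sketch under stated assumptions, not an official Stata routine: -make- in the auto data stands in for the questionnaire ID (it is unique there), the two entries are simulated with one deliberate keying error, and the _2 suffix is an arbitrary, hypothetical choice.

```stata
* Sketch: list double-entry discrepancies by ID instead of by observation.
* ASSUMPTIONS: -make- stands in for the questionnaire ID (unique in the
* auto data); both entries are simulated from the auto data; the _2
* suffix is arbitrary. Run as a do-file so the tempfiles persist.
clear all
tempfile entry1 entry2 e2

* Simulate the two data entries, with one keying error in the second
sysuse auto
save `entry1'
replace price = price + 1 in 5
save `entry2'

* Add the suffix _2 to every non-ID variable of the second entry
use `entry2', clear
foreach v of varlist _all {
    if "`v'" != "make" {
        rename `v' `v'_2
    }
}
save `e2'

* Match the two entries questionnaire by questionnaire
use `entry1', clear
merge 1:1 make using `e2'
list make _merge if _merge != 3, noobs    // IDs keyed in only one entry

* For each variable, list the IDs whose two entries disagree
foreach v of varlist _all {
    if inlist("`v'", "make", "_merge") | substr("`v'", -2, .) == "_2" {
        continue
    }
    quietly count if `v' != `v'_2 & _merge == 3
    if r(N) > 0 {
        display as text _n "Discrepancies in `v':"
        list make `v' `v'_2 if `v' != `v'_2 & _merge == 3, noobs
    }
}
```

With real data, replace the simulation lines with -use- of the two entry files and -make- with the ID variable. The original sort order does not matter, because -merge- matches on the ID. A variable stored as string in one entry and numeric in the other would make the comparison fail with a type mismatch, so such variables need to be reconciled first.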

Thanks and regards,

RAJARAM. S

On Wed, Mar 2, 2011 at 2:44 PM, Nick Cox <[email protected]> wrote:
> One way is to check that the .dta or other data files are identical
> using your operating system.
>
> Also, check out -cf- and -dta_equal-.
>
> Another way to approach this is to -append- the datasets and look for
> -duplicates-. However, -duplicates- just looks for duplicate
> observations. In principle, the variable names, variable labels, value
> labels, formats and characteristics must also be shown to be
> identical.
>
> To do this last, you will need to create a dataset identifier so that
> you can work out where any anomalies are.
>
> Here is an example where by construction the interesting part of the
> data is identical. So, -duplicates- confirms that everything occurs
> twice. Conversely, mismatches would imply singletons, triplicates,
> etc.
>
> . sysuse auto
> (1978 Automobile Data)
>
> . gen ds = 1
>
> . save auto1
> file auto1.dta saved
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . gen ds = 2
>
> . append using auto1
> (label origin already defined)
>
>
> . tab ds
>
>         ds |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>          1 |         74       50.00       50.00
>          2 |         74       50.00      100.00
> ------------+-----------------------------------
>      Total |        148      100.00
>
> . duplicates report make-foreign
>
> Duplicates in terms of make price mpg rep78 headroom trunk weight
>     length turn displacement gear_ratio foreign
>
> --------------------------------------
>   copies | observations       surplus
> ----------+---------------------------
>        2 |          148            74
> --------------------------------------
>
> Nick
>
> On Wed, Mar 2, 2011 at 9:01 AM, Rajaram Subramanian Potty
> <[email protected]> wrote:
>
>> We carried out a survey, and the data from the survey were entered
>> twice. Now we want to compare these two data files for possible
>> data entry errors. Please let us know how to compare the two data
>> files and identify the data entry errors using Stata.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



© Copyright 1996–2018 StataCorp LLC