
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Comparing two data set

From   Rajaram Subramanian Potty <>
Subject   Re: st: Comparing two data set
Date   Thu, 3 Mar 2011 09:30:17 +0530

Thank you very much for the information on different ways to compare
two files. I have generated error lists using both cf and cf3. The
error lists generated by these two methods are identical. We also
manually checked the data for a few observations from these two error
lists and found them to be correct. However, I am not sure whether the
error list will be correct if the ID variable is not entered correctly
in the two data sets.
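A comparison keyed on the ID variable can only pair the right records if the IDs themselves agree, so it helps to check the IDs before comparing any other fields. The sketch below illustrates that logic in Python rather than Stata. It is not how -cf- or -cf3- work internally, and the function `compare_entries`, the fields `id`, `age`, `sex` and the sample records are all hypothetical. The function reports IDs found in only one entry, IDs duplicated within an entry, and field-by-field mismatches for IDs that occur exactly once in both. In Stata itself, the first two checks correspond roughly to -isid id- or -duplicates report id- and to -merge 1:1 id using ...-, which flags unmatched IDs through its _merge variable.

```python
from collections import Counter


def compare_entries(first, second, key="id"):
    """Compare two entries of the same survey, keyed on `key` (hypothetical helper).

    `first` and `second` are lists of dicts, one dict per record.
    Returns IDs found in only one entry, IDs duplicated within an entry,
    and field-level mismatches for IDs occurring exactly once in both.
    """
    counts1 = Counter(rec[key] for rec in first)
    counts2 = Counter(rec[key] for rec in second)

    # IDs keyed in more than once within the same entry cannot be paired safely.
    dup1 = sorted(k for k, n in counts1.items() if n > 1)
    dup2 = sorted(k for k, n in counts2.items() if n > 1)

    # IDs present in one entry but not the other: often a mistyped ID.
    only_first = sorted(set(counts1) - set(counts2))
    only_second = sorted(set(counts2) - set(counts1))

    # Pair up only the IDs that occur exactly once in each entry.
    by_id1 = {rec[key]: rec for rec in first if counts1[rec[key]] == 1}
    by_id2 = {rec[key]: rec for rec in second if counts2[rec[key]] == 1}

    mismatches = []
    for k in sorted(set(by_id1) & set(by_id2)):
        r1, r2 = by_id1[k], by_id2[k]
        # Compare every non-key field that appears in either record.
        for field in sorted((set(r1) | set(r2)) - {key}):
            v1, v2 = r1.get(field), r2.get(field)
            if v1 != v2:
                mismatches.append((k, field, v1, v2))

    return {
        "only_first": only_first,
        "only_second": only_second,
        "dup_first": dup1,
        "dup_second": dup2,
        "mismatches": mismatches,
    }


if __name__ == "__main__":
    # Hypothetical double-entered records: ID 103 was mistyped as 130
    # in the second entry, and age for ID 101 was keyed differently.
    entry1 = [
        {"id": 101, "age": 34, "sex": "F"},
        {"id": 102, "age": 51, "sex": "M"},
        {"id": 103, "age": 27, "sex": "F"},
    ]
    entry2 = [
        {"id": 101, "age": 43, "sex": "F"},
        {"id": 102, "age": 51, "sex": "M"},
        {"id": 130, "age": 27, "sex": "F"},
    ]
    report = compare_entries(entry1, entry2)
    for name, items in report.items():
        print(name, items)
    # prints:
    # only_first [103]
    # only_second [130]
    # dup_first []
    # dup_second []
    # mismatches [(101, 'age', 34, 43)]
```

In this example the mistyped ID appears as 103 only in the first entry and 130 only in the second. It is not silently paired with the wrong record. A comparison that pairs records by position rather than by a verified ID could compare the wrong observations, so the error list would look plausible but be wrong.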



On Wed, Mar 2, 2011 at 6:00 PM, Nick Cox <> wrote:
> Let me summarize this as I see it.
> You have adopted a user-written program from 2001 rather than use
> official commands which allow you to address all facets of your
> problem.
> There are various pluses and minuses to this strategy.
> The main plus is that you appear to have got exactly what you wanted,
> and fast. That has to be important in a busy world.
> In this particular case, the author is Thomas Steichen who is
> well-known in the community as an extremely competent programmer. I
> would say that even if he weren't a co-author of mine and a personal
> friend.
> Beyond that, various observations spring to mind.
> -cf3- is explicit that a single identifier variable is expected and
> that matches your situation. Fine, but something else will be needed
> whenever your situation differs.
> -cf3- dates from 2001. That it still works is a credit to Tom and to
> Stata, but the help file alone talks about an internal procedure that
> truncates variable names to 7 characters, which I would guess is much
> more likely to bite with modern datasets and much longer variable
> names. I also see much internal code that deals with date and time
> formats, but much has changed in Stata's handling of dates and times
> since the program was issued.
> Let me stress: I do not want to disparage the use of user-written
> programs. That would be absurd for various reasons! As said, I just
> want to underline the pluses and minuses here. There are a lot of old
> programs around which are not being maintained and may not be strongly
> supported if users get into difficulties. (In my case, I have
> documented at -njc_stuff- from SSC what I think remains of use and
> what is superseded or obsolete.)
> Nick
> Rajaram Subramanian Potty
> Thank you very much for the information. I installed -cf3- and was able
> to generate the error list by ID.
> On Wed, Mar 2, 2011 at 3:33 PM, Kevin Owuor <> wrote:
>> Maybe you can try out the cf3 package (type -findit cf3-); it lists errors by ID.
> [various suggestions]
>>>> On Wed, Mar 2, 2011 at 9:01 AM, Rajaram Subramanian Potty
>>>> <> wrote:
>>>>> We carried out a survey, and the data from the survey were entered
>>>>> twice. Now we want to compare these two data files for possible
>>>>> data entry errors. Please advise how to compare the two data files
>>>>> and identify the data entry errors using Stata.
> *
> *   For searches and help try:
> *
> *
> *

© Copyright 1996–2017 StataCorp LLC