Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Rajaram Subramanian Potty <rajara999@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Comparing two data set |
Date | Wed, 2 Mar 2011 14:55:34 +0530 |
Dear Nick,

Thanks for the information. Two or three times I have used the -cf- command to identify the errors in two data files. But I want the errors to be displayed according to the ID variable. At present, the -cf- command reports errors by observation number in the Stata data set and not by the ID variable. If I am able to list the errors according to the ID variable, it will be easy for us to trace the questionnaire and find the error in the data entry. So, I just want to know whether it is possible to get the errors listed by the ID variable.

Thanks and regards,
RAJARAM. S

On Wed, Mar 2, 2011 at 2:44 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> One way is to check that the .dta or other data files are identical
> using your operating system.
>
> Also, check out -cf- and -dta_equal-.
>
> Another way to approach this is to -append- the datasets and look for
> -duplicates-. However, -duplicates- just looks for duplicate
> observations. In principle, the variable names, variable labels, value
> labels, formats and characteristics must also be shown to be
> identical.
>
> To do this last, you will need to create a dataset identifier so that
> you can work out where any anomalies are.
>
> Here is an example where by construction the interesting part of the
> data is identical. So, -duplicates- confirms that everything occurs
> twice. Conversely, mismatches would imply singletons, triplicates,
> etc.
>
> . sysuse auto
> (1978 Automobile Data)
>
> . gen ds = 1
>
> . save auto1
> file auto1.dta saved
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . gen ds = 2
>
> . append using auto1
> (label origin already defined)
>
> . tab ds
>
>          ds |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           1 |         74       50.00       50.00
>           2 |         74       50.00      100.00
> ------------+-----------------------------------
>       Total |        148      100.00
>
> . duplicates report make-foreign
>
> Duplicates in terms of make price mpg rep78 headroom trunk weight
> length turn displacement gear_ratio foreign
>
> --------------------------------------
>    copies | observations       surplus
> ----------+---------------------------
>         2 |          148            74
> --------------------------------------
>
> Nick
>
> On Wed, Mar 2, 2011 at 9:01 AM, Rajaram Subramanian Potty
> <rajara999@gmail.com> wrote:
>
>> We carried out a survey and the data from the survey were entered
>> two times. Now, we want to compare these two data files for possible
>> data entry errors. Please inform us how to compare the two data files
>> and identify the data entry errors using Stata.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
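
One possible way to get mismatches listed by the ID variable rather than by observation number is to -merge- the two entries on the ID and compare each variable with its counterpart. What follows is only a sketch under stated assumptions, not something proposed in the thread: the ID variable is called -id- (hypothetical name), it is unique within each file, both files use the same variable names and storage types (string versus numeric), and -merge 1:1- is available (Stata 11 or later). The two "entries" here are toy copies of the auto data with one keying error introduced on purpose; in practice they would be your own two data files.

```stata
* Build two toy "entries" from the auto data; replace these with your own files.
sysuse auto, clear
generate id = _n                      // hypothetical ID variable
tempfile entry1 entry2
save `entry1'
replace price = 9999 in 5             // simulate a keying error for id 5
save `entry2'

* Give every non-ID variable in the second entry a _2 suffix.
use `entry2', clear
isid id                               // stops with an error if id is not unique
unab vars : _all
local idvar id
local vars : list vars - idvar
foreach v of local vars {
    rename `v' `v'_2
}
tempfile second
save `second'

* Match the two entries on id.
use `entry1', clear
isid id
merge 1:1 id using `second'

* IDs that appear in only one entry.
list id _merge if _merge != 3, noobs

* For matched IDs, list each variable's discrepancies by id.
foreach v of local vars {
    quietly count if `v' != `v'_2 & _merge == 3
    if r(N) > 0 {
        display _n "Mismatches in `v':"
        list id `v' `v'_2 if `v' != `v'_2 & _merge == 3, noobs
    }
}
```

The _2 suffix is an arbitrary choice; very long variable names would need a shorter scheme to stay within Stata's 32-character limit on names. Each listing shows the ID together with the two competing values, which should make it straightforward to pull the corresponding questionnaire.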