
st: RE: Is it necessary to sort data before using -cf-?

From   "Martin Weiss" <>
To   <>
Subject   st: RE: Is it necessary to sort data before using -cf-?
Date   Sun, 29 Nov 2009 11:36:31 +0100


-cf- compares the values of each variable observation by observation, row
after row, so the -sort- order does matter for it. The manual entry and help
file do not mention this fact, but I feel that it goes without saying: what
else would you compare but the values line by line?

Note how in the following code both datasets are sorted by -rep78-. Since
rep78 takes only 5 distinct values, however, this -sort- order is not unique.
That is the reason for the existence of the -stable- option to -sort-, by the
way.

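sysuse auto, clear
sort rep78
save new.dta, replace

use new.dta, clear
// scramble the order first by sorting on foreign ...
sort foreign
// ... then sort by rep78 again; ties within rep78 may now be ordered differently
sort rep78
cf _all using new.dta, verbose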

With only 5 distinct values to go by, -sort- orders the tied observations
randomly, so only by chance will it produce the same order twice. -cf- then
picks up the resulting differences.
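
As a minimal sketch of the way around this: sort both datasets on a key that
uniquely identifies the observations before calling -cf-. I am assuming here
that -make- is such a key in the auto data, and -isid- is used to check that
assumption; for your own data you would substitute whatever variable(s)
identify an observation. With a unique key there is only one possible order,
so -cf- should no longer report mismatches that stem from tie-breaking alone.

sysuse auto, clear
// stops with an error unless make uniquely identifies the observations
isid make
sort make
save new.dta, replace

use new.dta, clear
// scramble the order first
sort foreign
// unique key, so the resulting order is fully determined
sort make
cf _all using new.dta, verbose

-sort rep78 make- would work just as well, since adding a unique variable to
the sort list breaks all ties. Note that -stable- by itself only preserves the
previous relative order of tied observations, so it helps only if both datasets
were in the same order before the -sort-.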

See also Phil's

There is also a user-written package, -compdta- (see -findit compdta-), which
is quite old and runs under -version 4.0-. It does, however, feature a -sort-
option.


-----Original Message-----
[] On Behalf Of
Sent: Sunday, 29 November 2009 10:36
To: statalist
Subject: st: Is it necessary to sort data before using -cf-? 

Dear statalists,

Is it necessary to sort data before using -cf-? 
Without sorting, I found that two identical datasets are reported as
different. However, I found no mention of this in -help cf-.
If it is necessary, how do I choose the variable(s) to sort on when I compare
all the variables, or only certain variables?
Does the sort variable need to be free of duplicates?

For example,

. sysuse auto,clear
(1978 Automobile Data)

. sort turn

. save new,replace
file new.dta saved

. sysuse auto,clear
(1978 Automobile Data)

. sort rep78

. cf _all using new
            make:  74 mismatches
           price:  74 mismatches
             mpg:  69 mismatches
           rep78:  63 mismatches
        headroom:  64 mismatches
           trunk:  72 mismatches
          weight:  73 mismatches
          length:  73 mismatches
            turn:  71 mismatches
    displacement:  72 mismatches
      gear_ratio:  72 mismatches
         foreign:  42 mismatches

Could anyone help me? Thank you.

Best regards,

*   For searches and help try:
