Statalist



Re: st: RE: Is it necessary to sort data before using -cf-?


From   gjhxmu@sina.com
To   statalist<statalist@hsphsun2.harvard.edu>
Subject   Re: st: RE: Is it necessary to sort data before using -cf-?
Date   Sun, 29 Nov 2009 21:33:42 +0800

Dear Martin,

Thank you for your help.

I am learning Stata and found that -cf- did not work as I had expected,
so I asked for help to resolve my doubt.



Thank you very much!

Best regards,
Rose


----- Original Message -----
From: Martin Weiss <martin.weiss1@gmx.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Is it necessary to sort data before using -cf-?
Date: 2009-11-29 18:36:31


<>

At the end of the day, it is natural that a comparison of values of a
variable should be conducted row after row, so the -sort- order does matter
for it. The manual entry and help file do not mention this fact, but I feel
that it goes without saying. What else would you compare but the values line
by line?

Note how in the following code the datasets are both ordered by -rep78-.
Given that rep78 only features 5 distinct values, this -sort- order is not
unique, though. That is the reason for the existence of the -stable- option
to -sort-, btw...


*******
sysuse auto, clear
sort rep78
save new.dta, replace

use new.dta, clear
// disturb the order, then sort by rep78 again
sort foreign
sort rep78
// ties within rep78 may now be ordered differently than in new.dta
cf _all using new.dta, verbose
*******

With only 5 distinct values to go by, -sort- orders tied observations at
random, so it reproduces the same order only by chance. -cf- then reports
these differences in row order as mismatches.
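
As a minimal sketch of what the -stable- option changes, assuming both copies start from the same original order of the auto data: with -stable-, tied observations keep their previous relative order, so the two sorts agree and -cf- should find no differences. The file name stable_copy.dta is made up for this illustration.

```stata
* Sketch only: stable_copy.dta is a hypothetical file name
sysuse auto, clear
sort rep78, stable              // ties keep their original relative order
save stable_copy.dta, replace

sysuse auto, clear              // start again from the same original order
sort rep78, stable
cf _all using stable_copy.dta, verbose
display r(Nsum)                 // number of differences found by -cf-
```

This only helps when both datasets begin in the same order. If one copy has been reordered in between, as with the -sort foreign- step above, -stable- preserves that different order within ties and -cf- can still report mismatches.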

See also Phil's http://www.stata-journal.com/sjpdf.html?articlenum=dm0019
and http://www.stata.com/support/faqs/lang/sort.html


There is a -findit compdta- package, which is quite old and runs under
-version 4.0-. It does, however, feature a -sort- option.
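
Without an extra package, one possible approach is to sort both datasets on a variable that uniquely identifies the observations before calling -cf-. The sketch below assumes that make is such a key in the auto data (checked with -isid-) and uses the made-up file name sorted_copy.dta.

```stata
* Sketch only: sorted_copy.dta is a hypothetical file name
sysuse auto, clear
isid make                       // exits with an error if make is not unique
sort make
save sorted_copy.dta, replace

sysuse auto, clear
sort rep78                      // scramble the order on purpose
sort make                       // a unique key gives one possible order
cf _all using sorted_copy.dta, verbose
display r(Nsum)                 // number of differences found by -cf-
```

Because the key has no duplicates, there are no ties for -sort- to break, so the row order is the same in both copies regardless of any earlier sorting.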


HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of gjhxmu@sina.com
Sent: Sunday, 29 November 2009 10:36
To: statalist
Subject: st: Is it necessary to sort data before using -cf-? 

Dear statalists,

Is it necessary to sort data before using -cf-? 
Without sorting, I found that two identical datasets are reported as
different. However, I found no mention of this in -help cf-.
If sorting is necessary, how should I choose the sort variable(s) when I
compare all variables or only certain variables?
Must the sort variable be free of duplicates?

For example,

. sysuse auto,clear
(1978 Automobile Data)

. sort turn

. save new,replace
file new.dta saved

. sysuse auto,clear
(1978 Automobile Data)

. sort rep78

. cf _all using new
make: 74 mismatches
price: 74 mismatches
mpg: 69 mismatches
rep78: 63 mismatches
headroom: 64 mismatches
trunk: 72 mismatches
weight: 73 mismatches
length: 73 mismatches
turn: 71 mismatches
displacement: 72 mismatches
gear_ratio: 72 mismatches
foreign: 42 mismatches
r(9);

. 
Could anyone help me? Thank you.


Best regards,
Rose


*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
