
st: -cfvars- available on SSC for comparing variable lists in two datasets

From   "Nick Cox" <>
To   <>
Subject   st: -cfvars- available on SSC for comparing variable lists in two datasets
Date   Fri, 20 Feb 2009 13:36:57 -0000

Thanks to Kit Baum, a new package -cfvars- is now available on SSC. Use
-ssc- to install. Stata 9 is required. -cfvars- will probably work on
Stata 8 too, but I've not tested it and you'd need to hack at the
-version- statement (and take responsibility too).  
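
For anyone who hasn't used -ssc- before, installation is a one-liner. This is a minimal sketch, assuming a net-aware Stata with write access to the ado directory:

```stata
* Download and install -cfvars- from SSC, then open its help file
ssc install cfvars
help cfvars
```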

Yesterday David Kantor started a lively thread on how to compare the
lists of variable names in two datasets, which provoked various
suggestions and various programs. I was stuck at home reading Statalist
but not able to email, and I was programming along too. David didn't
really want anyone to write a program, but the problem evidently
interested several people, so this is my take. 

The help for -cfvars- isn't long, and the gist is copied below.


Compare variable name lists in two data sets


        cfvars filename1 [filename2]


    cfvars compares the lists of variable names in Stata data file filename1 and

        either Stata data file filename2, if specified,

        or the data currently in memory, otherwise.

    cfvars prints lists of variable names in both datasets (if any) and in
    each dataset but not the other (again, if any in either case).
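
    The same three lists can be built by hand with macro list operators.
    What follows is only a sketch of the general technique, not necessarily
    how cfvars does it. The file name cfvars_demo and the variable extra are
    made up for illustration, and the varlist option of describe is assumed
    to be available in your Stata.

```stata
* Set up: save a full copy (hypothetical name), then change the data in memory
sysuse auto, clear
save cfvars_demo, replace
drop mpg
generate byte extra = 1

* Collect variable names from memory and from the file
quietly describe, varlist
local mem `r(varlist)'
quietly describe using cfvars_demo, varlist
local file `r(varlist)'

* Set operations on the two lists
local both     : list file & mem      // in both
local fileonly : list file - mem      // only in the file
local memonly  : list mem - file      // only in memory
local same     : list file === mem    // 1 if same names, order ignored

display "both:        `both'"
display "file only:   `fileonly'"
display "memory only: `memonly'"
display "same names?  `same'"
```

    Note that === compares the lists as sets, whereas == would also require
    the names to be in the same order.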


    Note that filenames must be those of Stata .dta files and must be
    enclosed in double quotes whenever they include spaces.  The .dta
    extension is not required and will be added if absent.

    Note also that there is absolutely no checking of variable values.
    That is the job of


    . sysuse auto
    . drop mpg
    . cfvars auto.dta

    . cfvars frog.dta toad.dta
    . cfvars frog toad

    . cfvars "c:\somewhere\older frog.dta" frog.dta

Saved results 

    r(both)      list of variable names in both
    r(oneonly)   list of variable names only in first-named file
    r(twoonly)   list of variable names only in second-named file or
                 data in memory
    r(same)      1 if datasets have same variable names, 0 otherwise

    Note that r(same) is always returned. The other results are returned
    only if not empty.  Even if not returned, a subsequent test such as
    "`r(both)'" == "" will return 1 (true) as usual.
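
    As a hedged illustration of using these results in a do-file: the
    sketch below copies them into locals straight after the call, so that
    later r-class commands cannot overwrite them. The file name cfvars_demo
    is hypothetical, and the code assumes cfvars is installed.

```stata
* Set up: the saved file keeps mpg, the data in memory do not
sysuse auto, clear
save cfvars_demo, replace
drop mpg

* Compare the file with the data in memory
cfvars cfvars_demo

* Copy saved results at once; results not returned just give empty locals
local same     `r(same)'
local both     `r(both)'
local fileonly `r(oneonly)'
local memonly  `r(twoonly)'

if `same' == 0 {
    display "only in file:   `fileonly'"
    display "only in memory: `memonly'"
}
if "`memonly'" == "" {
    display "nothing is unique to the data in memory"
}
```

    Here mpg should turn up as the only name in r(oneonly), since it was
    dropped from memory after the file was saved.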


    This problem was suggested on Statalist by David Kantor on 19 February
    2009. Several people contributed ideas to the resulting thread.

Also see

    help for cf, describe, ds, unab
