st: -distinct- updated on SSC


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: -distinct- updated on SSC
Date   Wed, 21 Mar 2012 18:06:11 +0000

Thanks as usual to Kit Baum, -distinct- by Gary Longton and myself has been updated on SSC. -distinct- requires only Stata 8 (or later), and the revised version may be installed using -ssc- or -adoupdate-.

-distinct- was also published through the Stata Journal, and the corresponding paper

SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
        (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
        Q4/08   SJ 8(4):557--568
        shows how to answer questions about distinct observations
        from first principles; provides a convenience command

is visible to all, under the Stata Journal's three-year moving window, at

http://www.stata-journal.com/sjpdf.html?articlenum=dm0042

The update will also be published in Stata Journal 12(2), in about three months' time; until then, the version on SSC is the most recent version that is publicly available.

The update adds two filter options, -max()- and -min()-, which restrict the display to variables with at most or at least a specified number of distinct values; either or both may be specified. For example, -distinct, max(2)- restricts the display to variables with at most 2 distinct values.

. sysuse auto
(1978 Automobile Data)

. distinct

              |        Observations
              |      total   distinct
--------------+----------------------
         make |         74         74
        price |         74         74
          mpg |         74         21
        rep78 |         69          5
     headroom |         74          8
        trunk |         74         18
       weight |         74         64
       length |         74         47
         turn |         74         18
 displacement |         74         31
   gear_ratio |         74         36
      foreign |         74          2

. distinct, max(2)

              |        Observations
              |      total   distinct
--------------+----------------------
      foreign |         74          2
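
The two options can also be combined. A minimal sketch, assuming -min()- acts as the lower-bound counterpart of -max()-; going by the counts in the first table, this should restrict the display to rep78 (5 distinct values) and foreign (2 distinct values):

```stata
* show only variables with between 2 and 5 distinct values
sysuse auto, clear
distinct, min(2) max(5)
```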


By the way, missing values are ignored by default, but may optionally be included in calculations. 
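
As a rough illustration of what that default means, the count for a single variable can be reproduced from first principles. This is only a sketch using official Stata's -egen, tag()-, not the code inside -distinct-; the variable names tag and tagm are made up for the example.

```stata
sysuse auto, clear

* tag one observation for each distinct nonmissing value of rep78
egen byte tag = tag(rep78)
count if tag

* the same, but treating missing as a value in its own right
egen byte tagm = tag(rep78), missing
count if tagm
```

The first count should agree with the 5 distinct values reported for rep78 above. The second should be one higher, provided the missing values of rep78 are all the same kind of missing.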

I badly wanted these options myself a few weeks ago, when trying to get to grips with a large and awkwardly named dataset and wanting to focus quickly on which variables were likely to be group identifiers. Serendipitously, the extra options are also pertinent to recent threads that turn on identifying binary variables, as was mentioned in earlier discussion, and indeed to a very recent thread on singleton variables.

Nick 
n.j.cox@durham.ac.uk 



