The Stata listserver

st: RE: RE: Short program to "collapse (# unique elements)": Use of nested loops and a "weights not allowed" message


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: Short program to "collapse (# unique elements)": Use of nested loops and a "weights not allowed" message
Date   Tue, 30 Sep 2003 13:12:33 +0100

Chih-Mao Hsieh
  
> > I have a 
> > data file with three columns: citing, cited, nclass.  For 
> > every "citing", there are multiple "cited", and for each 
> > "cited" there is a "nclass".  The file is sorted by citing, 
> > then nclass.  I need a program to count the number of 
> > unique "nclass" strings associated to each "citing".
> > 
> > As a simple example, given the following data file "data.dta":
> > 
> > citing     cited         nclass
> > 100         20            12
> > 100         22            15
> > 100         23            15
> > 101         32            14
> > 101         33            15
> > 101         34            15
> > 101         40            17
> > 
> > I need the following output file:
> > 
> > citing    numpatclass
> > 100            2             [12 and 15 are unique, 15 is repeated]
> > 101            3             [14, 15, 17 are unique, 15 is repeated]
 
> Phil Ryan gave excellent advice explaining how 
> this can be done, without loops, by using -by:-. 
> 
> In addition, note the FAQ 
> How do I compute the number of distinct observations?
> http://www.stata.com/support/faqs/data/distinct.html
> which explains approaches using -by:-, similar in 
> spirit to Phil's solution, and also gives manual 
> references and references to user-written software
> in this area. 
> 
> Thus, a canned solution here is 
> 
> bysort citing : egen numpatclass = nvals(nclass)
> by citing : keep if _n== 1 

Another approach is a double -contract-: 

contract citing nclass
contract citing, freq(numpatclass) 

After the first -contract-, there is one 
observation for each distinct combination of 
-citing- and -nclass-, so the number of 
observations for each value of -citing- is the 
number of distinct values of -nclass- observed 
for it. The second -contract- thus immediately 
yields the desired count variable. 
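
To check this against the example data in the original question, here is 
a small self-contained sketch. The data are typed in with -input- rather 
than read from "data.dta", and the -assert- lines are added here only as 
a check; neither is part of the original suggestion. 

```stata
clear
input citing cited nclass
100 20 12
100 22 15
100 23 15
101 32 14
101 33 15
101 34 15
101 40 17
end

* first pass: one observation per distinct (citing, nclass) pair
contract citing nclass

* second pass: count those observations within each value of citing
contract citing, freq(numpatclass)

list, noobs

* checks against the output requested in the question
assert _N == 2
assert numpatclass == 2 if citing == 100
assert numpatclass == 3 if citing == 101
```

Note that -contract- replaces the data in memory, so -preserve- first 
or save the original file if it is still needed. 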

That this solution using -contract- makes 
no use of -by:- or -_N- is pure illusion. 
Look inside -contract- at the Stata code 
-- -contract- is implemented as an ado-file -- 
and you will see that it is based on 
exactly the same machinery. 
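
For comparison, the same two steps can be written out directly with 
-by:-, -_n- and -_N-. This is only an illustrative sketch of the idea, 
assuming the example data from the question; it is not the code inside 
-contract-, and the -assert- lines are added here purely as a check. 

```stata
clear
input citing cited nclass
100 20 12
100 22 15
100 23 15
101 32 14
101 33 15
101 34 15
101 40 17
end

* step 1: keep one observation per distinct (citing, nclass) pair
bysort citing nclass : keep if _n == 1

* step 2: the group size _N is now the number of distinct nclass values
by citing : gen numpatclass = _N
by citing : keep if _n == 1
keep citing numpatclass

list, noobs

* checks against the output requested in the question
assert _N == 2
assert numpatclass == 2 if citing == 100
assert numpatclass == 3 if citing == 101
```

The second -by citing:- needs no fresh sort, because data sorted by 
-citing nclass- are already sorted by -citing-. 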

Nick 
n.j.cox@durham.ac.uk 

