st: RE: RE: RE: Short program to "collapse (# unique elements)": Use of nested loops and a "weights not allowed" message


From   "Chih-Mao Hsieh" <Hsieh@olin.wustl.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: RE: Short program to "collapse (# unique elements)": Use of nested loops and a "weights not allowed" message
Date   Tue, 30 Sep 2003 08:38:34 -0500

Nick Cox
 
Thank you!  As these datasets have millions of observations, any time-saving strategy will be important.
 
Best,
CM

	-----Original Message----- 
	From: owner-statalist@hsphsun2.harvard.edu on behalf of Nick Cox 
	Sent: Tue 9/30/2003 7:12 AM 
	To: statalist@hsphsun2.harvard.edu 
	Cc: 
	Subject: st: RE: RE: Short program to "collapse (# unique elements)": Use of nested loops and a "weights not allowed" message
	
	

	Chih-Mao Hsieh
	 
	> > I have a
	> > data file with three columns: citing, cited, nclass.  For
	> > every "citing", there are multiple "cited", and for each
	> > "cited" there is a "nclass".  The file is sorted by citing,
	> > then nclass.  I need a program to count the number of
	> > unique "nclass" strings associated with each "citing".
	> >
	> > As a simple example, given the following data file "data.dta":
	> >
	> > citing     cited         nclass
	> > 100         20            12
	> > 100         22            15
	> > 100         23            15
	> > 101         32            14
	> > 101         33            15
	> > 101         34            15
	> > 101         40            17
	> >
	> > I need the following output file:
	> >
	> > citing    numpatclass
	> > 100            2             [12 and 15 are unique, 15 is repeated]
	> > 101            3             [14, 15, 17 are unique, 15 is repeated]
	
	> Phil Ryan gave excellent advice explaining how
	> this can be done, without loops, by using -by:-.
	>
	> In addition, note the FAQ
	> How do I compute the number of distinct observations?
	> http://www.stata.com/support/faqs/data/distinct.html
	> which explains approaches using -by:-, similar in
	> spirit to Phil's solution, and also gives manual
	> references and references to user-written software
	> in this area.
	>
	> Thus, a canned solution here, using the nvals()
	> function from the -egenmore- package on SSC, is
	>
	> bysort citing : egen numpatclass = nvals(nclass)
	> by citing : keep if _n== 1
	
	Another approach is a double -contract-:
	
	contract citing nclass
	contract citing, freq(numpatclass)
	
	After the first -contract-, the number
	of observations for each value of -citing-
	is the number of distinct values of -nclass-
	observed for each;
	so the second -contract- immediately yields
	the desired count variable.
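
As a rough worked illustration of the two steps, here is a minimal sketch that types in the seven example observations from the original question and applies the double -contract- to them. The -input- block is only a stand-in for the poster's "data.dta"; everything else uses the commands given above.

```stata
* Sketch only: the toy data below is copied from the example in the question.
clear
input citing cited nclass
100 20 12
100 22 15
100 23 15
101 32 14
101 33 15
101 34 15
101 40 17
end

* Step 1: one observation per distinct (citing, nclass) pair.
* -contract- also creates _freq, which is not needed here.
contract citing nclass
list, noobs

* Step 2: count the remaining observations for each citing.
contract citing, freq(numpatclass)
list, noobs
```

On this toy data the first -contract- leaves five observations (two for citing 100, three for citing 101), so the final listing should show numpatclass equal to 2 and 3, matching the output requested in the question.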
	
	That this solution using -contract- makes
	no use of -by:- or -_N- is pure illusion.
	Look inside -contract- at the Stata code
	-- -contract- is implemented as an .ado --
	and you will see that it is based on
	exactly the same machinery.
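
To make that point concrete, the following is a hedged sketch of the same calculation written out with -by:-, -_n- and -_N-. It is not the actual code inside contract.ado, only an illustration of the kind of machinery involved, and it assumes the variables citing and nclass are already in memory.

```stata
* Sketch only: an explicit -by:- version of the double -contract-.
* Assumes variables citing and nclass exist in the current dataset.

* Keep one observation per distinct (citing, nclass) pair.
bysort citing nclass: keep if _n == 1

* Within each citing, the number of observations left is the
* number of distinct nclass values.
by citing: generate long numpatclass = _N

* Reduce to one observation per citing.
by citing: keep if _n == 1
keep citing numpatclass
list, noobs
```

Dropping observations does not disturb the sort order set by -bysort-, which is why the later -by citing:- statements need no further sorting.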
	
	Nick
	n.j.cox@durham.ac.uk
	
	
	*
	*   For searches and help try:
	*   http://www.stata.com/support/faqs/res/findit.html
	*   http://www.stata.com/support/statalist/faq
	*   http://www.ats.ucla.edu/stat/stata/
	

