The Stata listserver

st: RE: Using -collapse- extensively to find historical, irregular matches: Better way?


From   "Chih-Mao Hsieh" <Hsieh@olin.wustl.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Using -collapse- extensively to find historical, irregular matches: Better way?
Date   Tue, 30 Sep 2003 09:34:44 -0500

What I meant to type was:

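* Summarize each citing patent's list of cited patents
collapse (mean) mean=cited (sum) sum=cited (sd) sd=cited, by(citing)
* Within matching summaries, put lower citing numbers first
sort mean sum sd citing
* Count prior patents with the same summary; long avoids byte's upper limit of 100
by mean sum sd: gen long counter = _n - 1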

Cheers, --Chihmao

	-----Original Message----- 
	From: owner-statalist@hsphsun2.harvard.edu on behalf of Chih-Mao Hsieh 
	Sent: Tue 9/30/2003 9:20 AM 
	To: statalist@hsphsun2.harvard.edu 
	Cc: 
	Subject: st: Using -collapse- extensively to find historical, irregular matches: Better way?
	
	

	Hi all,
	
	I have a two-column file with variables "citing" and "cited".  "Citing" refers to a patent, and "cited" refers to a patent that the "citing" patent cites.  So if a patent cites, and thereby "recombines", 3 patents prior to it, this history shows up as 3 rows (end of message has examples).
	
	I need a program to catch the number of times that the exact same set of patents has been "recombined" in the past (i.e. imagine trying to find all the papers that cite the same set of references that you do in one of your papers!).
	
	The basic solution I have come up with is the following:
	
	collapse (mean) mean=cited (sum) sum=cited (sd) sd=cited, by(citing)
	bysort mean sum sd: gen byte counter = _n
	replace counter=counter-1
	
	It seems to work.  Because the datafile has 16 million rows and 3 million unique "citing" numbers, and therefore a fair amount of variance, I believe it may be good enough.  My questions are: (1) Is there a more accurate way, even if less efficient, to do what I need? (2) Is there any reason I should expect Stata to calculate means, sums, and sd's differently from row to row (i.e. rounding), in a way that would render my specific use of -collapse- totally ineffective?  I attach an example below.
	
	Thanks, --Chihmao
	
	------------------------------------------
	
	citing      cited
	100          30
	100          32
	100          33
	101          34
	101          35
	105          30
	105          32
	105          33
	106          29
	106          30
	108          30
	108          32
	108          33
	
	Desired output:
	
	citing      counter
	100            0
	101            0
	105            1    (since #100 cited the exact same list of patents, no more, no less)
	106            0
	108            2    (since there are now 2 prior occurrences of same patent list: #100 and #105)
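
One way to match on the exact list of cited patents, rather than on summary statistics, can be sketched as follows. This is only an illustration under stated assumptions: each (citing, cited) pair appears once, lower citing numbers are earlier patents, and the longest list fits within the variable limit of the Stata edition in use. The names j and cited1, cited2, ... are introduced here and are not from the original message. The motivation is that mean, sum, and sd together pin down only the length, sum, and sum of squares of a list, so distinct lists can collide: {1, 5, 6} and {2, 3, 7} both have sum 12 and sum of squares 62.

```stata
* Illustrative sketch using the example data from the message
clear
input long citing long cited
100 30
100 32
100 33
101 34
101 35
105 30
105 32
105 33
106 29
106 30
108 30
108 32
108 33
end

* Number each cited patent within its citing patent, in ascending order
* (j is a hypothetical helper variable)
sort citing cited
by citing: gen long j = _n

* One row per citing patent: cited1, cited2, ... hold the full sorted list
reshape wide cited, i(citing) j(j)

* Rows with identical lists (including missing slots) form one group;
* within a group, count the citing patents with lower numbers
bysort cited* (citing): gen long counter = _n - 1

sort citing
list citing counter, noobs
```

On the example data this gives counter values 0, 0, 1, 0, 2 for citing 100, 101, 105, 106, 108, which matches the desired output. With 16 million rows -reshape- may be slow, so this trades speed for exactness.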
	
	*
	*   For searches and help try:
	*   http://www.stata.com/support/faqs/res/findit.html
	*   http://www.stata.com/support/statalist/faq
	*   http://www.ats.ucla.edu/stat/stata/
	




