The Stata listserver

RE: st: Short program to "collapse (# unique elements)": Use of nested loops and a "weights not allowed" message


From   "Chih-Mao Hsieh" <Hsieh@olin.wustl.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Short program to "collapse (# unique elements)": Use of nested loops and a "weights not allowed" message
Date   Tue, 30 Sep 2003 01:16:32 -0500

Thank you Phil, this works!  I had glanced at the -bysort- command beforehand, but hadn't thought to use it this way.  I am especially unfamiliar with the "= _n == _N" syntax, though, and searching for it turned up nothing.  What does it mean?
 
CM
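
On the "= _n == _N" question: the first "=" is the assignment, and "_n == _N" is a logical expression that evaluates to 1 (true) or 0 (false). _n is the number of the current observation and _N is the total number of observations; under a -by- prefix both are counted within each by-group. So the new variable is 1 in the last observation of every group and 0 elsewhere. A minimal sketch, where the variable names g and last are made up for illustration:

```stata
clear
input g
1
1
2
2
2
end

* The parentheses are optional; they only make the comparison easier to see
bysort g: gen byte last = (_n == _N)
list, sepby(g)

* Exactly one observation is flagged per group
count if last
assert r(N) == 2
```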

	-----Original Message----- 
	From: owner-statalist@hsphsun2.harvard.edu on behalf of Philip Ryan 
	Sent: Tue 9/30/2003 12:22 AM 
	To: statalist@hsphsun2.harvard.edu 
	Cc: 
	Subject: Re: st: Short program to "collapse (# unique elements)": Use of nested loops and a "weights not allowed" message
	
	

	This is a bit simpler and I think does what you want:
	
	bysort citing nclass: gen byte unique = _n == _N
	bys citing: replace unique = sum(unique)
	by citing: keep if _n == _N
	
	You should test this and maybe tweak it to deal with missing values, if
	they exist in your data.
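
	As a rough check, the three lines can be run on the sample data from the question. This is only a sketch: it assumes nclass is numeric, and the "& !missing(nclass)" term is one possible tweak so that a missing nclass is not counted as a class of its own. The renaming to numpatclass follows the output layout requested in the question.

```stata
clear
input citing cited nclass
100 20 12
100 22 15
100 23 15
101 32 14
101 33 15
101 34 15
101 40 17
end

* Flag the last observation of each citing-nclass pair (assumed tweak: skip missing nclass)
bysort citing nclass: gen byte unique = (_n == _N) & !missing(nclass)
* Running sum within citing; the last observation of each citing holds the total
by citing: replace unique = sum(unique)
by citing: keep if _n == _N

rename unique numpatclass
keep citing numpatclass
list, noobs

assert numpatclass == 2 if citing == 100
assert numpatclass == 3 if citing == 101
```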
	
	One point in your code:  any command of the form "replace varname[_n] = ..."
	will fail with "weights not allowed" because Stata takes the square
	brackets as introducing weights, and the syntax for -replace- does not
	permit these.  More generally, you cannot use explicit subscripting on a
	variable on the LHS of a value assignment ("=") command.
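
	Subscripts are allowed on the right-hand side, so the intermediate index described in the question can be built in one pass, with no loop. The following is a sketch under the assumption that nclass is numeric; it also relies on the fact that the -if- programming command is evaluated once rather than once per observation, so it cannot stand in for a row-by-row test. The final -collapse- mirrors the collapse (max) step mentioned in the question.

```stata
clear
input citing cited nclass
100 20 12
100 22 15
100 23 15
101 32 14
101 33 15
101 34 15
101 40 17
end

* Under -by-, _n restarts at 1 for each citing; the running sum adds 1
* whenever nclass differs from the previous observation in the group
bysort citing (nclass): gen int indexpatclass = sum(_n == 1 | nclass != nclass[_n-1])
list, sepby(citing) noobs

assert indexpatclass[1] == 1
assert indexpatclass[3] == 2
assert indexpatclass[7] == 3

* The largest index within each citing is the number of distinct classes
collapse (max) numpatclass = indexpatclass, by(citing)
list, noobs
```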
	
	Phil
	
	
	At 11:41 PM 29/09/2003 -0500, you wrote:
	>Hi statalisters,
	>
	>I have been working on a short program that doesn't seem to work, I think
	>I'm just missing a small mistake...  I have a data file with three
	>columns: citing, cited, nclass.  For every "citing", there are multiple
	>"cited", and for each "cited" there is a "nclass".  The file is sorted by
	>citing, then nclass.  I need a program to count the number of unique
	>"nclass" strings associated to each "citing".
	>
	>As a simple example, given the following data file "data.dta":
	>
	>citing     cited         nclass
	>100         20            12
	>100         22            15
	>100         23            15
	>101         32            14
	>101         33            15
	>101         34            15
	>101         40            17
	>
	>I need the following output file:
	>
	>citing    numpatclass
	>100            2             [12 and 15 are unique, 15 is repeated]
	>101            3             [14, 15, 17 are unique, 15 is repeated]
	>
	>I have decided to do it by creating an intermediate file which I will
	>later collapse(max):
	>
	>citing     cited         nclass         indexpatclass
	>100         20            12                    1
	>100         22            15                    2
	>100         23            15                    2
	>101         32            14                    1
	>101         33            15                    2
	>101         34            15                    2
	>101         40            17                    3
	>
	>"indexpatclass" indexes by 1 whenever a "citing" involves a new "nclass",
	>and resets to 1 whenever a new "citing" begins.  So I have created a short
	>program.  It sorts by "citing" and "nclass", then it uses a while-loop,
	>and then two if-loops.  But there are two problems: (1) I am getting a
	>"weights not allowed" message when I try to run it.  (2) I am also not
	>sure whether I am properly nesting my loops.  Can anybody provide any
	>insight?  Or alternatively, is there a much simpler way to do what I am
	>attempting?
	>
	>Thanks, --Chihmao.
	>
	>--------------------------------
	>
	># delimit cr
	>program define uniqpatclass
	>use c:\temp\data
	>generate indexpatclass=0
	>sort citing nclass
	>replace indexpatclass=1 in 1
	>generate id=_n
	>
	>while id<_N {
	>    if citing[_n]==citing[_n-1] {
	>       if nclass[_n]==nclass[_n-1] {
	>          replace indexpatclass[_n]=indexpatclass[_n-1]
	>          id = `id' + 1
	>       }
	>       else {
	>    replace indexpatclass[_n]=indexpatclass[_n-1]+1
	>    id = `id' + 1
	>    }
	>    }
	>    else {
	>replace indexpatclass[_n]=1}
	>id = `id' + 1
	>}
	>end
	>
	>
	>
	>*
	>*   For searches and help try:
	>*   http://www.stata.com/support/faqs/res/findit.html
	>*   http://www.stata.com/support/statalist/faq
	>*   http://www.ats.ucla.edu/stat/stata/
	
	Philip Ryan
	Associate Professor,
	Department of Public Health
	Associate Dean (Information Technology)
	Faculty of Health Sciences
	University of Adelaide 5005
	South Australia
	tel 61 8 8303 3570
	fax 61 8 8223 4075
	http://www.public-health.adelaide.edu.au/
	CRICOS Provider Number 00123M
	
	





© Copyright 1996–2021 StataCorp LLC