The Stata listserver

RE: st: Short program to "collapse (# unique elements)": Use of nested loops and a "weights not allowed" message

From   Philip Ryan
Subject   RE: st: Short program to "collapse (# unique elements)": Use of nested loops and a "weights not allowed" message
Date   Tue, 30 Sep 2003 16:25:09 +0930

The line:

bysort citing nclass: gen byte unique = _n == _N

might be easier to interpret if we insert parentheses thus:

bysort citing nclass: gen byte unique = (_n == _N)

Under the by prefix, _n is the number of the current observation within its group and _N is the number of observations in that group, where each group is a distinct combination of citing and nclass. For each observation Stata evaluates the Boolean expression _n == _N, which is true (evaluates to 1) only for the last observation of each group and false (evaluates to 0) for all the others. The variable unique is therefore assigned a 1 exactly once for each distinct value of nclass within a citing group, and a 0 otherwise. Adding up the 1s within each citing group then gives the number of unique nclass values for that group.
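
To see the flag at work, here is a small sketch using the sample data from the original question. It assumes nclass is numeric, as in that example; the post also describes it as a string, in which case the -input- line would need a str type for it. Which of the two tied nclass 15 rows comes last within a group is not guaranteed, but the pattern of 0s and 1s per nclass value is.

```stata
* Sample data copied from the original question
clear
input citing cited nclass
100 20 12
100 22 15
100 23 15
101 32 14
101 33 15
101 34 15
101 40 17
end

* Flag the last observation of each (citing, nclass) group
bysort citing nclass: gen byte unique = (_n == _N)
list citing nclass unique, sepby(citing)

* unique is 1 on the last row of each (citing, nclass) pair and 0 elsewhere:
*   citing 100: nclass 12 -> 1;  nclass 15 -> 0, 1
*   citing 101: nclass 14 -> 1;  nclass 15 -> 0, 1;  nclass 17 -> 1
```

The 1s add up to 2 for citing 100 and 3 for citing 101, which matches the output requested in the original question.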

The "byte" type was specified just to save storage space, and is something I almost always forget to do.


At 01:16 AM 30/09/2003 -0500, you wrote:

Thank you Phil, this works! I had glanced at the -bysort- command beforehand, but hadn't figured to do it this way. I am especially unfamiliar with the use of the "= _n == _N" syntax though, even though I just searched for it. What does it mean...?


-----Original Message-----
From: on behalf of Philip Ryan
Sent: Tue 9/30/2003 12:22 AM
Subject: Re: st: Short program to "collapse (# unique elements)": Use of nested loops and a "weights not allowed" message

This is a bit simpler and I think does what you want:

bysort citing nclass: gen byte unique = _n == _N
bys citing: replace unique = sum(unique)
by citing: keep if _n == _N

You should test this and maybe tweak it to deal with missing values, if
they exist in your data.
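
One possible tweak for missing values, offered as a sketch rather than tested against your data: it assumes a missing nclass should not count as a class of its own, and it borrows the name numpatclass from your desired output. The missing() function accepts both numeric and string arguments.

```stata
* Assumes the data are in memory with variables citing and nclass.
* Flag one row per distinct (citing, nclass) pair, skipping missing nclass
bysort citing nclass: gen byte unique = (_n == _N) & !missing(nclass)

* Running sum of the flags within each citing value
by citing: replace unique = sum(unique)

* The last row of each citing group now holds the total
by citing: keep if _n == _N
rename unique numpatclass
keep citing numpatclass
```

A citing value whose nclass entries are all missing would end up with numpatclass equal to 0 under this version.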

One point in your code: any command of the form "replace[_n]" will
produce a "weights not allowed" error, because Stata reads the square
brackets as introducing weights and the syntax for -replace- does not
permit them. Also, you cannot use explicit subscripting on the variable
on the left-hand side of an assignment ("=").
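
As a rough illustration of where subscripts are allowed, the sketch below uses two made-up variables, prev_nclass and flag, which are not part of your data, and assumes nclass is numeric. Subscripts go on the right-hand side of the "="; to change particular observations, restrict the command with -in- or -if- instead.

```stata
* prev_nclass and flag are hypothetical names used only for illustration
sort citing nclass

* Explicit subscripting is fine on the right-hand side
gen prev_nclass = nclass[_n-1]

* To change a single observation, use -in- rather than a subscript
gen byte flag = 0
replace flag = 1 in 3

* Or select observations with -if-
replace flag = 1 if _n == _N
```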


At 11:41 PM 29/09/2003 -0500, you wrote:
>Hi statalisters,
>I have been working on a short program that doesn't seem to work, I think
>I'm just missing a small mistake... I have a data file with three
>columns: citing, cited, nclass. For every "citing", there are multiple
>"cited", and for each "cited" there is a "nclass". The file is sorted by
>citing, then nclass. I need a program to count the number of unique
>"nclass" strings associated to each "citing".
>As a simple example, given the following data file "data.dta":
>citing cited nclass
>100 20 12
>100 22 15
>100 23 15
>101 32 14
>101 33 15
>101 34 15
>101 40 17
>I need the following output file:
>citing numpatclass
>100 2 [12 and 15 are unique, 15 is repeated]
>101 3 [14, 15, 17 are unique, 15 is repeated]

Philip Ryan
Associate Professor,
Department of Public Health
Associate Dean (Information Technology)
Faculty of Health Sciences
University of Adelaide 5005
South Australia
tel 61 8 8303 3570
fax 61 8 8223 4075
