The Stata listserver

RE: st: Re: Making sure identifiers are unique

From   "louis boakye-yiadom" <[email protected]>
To   [email protected]
Subject   RE: st: Re: Making sure identifiers are unique
Date   Tue, 08 Mar 2005 16:07:46 +0000

Thanks a lot. I'll do that.


I guess you need to look at the help for collapse some more. -collapse- calculates whatever summary statistics you specify for each unique combination of the by variable list and then collapses the dataset to one observation for each of these unique combinations. It will, by design, always result in the by varlist becoming a unique identifier.
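The behaviour described above can be sketched on a small made-up dataset. This is only an illustration: the six observations and the helper variable n are hypothetical and not taken from the original data, and the -duplicates report- line assumes your Stata version includes -duplicates-.

```stata
* Toy data (hypothetical values): region-district pairs that repeat
clear
input region district
1 1
1 1
1 2
2 1
2 1
2 1
end

* Before collapsing: count observations per (region, district) group.
* Any group with n > 1 is why "assert _N==1" fails on the raw data.
sort region district
by region district: gen n = _N
list

* Optional summary of how many copies each combination has
* (assumes -duplicates- is available in your Stata version)
duplicates report region district

* Collapse to one observation per (region, district) combination.
* x then holds the number of original observations in each group
* (2, 1, and 3 for this toy data).
gen x = 1
collapse (count) x, by(region district)

* After collapsing, every by-group has exactly one observation,
* so the assertion passes without any message.
sort region district
by region district: assert _N==1
list
```

In other words, the assertion passing after -collapse- says nothing about the original data; the counts in x show how many original observations shared each region-district pair.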

Michael Blasnik
[email protected]

----- Original Message -----
From: "louis boakye-yiadom" <[email protected]>
To: <[email protected]>
Sent: Tuesday, March 08, 2005 10:48 AM
Subject: st: Making sure identifiers are unique

Dear all,
I've been trying to determine the identifiers of a data set, and to ensure they're unique. Suspecting that the variables "region" and "district" are the identifiers, I gave the commands below and got the output shown:
. sort region district
. by region district: assert _N==1
62 contradictions in 97 by-groups
assertion is false

Because I'm more interested in the "district"-level data, I wanted to know whether a collapsed version of the data would have unique identifiers. I therefore gave the following set of commands and got the results shown:
. gen x=1
. collapse (count) x, by (region district)
. sort region district
. by region district: assert _N==1

My question is: What can account for the collapsed data being uniquely identified by "region" and "district", whilst the original data are not? I'm using version 8.2.

Many thanks,
