
st: RE: Question about duplicates and tag command

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: Question about duplicates and tag command
Date	Mon, 27 Feb 2006 12:29:46 -0000

If you have duplicates, then the -duplicates- 
command should be useful. 
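
As a rough sketch of how that might look, assuming toy data with
made-up values and the variable names -code- and -year- from the
question (the new variable name -dup- is just illustrative):

```stata
* Toy data: hypothetical values, one repeated code-year pair (1, 2004)
clear
input code year
1 2003
1 2004
1 2004
2 2004
3 2003
3 2004
end

* Summarize how many observations are repeated on code and year
duplicates report code year

* dup = number of extra copies of each code-year pair (0 if unique)
duplicates tag code year, generate(dup)
list if dup > 0

* Keep one observation per code-year pair;
* -force- is required when a varlist is specified
duplicates drop code year, force
```

Because duplicates are defined on both variables here, the outcome
does not depend on the order in which tied observations happen to
be sorted.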

Doing it from first principles is instructive, 
but you have to be clear on some basics. 

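In your case, Stata is working as advertised. 
Your duplicates must be identified in terms 
of both -code- and -year-. Just sorting on
-code- does not determine where observations
with different values of -year- will occur
within each -code-, so which observation (and
hence which -year-) survives your -drop- can
differ from run to run. 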
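
A minimal sketch of the first-principles version, again on toy data
with made-up values, assuming that one observation per institution
per year is what you want to keep:

```stata
* Toy data: hypothetical values; code and year as in the question
clear
input code year
1 2003
1 2004
1 2004
2 2004
3 2003
3 2004
end

* Number observations within each code-year pair, so the result
* no longer depends on how ties within code are ordered
bysort code year: gen dup = cond(_N == 1, 0, _n)
drop if dup > 1

* Each institution now appears at most once per year
count if year == 2004
```

With these toy values the count is 3 on every run, because each
-code- contributes at most one observation for 2004.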

Nick 
[email protected] 

Mosca, Ilaria
 
> I have been working with a database in which I had to identify
> duplicates of institutions and afterwards count the number of
> institutions per year. I therefore wrote the following commands:
> 
> . sort code
> . quietly by code: gen dup=cond(_N==1,0,_n)
> . drop if dup>1
> . gen id04=1 if year==2004
> . count if id04==1
> 
> My problem is that EVERY TIME I ran these commands, I obtained
> different results! Once the count was 749, once 753, and so on,
> without any apparent reason.
> 
> To cope with this problem I therefore used the -egen- function tag(),
> namely:
> . sort code
> . egen tag=tag(code)
> . count if tag==1
> 
> I ran these commands several times and the results shown are always
> the same. 
> 
> My question to you is thus the following: why do the commands for
> duplicates seem not to work in this case? I frequently have to identify
> duplicates in my databases, and I use these commands pretty often. But
> getting different results every time casts doubt on their effectiveness.
 

