Statalist The Stata Listserver


st: RE: duplicates tag - and range

From   "Nick Cox"
Subject   st: RE: duplicates tag - and range
Date   Tue, 20 Feb 2007 18:47:45 -0000

As the original author of -duplicates- (which in turn owes
much to earlier joint work with Thomas Steichen)  I have
to say that its behaviour is exactly right here. Indeed 
I would say the same if I had never touched the code. 

The idea of a duplicate in -duplicates- is that observations
are identical (on the variables specified). How could it
be otherwise? Thus -duplicates- is indeed irrelevant to your problem.
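
To make that concrete, here is a small sketch with made-up data. The
variable names (id, year, month, day) and the three observations are
invented for illustration; only -duplicates tag- itself is official Stata.

```stata
* Toy data, invented for illustration only.
clear
input id year month day
1 2005 1 15
1 2006 1 16
1 2008 1 15
end

* dup counts how many OTHER observations are identical on id, month, day.
duplicates tag id month day, generate(dup)
list id year month day dup

* 15 Jan 2005 and 15 Jan 2008 are flagged as duplicates of each other,
* although three years apart; 16 Jan 2006 is not flagged at all,
* although it is one year and one day after the first observation.
```

Exact matching on month and day thus fails in both of the ways described in
the question quoted below.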

Your problem is different but is soluble in Stata terms 
if you can give exact rules for what kind of tolerance you allow
_within groups of observations_. As with any kind of clustering
problem, specifying a distance or difference tolerance is only 
part of the problem, as joining or merging rules need to be
specified too. 
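
As one possible starting point, here is a minimal sketch under assumptions
that are mine, not the original poster's: a panel identifier called id, a
Stata daily date called obsdate, and "one year" read as 365 days, so that a
match means a gap of 362 to 368 days between two observations in the same
panel. The toy data are invented.

```stata
* Sketch only: names, data and the 362-368 day rule are assumptions.
clear
input id str9 sdate
1 "15jan2005"
1 "20jun2005"
1 "16jan2006"
1 "14jan2007"
2 "01mar2005"
2 "10mar2006"
end
gen obsdate = date(sdate, "DMY")
format obsdate %td

gen byte tag = 0
bysort id (obsdate): gen long obsno = _n
summarize obsno, meanonly
local maxk = r(max) - 1

* Within each panel, compare every observation with the k-th later
* and the k-th earlier observation. Subscripts that fall outside the
* panel yield missing, and inrange() returns 0 for a missing first argument.
forvalues k = 1/`maxk' {
    by id: replace tag = 1 if inrange(obsdate[_n+`k'] - obsdate, 362, 368) | ///
        inrange(obsdate - obsdate[_n-`k'], 362, 368)
}
list id obsdate tag, sepby(id)
```

This tags any observation that has at least one partner about a year away.
It says nothing about whether a chain such as 2005, 2006, 2007 should count
as one group, which is exactly the kind of joining rule that still needs to
be specified. The loop runs once for each possible lag within the largest
panel, so it could be slow on a very large dataset.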

> I am working with a very large panel dataset, and would like to tag
> observations that repeat annually (compared to the odd, or the unscheduled
> observation). My rule for tagging observations is something like: if
> another observation falls exactly one year before or after the current
> observation (-/+ 3 days, to deal with minor deviations due to, say, dates
> that fall on weekends), tag both observations. I explored the use of
> "duplicates" and splitting the dates into year, month, and day to little
> effect (it can be used only for exact matches rather than ranges, and will
> tag similar observations in terms of day and month in non-consecutive
> years).
> Any help would be greatly appreciated.
