RE: st: RE: -finddup- for panel?
If observations are duplicates, the choice of
which to keep can be difficult...
-duplicates- arrived with Stata 8. Some
users were already in the habit of using
user-written programs published
in the STB or on SSC, including -unique-,
-finddup-, -dups- and others.
If they serve your purpose, fine.
But, as you are no doubt aware, observations
can be duplicates with respect to some
variables (in your case -id- and -year-)
but differ with respect to others.
-finddup- offers no facilities for dropping
duplicates. It is an inspection program,
and gives information which can be used
to decide what to -drop-.
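The inspect-then-decide workflow can also be done with -duplicates- itself. A minimal sketch, assuming a Stata version whose -duplicates- has the -report- and -tag- subcommands; the data, the variable -income-, the flag name -dup- and the "keep the largest income" rule are all hypothetical, chosen only for illustration:

```stata
* Hypothetical toy panel: id 1 has two records for 2001 that
* agree on id and year but differ on income.
clear
input id year income
1 2000 10
1 2001 12
1 2001 15
2 2000  8
end

* How many surplus observations are there on id and year?
duplicates report id year

* dup = number of other observations sharing the same id and year
duplicates tag id year, generate(dup)
list if dup > 0

* Hypothetical explicit rule: within each id-year group keep the
* record with the largest income (last after sorting on income)
bysort id year (income): keep if _n == _N
drop dup
```

Writing the rule out explicitly means the choice of which observation survives is yours and is recorded in the do-file, rather than left to whatever order the data happen to be in.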
The intent of -duplicates- is to provide
a more general tool, including functionality
for -drop-ping duplicates. But -duplicates-
will not let you simply type
. duplicates drop id year
whenever other variables also exist. You
must spell out
. duplicates drop id year, force
as a reminder that you may be losing information.
In this way -duplicates- can be destructive
when you ask it to be, but it is designed to
inhibit accidental loss of real information.
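To see the safeguard in action, here is a minimal sketch on made-up data (the variable -income- and its values are hypothetical). -capture noisily- is used only so that the refused command shows its error without stopping a do-file:

```stata
* Hypothetical toy panel with a duplicated id-year pair
clear
input id year income
1 2000 10
1 2001 12
1 2001 15
2 2000  8
end

* Refused: -income- is not in the varlist, so information could be lost
capture noisily duplicates drop id year

* Accepted: -force- acknowledges the risk; one observation per
* id-year group is kept and the income of the other is discarded
duplicates drop id year, force
```

If it matters which record survives, decide that first (for example with -sort- and -by:-) rather than relying on -force-.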
> -----Original Message-----
> From: firstname.lastname@example.org
> [mailto:email@example.com]On Behalf Of joe J.
> Sent: 21 April 2004 11:08
> To: firstname.lastname@example.org
> Subject: RE: st: RE: -finddup- for panel?
> Stata's official -duplicates- command also helps to identify
> duplicate observations. But I have a feeling that -finddup- is
> useful when one has to decide which of the duplicates to include
> and which to exclude (for later use, say) while generating a
> duplicate-free data set.