
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: RE: Dropping Duplicates that Aren't Exactly Duplicates


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: RE: Dropping Duplicates that Aren't Exactly Duplicates
Date   Wed, 2 Nov 2011 19:18:11 +0000

Please give an example. 

Nick 
[email protected] 


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Lisa Chavez
Sent: 02 November 2011 18:50
To: [email protected]
Subject: Re: st: RE: Dropping Duplicates that Aren't Exactly Duplicates

Thank you for your reply. I tried that before, but I ended up
dropping rows that I didn't want to drop. For example, say a person
has three arrest events with four violations each. The first two
arrest events have the exact same violations and the third arrest has
two violations, but ONE of the violations in the third arrest was the
same as one violation in one of the first two arrest events. The
result is that I dropped a single violation from the third arrest
event (and I wanted the third arrest untouched).  --Lisa

On 11/2/2011 11:32 AM, Nick Cox wrote:
> In general, you are in charge. You get to define what counts as a duplicate you want to drop.
>
> Also, you can drop duplicates using any syntax you want that does the job.
>
> The -duplicates- command is the way of dealing with duplicates with which I am most familiar. I think you want to
>
> duplicates drop id violation, force
>
> Nick
> [email protected]
>
> Lisa Chavez
>
> I have data in long file format that has three variables:  id, arrdate
> and violation.
>
> Below is an example of a person who has three arrest events (I have
> separated them with lines).
>
> Looking at the first two arrest dates (11mar2004 and 13jan2005) you see
> that each arrest has three violations and they are exactly the same.
>
> I have lots of examples like this one;  in all instances I want to drop
> the last arrest event where this duplication occurs.
>
> In the case below, I would want to drop all rows associated with the
> 13jan2005 arrest event.
>
> I'd appreciate any help you can offer.
>
> Thanks!
>
> Lisa
>
> id         arrdate     violation
> ----------------------------------------------------------------------------------------
> A0000518   11mar2004   Cocaine-Possess  Possess Cocaine
> A0000518   11mar2004   Nonmoving Traffic Viol  Drive While Lic Susp Habitual Offender
> A0000518   11mar2004   Traffic Offense  Dui Alcohol Or Drugs 1St Off
> ----------------------------------------------------------------------------------------
> A0000518   13jan2005   Cocaine-Possess  Possess Cocaine
> A0000518   13jan2005   Nonmoving Traffic Viol  Drive While Lic Susp Habitual Offender
> A0000518   13jan2005   Traffic Offense  Dui Alcohol Or Drugs 1St Off
> ----------------------------------------------------------------------------------------
> A0000518   27feb2009   Hallucinogen-Sell  Schedule Ii
> ----------------------------------------------------------------------------------------
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

