
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



Re: st: RE: Dropping Duplicates that Aren't Exactly Duplicates


From   Lisa Chavez <lchavez@law.berkeley.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Dropping Duplicates that Aren't Exactly Duplicates
Date   Wed, 02 Nov 2011 13:17:10 -0700

It worked! Thank you so much! I suspected that the answer to my problem would involve reshaping to wide... ::smacking forehead::

On 11/2/2011 12:44 PM, Dimitriy V. Masterov wrote:
Lisa,

I think the inelegant code below will accomplish what you want. It is
untested and depends on the violation variable being very clean. If it
is not, you may want to take a look at Google Refine.

#delimit ;
/* the code below uses ; as the line terminator */

/* remove leading, trailing and multiple whitespace and convert to
uppercase (may not be necessary, but a good habit) */
replace violation=upper(trim(itrim(violation)));

/* sencode is from SSC (ssc install sencode). This is not necessary, but
it may speed sorting if you have lots of data */
sencode violation, replace;

/* reshape to make finding duplicates easier */
bys id arrdate (violation): gen cause=_n;
reshape wide violation, i(id arrdate) j(cause);
egen all_violations=group(violation*), missing;
sort id arrdate all_violations;
duplicates drop id all_violations, force; /* drops all but the first
occurrence, which will be the earliest arrest because of the sort */

/* reshape back to your original format and drop extraneous variables */
reshape long violation, i(id arrdate) j(cause);
drop if missing(violation);
drop all_violations cause;
#delimit cr
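A self-contained toy run of the same steps may help check the logic before applying it to real data. This is only a sketch under stated assumptions: the ids, dates, and violation strings are invented for illustration; it uses Stata's built-in encode in place of sencode so nothing has to be installed from SSC; and it keeps the earliest arrest with an explicit bysort ... keep if _n == 1 rather than duplicates drop, so which arrest survives is spelled out instead of depending on the sort order.

```stata
* Toy demonstration of the reshape-based approach.
* ASSUMPTION: the data below are hypothetical, invented for illustration.
clear
input id str10 arrstr str20 violation
1 "2010-01-05" "theft"
1 "2010-01-05" "assault"
1 "2010-06-12" " Assault"
1 "2010-06-12" "THEFT"
1 "2011-02-01" "theft"
2 "2010-03-03" "fraud"
2 "2010-09-09" "fraud"
end

* convert the string date to a Stata daily date
generate arrdate = date(arrstr, "YMD")
format arrdate %td
drop arrstr

* standardize the charge text: trim, collapse inner blanks, uppercase
replace violation = upper(trim(itrim(violation)))

* built-in encode used here instead of sencode (SSC)
encode violation, generate(vcode)
drop violation
rename vcode violation

* one row per arrest, one variable per charge
bysort id arrdate (violation): generate cause = _n
reshape wide violation, i(id arrdate) j(cause)

* number each distinct set of charges (missing counts as a value)
egen all_violations = group(violation*), missing

* keep only the earliest arrest for each person and charge set
bysort id all_violations (arrdate): keep if _n == 1

* back to one row per charge, dropping the padding rows
reshape long violation, i(id arrdate) j(cause)
drop if missing(violation)
drop all_violations cause

sort id arrdate violation
list id arrdate violation, sepby(id) noobs
```

With these made-up data, the 2010-06-12 arrest for id 1 (same two charges as 2010-01-05) and the 2010-09-09 arrest for id 2 (same single charge as 2010-03-03) are dropped, leaving four rows. The 2011-02-01 arrest for id 1 survives because its charge set (THEFT only) differs from the earlier one.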
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*


© Copyright 1996–2014 StataCorp LP