Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: Dropping Duplicates that Aren't Exactly Duplicates

From	Lisa Chavez <[email protected]>
To	[email protected]
Subject	Re: st: RE: Dropping Duplicates that Aren't Exactly Duplicates
Date	Wed, 02 Nov 2011 13:17:10 -0700

It worked! Thank you so much! I suspected that the answer to my problem would be found in reshaping to wide... ::smacking forehead::


On 11/2/2011 12:44 PM, Dimitriy V. Masterov wrote:

Lisa,

I think the inelegant code below will accomplish what you want. It is
untested and hinges on the violation variable being very clean. If the
latter is not the case, you may want to take a look at Google Refine.

#delimit ;

/* remove leading, trailing, and multiple whitespaces & convert to
uppercase (may not be necessary, but a good habit) */
replace violation=upper(trim(itrim(violation)));

/* sencode is from SSC. This is not necessary, but may speed sorting if
you have lots of data */
sencode violation, replace;

/* reshape to make finding duplicates easier */
bys id arrdate (violation): gen cause=_n;
reshape wide violation, i(id arrdate) j(cause);
egen all_violations=group(violation*), missing;
sort id arrdate all_violations;
/* duplicates drop keeps only the first occurrence, which will be the
earliest arrest because of the sort */
duplicates drop id all_violations, force;

/* reshape back to your original format & drop extraneous variables */
reshape long violation, i(id arrdate) j(cause);
drop if missing(violation);
drop all_violations cause;
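
To see the reshape-then-group idea on something small, here is a minimal sketch on made-up data. The toy records and the arrdate_s helper variable are hypothetical and not from Lisa's dataset. It skips sencode, since egen group() also accepts string variables, and it swaps in bysort with keep in place of duplicates drop to make the "keep the earliest arrest" rule explicit. It stops before reshaping back to long.

```stata
clear
* hypothetical toy data: person 1 is arrested three times, and the first two
* arrests carry the same set of violations entered in a different order
input id str10 arrdate_s str20 violation
1 "2011-01-05" "theft"
1 "2011-01-05" "assault"
1 "2011-03-10" "assault"
1 "2011-03-10" "theft"
1 "2011-06-01" "theft"
2 "2011-02-02" "dui"
end
gen arrdate = date(arrdate_s, "YMD")
format arrdate %td
drop arrdate_s

* normalize the text so that identical violations compare equal
replace violation = upper(trim(itrim(violation)))

* number the violations within each arrest in sorted order, so the same set
* of violations always lands in the same wide columns
bysort id arrdate (violation): gen cause = _n
reshape wide violation, i(id arrdate) j(cause)

* one group code per distinct combination of violations
egen all_violations = group(violation*), missing

* within person and combination, keep only the earliest arrest
bysort id all_violations (arrdate): keep if _n == 1

* three arrests should remain: 05jan2011 and 01jun2011 for id 1, 02feb2011 for id 2
assert _N == 3
list id arrdate violation*, sepby(id)
```

Sorting on violation before numbering is what makes this work: without it, the same two violations entered in a different order would end up in different columns and get different group codes.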
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
