# Re: st: RE: Drop Duplicates while simultaneously eliminating opposite positive and negative values

| Field | Value |
|---|---|
| From | Beatrice Benavidez |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: RE: Drop Duplicates while simultaneously eliminating opposite positive and negative values |
| Date | Mon, 26 Nov 2012 14:16:36 +0400 |

```
I think I found it.

*** Number positive and negative duplicate values of make, mpg, price:
bys make mpg price: gen dupid = _n

*** Mark pairs of positive and negative price opposite/counterpart
*** duplicate values by using absolute values of price:
gen absprice = abs(price)
egen dupid_g = group(dupid)
su dupid_g, meanonly
* Flag these pos & neg absolute price duplicates
* g states which pos/neg counterpart it is, whether it's the first,
* 2nd, 3rd, and so on
gen flag_posneg = .
forvalues g = 1/`r(max)' {
duplicates tag make mpg absprice if dupid_g==`g' , gen(posneg_`g')
replace flag_posneg = 1 if posneg_`g'==1
}
drop posneg_* dupid_g dupid absprice

*** Mark positive price duplicates or negative price duplicates
*** stripped of pos/neg price opposite/counterpart duplicate values
* Prices stripped of positive/negative price duplicates
gen price_dup = price if flag_posneg!=1

* Duplicates stripped of positive/negative price duplicates
bys make mpg price_dup: gen dupid = _n if (flag_posneg==.)
gen flag_dup = 1 if (dupid>1 & dupid!=.)
drop price_dup dupid

* Putting flag_posneg & flag_dup together
gen flag_posneg_dup = 1 if ( flag_posneg == 1 | flag_dup == 1 )

Beatrice

You could try something like this:

* Number positive and negative duplicate values of price independently:
bys make mpg price: gen dupid = _n

* Mark pairs by absolute value of price
gen absprice = abs(price)
duplicates tag make mpg absprice dupid , gen(dup_pair)

* Look for unpaired duplicates
duplicates tag make mpg price if dup_pair==0 , gen(dup_nonpair)

I'm not sure which of these you want to keep/drop, but I think this would
classify the observations as follows:
1. unique: dupid==1  --or--  dup_pair==0 & dup_nonpair==0
2. pos/neg paired: dup_pair==1
3. additional pos or neg unpaired duplicates: dup_nonpair==1

Mike

On Sun, Nov 11, 2012 at 6:42 AM, daniel klein <klein.daniel.81@gmail.com> wrote:
>
> First off, I am sorry for reposting, but the last message got corrupted
> in the archive (it broke into two pieces, omitting the middle part).
> Here is the second (and final) try:
>
> Beatrice,
>
> this is kind of confusing. You say, you want to
>
> "[...] keep[ing] one observation if there are duplicates for all make,
> price and mpg."
>
> You then go on, specifying rules for cases in which
>
> "there are 2 duplicated positive price values when there is one
> opposite negative price value for the common make and mpg"
>
> But this is impossible. Given the first step, which eliminates all
> but one positive (or negative) price value in the subgroup defined by
> make and mpg, there can no longer be any cases that have 2 (or more)
> duplicated positive (or negative) price values in terms of make and
> mpg.
>
> From your description it further seems to be arbitrary which
> observations with positive or negative price values to flag. But in
> this case, why worry about positive and negative price values at all,
> when the only difference in these observations seems to be the
> multiplier (-1)?
>
> It is not that I mind playing around with Stata -- on the contrary.
> But it might help us help you if you could comment on these
> statements, elaborate a little bit on the sequence of steps you want to
> take here, and maybe be more specific about your ultimate goal. An
> example dataset containing all the possibilities you have in mind
> would also be nice (only if your first example lacks any possible
> situation you want to tackle).
>
> Best
> Daniel
>
> --
> Dear All,
>
> [...]
> I would like to be able to make a flag variable for both the opposite
> positive and negative price values for the common make and mpg, while
> only keeping one observation if there are duplicates for all make,
> price and mpg.
>
> At the same time, if there are 2 duplicated positive price values when
> there is one opposite negative price value for the common make and
> mpg, I would like to flag one positive price value observation and the
> opposite negative price value counterpart. Vice versa would apply if
> there are 2 duplicated negative price values and one opposite positive
> price value, I would want to flag one negative price value observation
> and the opposite positive price value observation.
>
> Expanding on this in the general case,
> [...]
>
> Thanks a lot!
>
> Beatrice
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

--
Michael Barker
Department of Economics
Georgetown University
Washington, DC 20057
```
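
The tagging steps from Mike's reply can be tried on a small made-up dataset. The sketch below is only illustrative: the four observations are hypothetical and do not come from the thread, while the variable names `dupid`, `absprice`, `dup_pair`, and `dup_nonpair` follow the reply.

```stata
* Hypothetical toy data (not from the thread):
* AMC has two rows with price +100 and one row with price -100
clear
input str8 make mpg price
"AMC"   22  100
"AMC"   22  100
"AMC"   22 -100
"Buick" 20  250
end

* Number exact duplicates of make, mpg, price within each group
bysort make mpg price: gen dupid = _n

* Pair the k-th positive copy with the k-th negative copy
* by matching on the absolute price and the duplicate number
gen absprice = abs(price)
duplicates tag make mpg absprice dupid, gen(dup_pair)

* Among observations left unpaired, count remaining exact duplicates
duplicates tag make mpg price if dup_pair == 0, gen(dup_nonpair)

* Inspect the result
sort make mpg price dupid
list make mpg price dupid dup_pair dup_nonpair, sepby(make)
```

In this toy data, one +100 row and the -100 row share `dupid == 1` and the same `absprice`, so both get `dup_pair == 1`. The second +100 row has no negative counterpart with `dupid == 2`, so it keeps `dup_pair == 0`, as does the single Buick row.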