
# st: RE: Drop Duplicates while simultaneously eliminating opposite positive and negative values

From: Beatrice Benavidez
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Drop Duplicates while simultaneously eliminating opposite positive and negative values
Date: Sun, 11 Nov 2012 13:38:04 +0400

```
Dear All,

I have an interesting problem with the following dataset:

make          price    mpg
VW Diesel      5397     41
BMW 320i       9735     25
Datsun 510     5079     24
Audi 5000      9690     17
BMW 320i      -9735     25
BMW 320i       9375     25
BMW 320i       9375     25
BMW 320i       9735     25
BMW 320i       9735     25
VW Diesel     -5397     41
BMW 320i       9735     25

The dataset contains opposite positive and negative price values for
the same make and mpg (such as VW Diesel price=5397 mpg=41 and VW
Diesel price=-5397 mpg=41). At the same time, there are exact
duplicates on make, price and mpg (BMW 320i price=9375 mpg=25 appears
twice).

Opposite positive and negative price values for the same make and mpg
can also occur alongside exact duplicates on make, price and mpg
(BMW 320i price=9735 mpg=25 appears 4 times and BMW 320i price=-9735
mpg=25 appears once).

I know how to identify and flag duplicate observations, based on
http://www.stata.com/support/faqs/data-management/duplicate-observations/

I would like to create a flag variable that marks both the positive
and the opposite negative price observations for the same make and
mpg, while keeping only one observation when there are duplicates on
make, price and mpg.

At the same time, if there are 2 duplicated positive price values and
one opposite negative price value for the same make and mpg, I would
like to flag one positive price observation and its opposite negative
counterpart. The reverse applies if there are 2 duplicated negative
price values and one opposite positive price value: I would want to
flag one negative price observation and the opposite positive price
observation.

In the general case, if there are more duplicated positive price
values than opposite negative price values for the same make and mpg
(duplicated or not), I would like to flag all but one of the positive
price observations and all of the opposite negative price
observations for that make and mpg. The reverse applies if there are
more duplicated negative price values than opposite positive price
values for the same make and mpg.

In other words, I would like to flag all but ONE of the positive
price observations if the positives are the larger group, or all but
ONE of the negative price observations if the negatives are the
larger group.

How should I proceed if I want a single flagging procedure that
handles all three of these situations at once without missing any
cases?

Any help will be appreciated!

Thanks a lot!

Beatrice
```
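The flagging rules described in the message can be sketched as follows. This is only one possible reading of the question, written in Python rather than Stata so that the logic can be checked in isolation. The function name `flag_rows` and the treatment of ties are assumptions: observations are grouped by make, mpg and absolute price, every observation of the less frequent sign is flagged, all but the first observation of the more frequent sign is flagged, and when both signs occur equally often (as with VW Diesel) every observation in the group is flagged.

```python
from collections import Counter


def flag_rows(rows):
    """Return one 0/1 flag per row; rows are (make, price, mpg) tuples.

    Assumed reading of the rules: within each (make, mpg, |price|) group,
    every row of the minority sign is flagged (1), and every row of the
    majority sign except the first one is flagged. On a tie between the
    two signs, all rows in the group are flagged.
    """
    # Count rows per group and sign (True means negative price).
    counts = Counter()
    for make, price, mpg in rows:
        counts[(make, mpg, abs(price), price < 0)] += 1

    kept_groups = set()  # groups whose single unflagged row was already chosen
    flags = []
    for make, price, mpg in rows:
        group = (make, mpg, abs(price))
        negative = price < 0
        same_sign = counts[group + (negative,)]
        other_sign = counts[group + (not negative,)]
        # Only the first row of the strictly larger sign group stays unflagged.
        keep = same_sign > other_sign and group not in kept_groups
        if keep:
            kept_groups.add(group)
        flags.append(0 if keep else 1)
    return flags


# The example data from the message, in the original order.
rows = [
    ("VW Diesel", 5397, 41),
    ("BMW 320i", 9735, 25),
    ("Datsun 510", 5079, 24),
    ("Audi 5000", 9690, 17),
    ("BMW 320i", -9735, 25),
    ("BMW 320i", 9375, 25),
    ("BMW 320i", 9375, 25),
    ("BMW 320i", 9735, 25),
    ("BMW 320i", 9735, 25),
    ("VW Diesel", -5397, 41),
    ("BMW 320i", 9735, 25),
]
flags = flag_rows(rows)
for row, flag in zip(rows, flags):
    print(flag, *row)
```

Under these assumptions, four observations in the example stay unflagged: the first BMW 320i at 9735, Datsun 510, Audi 5000, and the first BMW 320i at 9375. If a matched positive and negative pair should instead cancel one-for-one, or if a tie should leave one observation unflagged, the `keep` condition would need to change accordingly.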