



st: Identify duplicate observations by a varlist, then drop them based on other variables


From   Aaron Kirkman <ak1795mailserv@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Identify duplicate observations by a varlist, then drop them based on other variables
Date   Sun, 30 Sep 2012 20:15:17 -0500

Dear Statalist,

I have a dataset with about 20 million observations and I'd like to
remove duplicate observations from it. However, the observations are
only duplicated in the --date--, --symbol--, and --adjclose--
variables, not the --exchange-- variable, as shown below.

     date   exchange   symbol   adjclose
     8496     NASDAQ      ADP       1.39
     8497     NASDAQ      ADP       1.42
     8498     NASDAQ      ADP       1.41
     8501     NASDAQ      ADP       1.39
     8502     NASDAQ      ADP        1.4
     8503     NASDAQ      ADP       1.41
     8504     NASDAQ      ADP       1.45
     8505     NASDAQ      ADP       1.44
     8508     NASDAQ      ADP       1.43
     8509     NASDAQ      ADP        1.4
     8496       NYSE      ADP       1.39
     8497       NYSE      ADP       1.42
     8498       NYSE      ADP       1.41
     8501       NYSE      ADP       1.39
     8502       NYSE      ADP        1.4
     8503       NYSE      ADP       1.41
     8504       NYSE      ADP       1.45
     8505       NYSE      ADP       1.44
     8508       NYSE      ADP       1.43
     8509       NYSE      ADP        1.4

I can identify observations that are duplicated in the --date--,
--symbol--, and --adjclose-- variables using --duplicates list date
symbol adjclose--, but I'm unsure how to drop the observations from
one specific exchange programmatically.
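
For reference, the identification step can be sketched as follows. This
is a minimal sketch using the variable names from the listing above;
dup is a hypothetical name for the new tag variable.

```stata
* Summarize how many observations share the same date, symbol, and adjclose
duplicates report date symbol adjclose

* Tag each observation with the number of other observations that match it
* on these three variables (0 means unique); dup is a hypothetical name
duplicates tag date symbol adjclose, generate(dup)

* Show which exchanges the tagged (duplicated) observations come from
tabulate exchange if dup > 0
```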

It doesn't matter which exchange is dropped, as long as all the
observations from that exchange are dropped if the stock appears on
multiple exchanges. Is --duplicates-- the wrong way to go about doing
this? If no simple solution exists, I could always generate a new
variable based on --exchange-- and --symbol-- and use that as a panel
variable.
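
One possible way to do this without --duplicates-- is sketched below.
It assumes that keeping whichever exchange sorts first within each
symbol is acceptable, since it does not matter which exchange is
dropped; panelid is a hypothetical variable name.

```stata
* Within each symbol, sort by exchange and keep only the observations from
* the exchange that sorts first; every other exchange for that symbol is
* dropped in full, whether or not its rows are exact duplicates
bysort symbol (exchange): keep if exchange == exchange[1]

* Alternative: instead of dropping anything, skip the line above and build
* a panel identifier from exchange and symbol (panelid is hypothetical)
egen panelid = group(exchange symbol)
```

The first command works whether --exchange-- is a string or a numeric
variable, because it only compares each value with the first value in
the sorted group.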

Thank you,
Aaron

