Re: st: Identify duplicate observations by a varlist, then drop them based on other variables


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Identify duplicate observations by a varlist, then drop them based on other variables
Date   Mon, 1 Oct 2012 02:34:06 +0100

I am not clear what advice you seek. If you don't care about
-exchange-, you have duplicates you can drop, but not otherwise. If
you -duplicates drop- them, -duplicates- will pay no attention to
which -exchange- each observation belongs to.
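
As a minimal sketch, assuming the variables are named as in your
listing and that you do not mind which copy survives, that would be:

```stata
* Optional check: count surplus copies on these three variables
duplicates report date symbol adjclose

* Keep the first occurrence in each group of duplicates;
* -force- is required when a varlist is specified
duplicates drop date symbol adjclose, force
```

Which -exchange- survives then depends only on the current sort order
of the data.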

You can also do this:

duplicates tag date symbol adjclose, gen(tag)
drop if tag & exchange == "NASDAQ"

if you have a reason to drop one exchange and not another.
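
If it really does not matter which exchange goes, a sketch along the
following lines keeps a single exchange per symbol, here whichever
sorts first alphabetically. That choice is an assumption made for
illustration, not something -duplicates- does for you:

```stata
* Within each symbol, sort on exchange and keep only the observations
* from the exchange that sorts first (assumed rule, purely illustrative)
bysort symbol (exchange): drop if exchange != exchange[1]
```

This drops by -symbol- alone, so it ignores whether -adjclose-
actually agrees across exchanges; check that first if it matters.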

Nick (original author of -duplicates-)

On Mon, Oct 1, 2012 at 2:15 AM, Aaron Kirkman <ak1795mailserv@gmail.com> wrote:

> I have a dataset with about 20 million observations and I'd like to
> remove duplicate observations from it. However, the observations are
> only duplicated in the --date--, --symbol--, and --adjclose--
> variables, not the -exchange- variable, as shown.
>
>      date   exchange   symbol   adjclose
>      8496     NASDAQ      ADP       1.39
>      8497     NASDAQ      ADP       1.42
>      8498     NASDAQ      ADP       1.41
>      8501     NASDAQ      ADP       1.39
>      8502     NASDAQ      ADP        1.4
>      8503     NASDAQ      ADP       1.41
>      8504     NASDAQ      ADP       1.45
>      8505     NASDAQ      ADP       1.44
>      8508     NASDAQ      ADP       1.43
>      8509     NASDAQ      ADP        1.4
>      8496       NYSE      ADP       1.39
>      8497       NYSE      ADP       1.42
>      8498       NYSE      ADP       1.41
>      8501       NYSE      ADP       1.39
>      8502       NYSE      ADP        1.4
>      8503       NYSE      ADP       1.41
>      8504       NYSE      ADP       1.45
>      8505       NYSE      ADP       1.44
>      8508       NYSE      ADP       1.43
>      8509       NYSE      ADP        1.4
>
> I can identify observations that are duplicated in the --date--,
> --symbol--, and --adjclose-- variables using -- duplicates list date
> symbol adjclose--, but I'm unsure how to drop the observations from
> one specific exchange programmatically.
>
> It doesn't matter which exchange is dropped, as long as all the
> observations from that exchange are dropped if the stock appears on
> multiple exchanges. Is --duplicates-- the wrong way to go about doing
> this? If no simple solution exists, I could always generate a new
> variable based on --exchange-- and --symbol-- and use that as a panel
> variable.