



Re: st: Identify duplicate observations by a varlist, then drop them based on other variables


From   Aaron Kirkman <ak1795mailserv@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Identify duplicate observations by a varlist, then drop them based on other variables
Date   Mon, 1 Oct 2012 13:29:33 -0500

Hi Nick,

That looks like the best solution, so I'll use --duplicates tag--.

Thank you,
Aaron

On Sun, Sep 30, 2012 at 8:34 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> I am not clear what advice you seek. If you don't care about
> -exchange-, you have duplicates you can drop, but not otherwise. If
> you -duplicates drop- them, -duplicates- will be indifferent to which
> -exchange- they are.
>
> You can also do this:
>
> duplicates tag date symbol adjclose, gen(tag)
> drop if tag & exchange == "NASDAQ"
>
> if you have a reason to drop one exchange and not another.
>
> Nick (original author of -duplicates-)
>
> On Mon, Oct 1, 2012 at 2:15 AM, Aaron Kirkman <ak1795mailserv@gmail.com> wrote:
>
>> I have a dataset with about 20 million observations and I'd like to
>> remove duplicate observations from it. However, the observations are
>> only duplicated in the --date--, --symbol--, and --adjclose--
>> variables, not the -exchange- variable, as shown.
>>
>>      date   exchange   symbol   adjclose
>>      8496     NASDAQ      ADP       1.39
>>      8497     NASDAQ      ADP       1.42
>>      8498     NASDAQ      ADP       1.41
>>      8501     NASDAQ      ADP       1.39
>>      8502     NASDAQ      ADP        1.4
>>      8503     NASDAQ      ADP       1.41
>>      8504     NASDAQ      ADP       1.45
>>      8505     NASDAQ      ADP       1.44
>>      8508     NASDAQ      ADP       1.43
>>      8509     NASDAQ      ADP        1.4
>>      8496       NYSE      ADP       1.39
>>      8497       NYSE      ADP       1.42
>>      8498       NYSE      ADP       1.41
>>      8501       NYSE      ADP       1.39
>>      8502       NYSE      ADP        1.4
>>      8503       NYSE      ADP       1.41
>>      8504       NYSE      ADP       1.45
>>      8505       NYSE      ADP       1.44
>>      8508       NYSE      ADP       1.43
>>      8509       NYSE      ADP        1.4
>>
>> I can identify observations that are duplicated in the --date--,
>> --symbol--, and --adjclose-- variables using --duplicates list date
>> symbol adjclose--, but I'm unsure how to drop the observations from
>> one specific exchange programmatically.
>>
>> It doesn't matter which exchange is dropped, as long as all the
>> observations from that exchange are dropped if the stock appears on
>> multiple exchanges. Is --duplicates-- the wrong way to go about doing
>> this? If no simple solution exists, I could always generate a new
>> variable based on --exchange-- and --symbol-- and use that as a panel
>> variable.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
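
For readers who do not want to hard-code an exchange name, the approach discussed in this thread might be sketched as follows. This is only an illustration, assuming the variables are named date, exchange, symbol and adjclose as in the listing above and that exchange is a string variable. It keeps whichever exchange sorts first within each duplicated group, which is an arbitrary choice and not something the posters specified.

```stata
* Sketch only: assumes variables date, exchange, symbol, adjclose
* exist as in the listing quoted in this thread.

* Optional check: how many surplus copies exist on these three variables?
duplicates report date symbol adjclose

* Within each date-symbol-adjclose group, sort by exchange and keep
* only the first observation (here NASDAQ sorts before NYSE).
* Groups with a single observation are left untouched.
bysort date symbol adjclose (exchange): keep if _n == 1
```

Note that this also collapses observations that are identical on all four variables, and that a date on which the two exchanges report different values of adjclose keeps both observations, just as with -duplicates tag-.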

