Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: RE: Duplicate observations


From   emanuele mazzini <[email protected]>
To   [email protected]
Subject   Re: st: RE: Duplicate observations
Date   Tue, 11 Mar 2014 00:09:41 +0100

I figured it out, thank you very much!

Emanuele

2014-03-10 20:02 GMT+01:00 Nick Cox <[email protected]>:
> Joe is right.
>
> Away from -gsort-, a minus sign in a varlist acts as a hyphen, indicating a range of variables.
>
> Good catch!
> Nick
> [email protected]
>
>
> On 10 March 2014 18:58, Joe Canner <[email protected]> wrote:
>> Emanuele,
>>
>> Nick provided a good solution to your problem, but it's probably worth noting why you had a problem to begin with.
>>
>> The statement:
>>
>> by reporter partner year (x_1 -date), sort: gen duplicates=_n
>>
>> is probably not doing what you want it to do. It looks like you want to sort by x_1 (ascending) and date (descending). However, as far as I am aware, the minus sign to indicate a descending sort can only be used in a -gsort- command. In this context the minus sign is interpreted as a hyphen, and thus "x_1 -date" is a variable list (variables x_1 through date). Accordingly, it is not sorting in descending date order, which results in the problem you noted.
>>
>> If you need to do something like this in the future and Nick's solution doesn't apply, try the following:
>>
>> gsort reporter partner year x_1 -date
>> bysort reporter partner year:  gen duplicates=_n
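>>
>> A variant that stays entirely within -bysort- can be sketched as follows. It is only an illustration under stated assumptions: the toy rows, the string variable sdate, and the helper variable negdate are hypothetical and not part of the original data, and it assumes date is a numeric Stata daily date. Negating the date makes an ascending sort on negdate equivalent to a descending sort on date.
>>
>> ```stata
>> * Toy data modelled on the examples in this thread (values are illustrative)
>> clear
>> input str10 reporter str10 partner int year str9 sdate str3 x_1
>> "Albania" "Germany" 1967 "08apr1967" "n_1"
>> "Albania" "Germany" 1967 "17dec1967" "n_1"
>> "Albania" "Austria" 1980 "06dec1980" "n_1"
>> "Albania" "Austria" 1980 "15nov1980" "n_1"
>> end
>>
>> * Convert the string to a numeric daily date (assumption: date is numeric)
>> gen date = daily(sdate, "DMY")
>> format date %td
>>
>> * Hypothetical helper: ascending order on negdate = descending order on date
>> gen negdate = -date
>>
>> * Within each group, the most recent date now gets duplicates == 1
>> bysort reporter partner year (x_1 negdate): gen duplicates = _n
>> drop if duplicates > 1
>>
>> * Self-checks: the latest date survives in each group
>> assert date == daily("17dec1967", "DMY") if partner == "Germany"
>> assert date == daily("06dec1980", "DMY") if partner == "Austria"
>> list reporter partner year date x_1, noobs
>> ```
>>
>> The -assert- lines stop with an error if the wrong observation was kept, so the sketch checks itself on the toy data.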
>>
>> Regards,
>> Joe Canner
>> Johns Hopkins University School of Medicine
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of emanuele mazzini
>> Sent: Monday, March 10, 2014 2:31 PM
>> To: [email protected]
>> Subject: st: Duplicate observations
>>
>> Hello to everybody,
>>
>> I have an issue about duplicate observations that I find puzzling to solve.
>> I have data on country-pairs by year and I am interested in two
>> specific variables, a date and, say a variable which I call x_1.
>>
>> Specifically, my data look like this :
>>
>> reporter  partner   year   date        x_1
>>
>> Albania   Austria   1980   6dec1980    n_1
>> Albania   Austria   1980   15nov1980   n_1
>> .         .        .
>> .         .        .
>> .         .        .
>>
>> As you may have noticed, the observations differ only by date, and
>> I need to drop duplicates so as to keep only the most recent one
>> (hence, in this case, drop the second one).
>>
>> I ran the following commands:
>>
>> duplicates tag reporter partner year, generate(dup)
>>
>> by reporter partner year (x_1 -date), sort: gen duplicates=_n
>>
>> so as to be able to identify duplicates and then - among those with
>> dup >0 - drop those for which duplicates > 1.
>> This method was suggested in this thread (I take this opportunity to
>> thank again), but it seems not to work for some observations.
>> Take, for instance the following example:
>>
>> reporter  partner   year   date        x_1   dup   duplicates
>> Albania   Germany   1967   08apr1967   n_1   1     1
>> Albania   Germany   1967   17dec1967   n_1   1     2
>>
>> As you may notice, Stata marks the observation dated 17dec1967 as
>> the one with duplicates > 1 (which will then be dropped), whereas I
>> would have expected the opposite.
>>
>> Can anyone explain why and, possibly, tell me how to deal with this issue?
>>
>> Thank you very much in advance,
>>
>> Emanuele
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index