
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



Re: st: drop duplicates iff


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: drop duplicates iff
Date   Tue, 16 Apr 2013 15:08:56 +0100

It sounds as if you want

bysort ObjektID (day_of_week) : gen day_of_week2 = day_of_week if _n == 1
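
A minimal sketch of that line on the seven observations listed in the question below. The attempted -replace- alternates because Stata works through the observations in order: once one value has been set to missing, the next comparison is against that missing value, so the next duplicate survives. Working within groups with -bysort- and _n avoids comparing each value with its neighbour. The commented-out -replace- variant is an assumption for the case where day_of_week2 already exists, since -generate- will not overwrite an existing variable.

```stata
* Toy data copied from the listing in the question
clear
input ObjektID day_of_week
3063 5
3066 3
3066 3
3066 3
3066 3
3066 3
3069 2
end

* Copy the value into the first observation of each ObjektID only;
* -generate- leaves every other observation missing
bysort ObjektID (day_of_week) : gen day_of_week2 = day_of_week if _n == 1

* Assumed variant if day_of_week2 already exists in the data:
* bysort ObjektID (day_of_week) : replace day_of_week2 = . if _n > 1

list, sepby(ObjektID)
```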

Nick
njcoxstata@gmail.com

Terminology is tricky:

-duplicates- is a command, not a function.

-replace- with missing is not deletion.

An observation in Stata is an entire row, case, or record, not an
individual value of a variable.
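
To make the distinction concrete, here is a small sketch on the same toy data; the indicator variable first is a hypothetical name used only for illustration. Setting values to missing leaves the number of observations unchanged, while -drop- removes whole observations.

```stata
clear
input ObjektID day_of_week
3063 5
3066 3
3066 3
3066 3
3066 3
3066 3
3069 2
end

* Flag the first observation within each ObjektID (1 = first, 0 = otherwise)
bysort ObjektID (day_of_week) : gen byte first = _n == 1

* Missing values in one variable: all 7 observations are still present
gen day_of_week2 = day_of_week if first
count
count if missing(day_of_week2)

* Deleting observations: entire rows go, other variables included
drop if !first
count
```

The first two -count- commands should report 7 and 4, and the last one 3.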

On 16 April 2013 14:57, Joel Jönsson <joensson.joel@gmail.com> wrote:

> I have the following problem. I'm trying to delete observations (replace with [.]) for values that were filled in automatically when I merged two data sets with different numbers of observations per ID. These values are duplicates. However, I do not wish to use "duplicates drop", since this drops observations containing information in other variables. I cannot (as far as I know) control for other variables by adding them to the "duplicates drop" function, since the information in the variable containing the most observations is unique for each observation and must not be dropped.
>
> I tried the following, only to realize that only the first and third duplicates were replaced, leaving the second and fourth duplicates intact.
>
> replace day_of_week2 = cond(day_of_week2[_n] == day_of_week2[_n-1], ., day_of_week2)
>
> This yields
>
> ObjektID                day_of_week     day_of_week2
> 3063            5                               5
> 3066            3                               3
> 3066            3                               .
> 3066            3                               3
> 3066            3                               .
> 3066            3                               3
> 3069            2                               2
>
> In this case, I would like to have all the duplicate 3s removed. Any suggestions for how to improve the code?
>
> Best,
>
> Joel
>
>
> On Apr 15, 2013, at 11:07 AM, Nick Cox wrote:
>
>> You don't.
>>
>> From what you say, you want
>>
>> duplicates drop apartment_id bidder_id
>>
>> If that would result in loss of information, -duplicates- will tell
>> you. -duplicates- is dedicated to  being careful about loss of
>> information.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 15 April 2013 09:27, Joel Jönsson <joensson.joel@gmail.com> wrote:
>>> Thanks for your quick response Nick. I have been looking at the documentation (help duplicates).
>>> My problem is to restrict the removal of duplicates to one Apartment-ID at a time. Which command [if] [in] [bysort] [group] do I use?
>>>
>>> On Apr 15, 2013, at 1:48 AM, Nick Cox wrote:
>>>
>>>> Did you try looking at the documentation? There is a -duplicates-
>>>> command. Once you have used it to remove duplicates, the answer to
>>>> your second question is
>>>>
>>>> bysort Apartment_ID : replace Bidder_ID = _n
>>>>
>>>> Nick
>>>> njcoxstata@gmail.com
>>>>
>>>>
>>>> On 14 April 2013 23:19, Joel Jönsson <joensson.joel@gmail.com> wrote:
>>>>> Dear all Statalist users.
>>>>>
>>>>> I'm quite new to Stata and I'm facing the following challenge. I wish to get rid of duplicates within a
>>>>> variable (Bidder-ID) for a specific observation number (Apartment-ID) only, i.e. there are numerous
>>>>> observations with the values 49, 50, 51 etc. within Bidder-ID, each of which is allowed only once
>>>>> within the same Apartment-ID.
>>>>>
>>>>> _n              Apartment-ID    Bidder-ID
>>>>>
>>>>> 1.              3345                    49
>>>>> 2.              3345                    49
>>>>> 3.              3345                    50
>>>>> 4.              3345                    51
>>>>> 5.              3345                    50
>>>>> 6.              5780                    49
>>>>> 7.              5780                    50
>>>>> 8.              5780                    49
>>>>>
>>>>> I would like the result to look something like the following:
>>>>>
>>>>> _n              Apartment-ID    Bidder-ID
>>>>> 1.              3345                    49
>>>>> 2.              3345                    50
>>>>> 3.              3345                    51
>>>>> 4.              5780                    49
>>>>> 5.              5780                    50
>>>>>
>>>>> Also, I wish to rename the observations in Bidder-ID (49, 50, 51), which could also take on numbers
>>>>> such as 2234, 2244, 2255 (each symbolizes one unique bidder), to take on values equal to when they first
>>>>> appeared in Apartment-ID. So, if Bidder-ID 49, 50, 51, 2234, 2244, 2255 exist for the same
>>>>> Apartment-ID, then 49=1, 50=2, 51=3, 2234=4 etc., not necessarily in that order (2234=2, 51=1, 49=4 …).
>>>>> Thus, it would look something like this:
>>>>>
>>>>> _n             Apartment-ID    Bidder-ID
>>>>> 1.              3345                    1
>>>>> 2.              3345                    2
>>>>> 3.              3345                    3
>>>>> 4.              5780                    1
>>>>> 5.              5780                    2
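
The two steps suggested earlier in the thread can be sketched on the example data from the first message. This is only an illustration under some assumptions: the variable names use underscores because Stata names cannot contain hyphens, the force option is added because -duplicates drop- requires it when a varlist is named, and sorting on (Bidder_ID) within apartments is an added choice to make the numbering reproducible (the original question said the order did not matter).

```stata
* Example data from the first message in the thread
clear
input Apartment_ID Bidder_ID
3345 49
3345 49
3345 50
3345 51
3345 50
5780 49
5780 50
5780 49
end

* Report surplus copies before removing anything
duplicates report Apartment_ID Bidder_ID

* Keep one observation per Apartment_ID-Bidder_ID pair
duplicates drop Apartment_ID Bidder_ID, force

* Renumber bidders 1, 2, 3, ... within each apartment
bysort Apartment_ID (Bidder_ID) : replace Bidder_ID = _n

list, sepby(Apartment_ID)
```

On this data the listing should show bidders 1, 2, 3 for apartment 3345 and 1, 2 for apartment 5780. With real data that hold other variables, force would silently discard whatever differs between the dropped copies, so -duplicates report- is worth reading first.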
