
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: drop duplicates iff

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: drop duplicates iff
Date	Tue, 16 Apr 2013 15:08:56 +0100

It sounds as if you want

bysort ObjektID (day_of_week) : gen day_of_week2 = day_of_week if _n == 1

Nick
[email protected]

Terminology is tricky:

-duplicates- is a command, not a function.

-replace- with missing is not deletion.

An observation in Stata is an entire row, case, or record, not an
individual value of a variable.
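
A minimal sketch of why the -replace- in the question quoted below changed only every other duplicate, assuming the seven observations shown there are typed in with -input-. -replace- works through the observations in order, so each comparison with [_n-1] may see a value that was already set to missing earlier in the same pass. The name day_of_week3 is hypothetical, used only so that both results can be listed side by side.

```stata
clear
input ObjektID day_of_week
3063 5
3066 3
3066 3
3066 3
3066 3
3066 3
3069 2
end

* Approach from the question: compare each value with the previous one.
* Once an observation has been set to missing, the next observation is
* compared with that missing value, not with the original 3, so it is kept.
gen day_of_week2 = day_of_week
replace day_of_week2 = cond(day_of_week2 == day_of_week2[_n-1], ., day_of_week2)

* Approach from the reply: within each ObjektID, keep the value only in
* the first observation. day_of_week3 is a hypothetical name for this sketch.
bysort ObjektID (day_of_week) : gen day_of_week3 = day_of_week if _n == 1

list, sepby(ObjektID)
```

The second approach never looks at a value that the same command has just changed, which is why it does not alternate.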

On 16 April 2013 14:57, Joel Jönsson <[email protected]> wrote:

> I have the following problem. I'm trying to delete observations (replace with [.]) for values that were filled in automatically when I merged two data sets with different numbers of ID observations. These values are duplicates. However, I do not wish to use "duplicates drop", since this drops observations containing information in other variables. I cannot (as far as I know) control for other variables by adding them to the "duplicates drop" function, since the information in the variable containing the most observations is unique for each observation and must not be dropped.
>
> I tried the following, only to realize that only the first and third duplicates were replaced, leaving the second and fourth duplicates intact.
>
> replace day_of_week2 = cond(day_of_week2[_n] == day_of_week2[_n-1], ., day_of_week2)
>
> This yields
>
> ObjektID   day_of_week   day_of_week2
> 3063       5             5
> 3066       3             3
> 3066       3             .
> 3066       3             3
> 3066       3             .
> 3066       3             3
> 3069       2             2
>
> In this case, I would like to have all the duplicated 3s removed. Any suggestions on how to improve the code?
>
> Best,
>
> Joel
>
>
> On Apr 15, 2013, at 11:07 AM, Nick Cox wrote:
>
>> You don't.
>>
>> From what you say, you want
>>
>> duplicates drop apartment_id bidder_id
>>
>> If that would result in loss of information, -duplicates- will tell
>> you. -duplicates- is dedicated to being careful about loss of
>> information.
>>
>> Nick
>> [email protected]
>>
>>
>> On 15 April 2013 09:27, Joel Jönsson <[email protected]> wrote:
>>> Thanks for your quick response Nick. I have been looking at the documentation (help duplicates).
>>> My problem is to restrict the removal of duplicates to one Apartment-ID at a time. Which of [if], [in], [bysort], or [group] do I use?
>>>
>>> On Apr 15, 2013, at 1:48 AM, Nick Cox wrote:
>>>
>>>> Did you try looking at the documentation? There is a -duplicates-
>>>> command. Once you have used it to remove duplicates, the second
>>>> question is
>>>>
>>>> bysort Apartment_ID : replace Bidder_ID = _n
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 14 April 2013 23:19, Joel Jönsson <[email protected]> wrote:
>>>>> Dear all Statalist users.
>>>>>
>>>>> I'm quite new to Stata and I'm facing the following challenge. I wish to get rid of duplicates within a
>>>>> variable (Bidder-ID) for a specific observation number (Apartment-ID) only, i.e. there are numerous
>>>>> observations with the values 49, 50, 51, etc. within Bidder-ID, which are allowed only once
>>>>> within the same Apartment-ID.
>>>>>
>>>>> _n              Apartment-ID    Bidder-ID
>>>>>
>>>>> 1.              3345                    49
>>>>> 2.              3345                    49
>>>>> 3.              3345                    50
>>>>> 4.              3345                    51
>>>>> 5.              3345                    50
>>>>> 6.              5780                    49
>>>>> 7.              5780                    50
>>>>> 8.              5780                    49
>>>>>
>>>>> I would like the result to look something like the following:
>>>>>
>>>>> _n              Apartment-ID    Bidder-ID
>>>>> 1.              3345                    49
>>>>> 2.              3345                    50
>>>>> 3.              3345                    51
>>>>> 4.              5780                    49
>>>>> 5.              5780                    50
>>>>>
>>>>> Also, I wish to renumber the values in Bidder-ID (49, 50, 51), which could also take on numbers
>>>>> such as 2234, 2244, 2255 (each symbolizes one unique bidder), so that they reflect the order in which they first
>>>>> appeared in Apartment-ID. So, if Bidder-IDs 49, 50, 51, 2234, 2244, 2255 exist for the same
>>>>> Apartment-ID, then 49=1, 50=2, 51=3, 2234=4, etc., not necessarily in that order (2234=2, 51=1, 49=4 …).
>>>>> Thus, it would look something like this:
>>>>>
>>>>> _n             Apartment-ID    Bidder-ID
>>>>> 1.              3345                    1
>>>>> 2.              3345                    2
>>>>> 3.              3345                    3
>>>>> 4.              5780                    1
>>>>> 5.              5780                    2

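
The two steps suggested earlier in the thread for the original question (drop duplicates, then renumber bidders within each apartment) can be sketched as follows, assuming the eight observations from the first post and the variable names Apartment_ID and Bidder_ID. Two details are assumptions of this sketch and not taken from the thread: the -force- option, which -duplicates drop- requires when a varlist is given, and the (Bidder_ID) sort key, added only so that the numbering is reproducible.

```stata
clear
input Apartment_ID Bidder_ID
3345 49
3345 49
3345 50
3345 51
3345 50
5780 49
5780 50
5780 49
end

* Report how many surplus copies exist before dropping anything
duplicates report Apartment_ID Bidder_ID

* Keep one observation for each Apartment_ID/Bidder_ID pair.
* -force- acknowledges that dropped observations could differ on
* variables outside the varlist (there are none in this toy data).
duplicates drop Apartment_ID Bidder_ID, force

* Renumber bidders 1, 2, ... within each apartment
bysort Apartment_ID (Bidder_ID) : replace Bidder_ID = _n

list, sepby(Apartment_ID)
```

With real data that contain other variables, running -duplicates report- first shows how much would be lost before anything is dropped.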