Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: drop duplicates iff
From: Joel Jönsson <[email protected]>
To: [email protected]
Subject: Re: st: drop duplicates iff
Date: Tue, 16 Apr 2013 15:57:35 +0200
Hi all,
I have the following problem. I'm trying to delete values (replace them with missing, [.]) that were filled in automatically when I merged two data sets with different numbers of observations per ID. These values are duplicates. However, I do not wish to use "duplicates drop", since it drops whole observations, including information held in other variables. As far as I know, I cannot control for the other variables by adding them to the "duplicates drop" command, because the variable with the most observations holds a unique value in every observation, and that information must not be dropped.
I tried the following, only to realize that only the first and third duplicates were replaced, leaving the second and fourth duplicates intact:
replace day_of_week2 = cond(day_of_week2[_n]==day_of_week2[_n-1], ., day_of_week2)
This yields:
ObjektID   day_of_week   day_of_week2
3063       5             5
3066       3             3
3066       3             .
3066       3             3
3066       3             .
3066       3             3
3069       2             2
In this case, I would like to have all the duplicate 3s removed. Any suggestions on how to improve the code?
Best,
Joel 
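Why the pattern alternates: -replace- works through the observations in order, and day_of_week2[_n-1] refers to the value as it stands at that moment, i.e. already replaced for the previous observation. Once observation 3 has been set to missing, observation 4 is compared with missing, is judged different and keeps its 3; observation 5 then matches observation 4 again, and so on. A minimal sketch of one fix, assuming the first observation of each ObjektID should keep its value and every later one should be set to missing, makes the condition depend on the position within ObjektID rather than on the neighbouring value. The data are typed in from the listing above, and obsno is a hypothetical helper variable.

```stata
* Listing from the question, typed in by hand
clear
input ObjektID day_of_week day_of_week2
3063 5 5
3066 3 3
3066 3 3
3066 3 3
3066 3 3
3066 3 3
3069 2 2
end

* Hypothetical helper: remember the current order so that sorting
* within ObjektID cannot shuffle the observations
generate long obsno = _n

* Within each ObjektID, blank out day_of_week2 on every observation
* after the first. The condition depends on the position in the group
* (_n), not on the previous, possibly already replaced, value.
bysort ObjektID (obsno): replace day_of_week2 = . if _n > 1

list, sepby(ObjektID)
```

Sorting on obsno within ObjektID keeps the original order, and only day_of_week2 is changed, so the other variables keep their information. Because the test is made within ObjektID, two different objects that happen to share a value are not confused, which comparing adjacent rows alone does not guard against.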
 
On Apr 15, 2013, at 11:07 AM, Nick Cox wrote:
> You don't.
> 
> From what you say, you want
> 
> duplicates drop apartment_id bidder_id
> 
> If that would result in loss of information, -duplicates- will tell
> you. -duplicates- is dedicated to being careful about loss of
> information.
> 
> Nick
> [email protected]
> 
> 
> On 15 April 2013 09:27, Joel Jönsson <[email protected]> wrote:
>> Thanks for your quick response, Nick. I have been looking at the documentation (help duplicates).
>> My problem is to restrict the removal of duplicates to one Apartment-ID at a time. Which command [if] [in] [bysort] [group] do I use?
>> 
>> On Apr 15, 2013, at 1:48 AM, Nick Cox wrote:
>> 
>>> Did you try looking at the documentation? There is a -duplicates-
>>> command. Once you have used it to remove duplicates, the answer to
>>> your second question is
>>> 
>>> bysort Apartment_ID : replace Bidder_ID = _n
>>> 
>>> Nick
>>> [email protected]
>>> 
>>> 
>>> On 14 April 2013 23:19, Joel Jönsson <[email protected]> wrote:
>>>> Dear all Statalist users.
>>>> 
>>>> I'm quite new to Stata and I'm facing the following challenge. I wish to get rid of duplicates within a
>>>> variable (Bidder-ID) within each Apartment-ID only, i.e. there are numerous
>>>> observations with the values 49, 50, 51, etc. in Bidder-ID, each of which is allowed only once
>>>> within the same Apartment-ID.
>>>> 
>>>> _n              Apartment-ID    Bidder-ID
>>>> 
>>>> 1.              3345                    49
>>>> 2.              3345                    49
>>>> 3.              3345                    50
>>>> 4.              3345                    51
>>>> 5.              3345                    50
>>>> 6.              5780                    49
>>>> 7.              5780                    50
>>>> 8.              5780                    49
>>>> 
>>>> I would like the result to look something like the following:
>>>> 
>>>> _n              Apartment-ID    Bidder-ID
>>>> 1.              3345                    49
>>>> 2.              3345                    50
>>>> 3.              3345                    51
>>>> 4.              5780                    49
>>>> 5.              5780                    50
>>>> 
>>>> Also, I wish to recode the values in Bidder-ID (49, 50, 51), which could also be numbers
>>>> such as 2234, 2244, 2255 (each identifies one unique bidder), so that they take on values reflecting when they first
>>>> appeared in the Apartment-ID. So, if Bidder-IDs 49, 50, 51, 2234, 2244, 2255 exist for the same
>>>> Apartment-ID, then 49=1, 50=2, 51=3, 2234=4, etc., though not necessarily in that order (2234=2, 51=1, 49=4, …).
>>>> Thus, it would look something like this:
>>>> 
>>>> _n             Apartment-ID    Bidder-ID
>>>> 1.              3345                    1
>>>> 2.              3345                    2
>>>> 3.              3345                    3
>>>> 4.              5780                    1
>>>> 5.              5780                    2
>>>> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
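Putting Nick's two suggestions together, a minimal sketch on the example data from the first message might look like this. It assumes bidders should be numbered in the order in which they first appear within each apartment (one of the orderings the question allows). Underscores replace the hyphens, since Stata variable names cannot contain them, and obsno and first are hypothetical helper variables.

```stata
* Example data from the original question
clear
input Apartment_ID Bidder_ID
3345 49
3345 49
3345 50
3345 51
3345 50
5780 49
5780 50
5780 49
end

* Hypothetical helpers: the original row order and, for each
* Apartment_ID/Bidder_ID pair, the row where that bidder first appears
generate long obsno = _n
bysort Apartment_ID Bidder_ID (obsno): generate long first = obsno[1]

* Report surplus copies, then keep one observation per pair.
* -force- is needed because the copies are duplicates on the named
* variables only, not on all variables (obsno differs).
duplicates report Apartment_ID Bidder_ID
duplicates drop Apartment_ID Bidder_ID, force

* Number bidders 1, 2, 3, ... within each apartment by first appearance
sort Apartment_ID first
by Apartment_ID: replace Bidder_ID = _n
drop obsno first

list, sepby(Apartment_ID)
```

Because first is identical for every copy of a pair, the final numbering does not depend on which copy -duplicates drop- happens to keep.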