Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: drop duplicates iff
Date: Tue, 16 Apr 2013 15:08:56 +0100

It sounds as if you want

    bysort ObjektID (day_of_week) : gen day_of_week2 = day_of_week if _n == 1

Nick
njcoxstata@gmail.com

Terminology is tricky: -duplicates- is a command, not a function. -replace- with missing is not deletion. An observation in Stata is an entire row, case, or record, not an individual value of a variable.

On 16 April 2013 14:57, Joel Jönsson <joensson.joel@gmail.com> wrote:
> I have the following problem. I'm trying to delete observations (replace by [.]) for values that were filled in automatically when I merged two data sets with different numbers of ID observations. These values are duplicates. However, I do not wish to use "duplicates drop", since this drops observations containing information in other variables. I cannot (as far as I know) control for other variables by adding them to the "duplicate drop" function, since the information in the variable containing most observations is unique for each observation and must not be dropped.
>
> I tried the following, only to realize that only the first and third duplicates were replaced, leaving the second and fourth duplicates intact.
>
>     replace day_of_week2 = cond(day_of_week2[_n]==day_of_week2[_n-1], ., day_of_week2)
>
> This yields
>
> ObjektID   day_of_week   day_of_week2
> 3063       5             5
> 3066       3             3
> 3066       3             .
> 3066       3             3
> 3066       3             .
> 3066       3             3
> 3069       2             2
>
> In this case, I would like to have all the 3 removed. Any suggestion how to improve the code?
>
> Best,
>
> Joel
>
>
> On Apr 15, 2013, at 11:07 AM, Nick Cox wrote:
>
>> You don't.
>>
>> From what you say, you want
>>
>> duplicates drop apartment_id bidder_id
>>
>> If that would result in loss of information, -duplicates- will tell
>> you. -duplicates- is dedicated to being careful about loss of
>> information.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 15 April 2013 09:27, Joel Jönsson <joensson.joel@gmail.com> wrote:
>>> Thanks for your quick response, Nick. I have been looking at the documentation (help duplicates).
>>> My problem is to isolate the removal of duplicates to one Apartment-ID at a time. Which command [if] [in] [bysort] [group] do I use?
>>>
>>> On Apr 15, 2013, at 1:48 AM, Nick Cox wrote:
>>>
>>>> Did you try looking at the documentation? There is a -duplicates-
>>>> command. Once you have used it to remove duplicates, the second
>>>> question is
>>>>
>>>> bysort Apartment_ID : replace Bidder_ID = _n
>>>>
>>>> Nick
>>>> njcoxstata@gmail.com
>>>>
>>>>
>>>> On 14 April 2013 23:19, Joel Jönsson <joensson.joel@gmail.com> wrote:
>>>>> Dear all Statalist users.
>>>>>
>>>>> I'm quite new to Stata and I'm facing the following challenge. I wish to get rid of duplicates within a
>>>>> variable (Bidder-ID) for a specific observation number (Apartment-ID) only, i.e. there are numerous
>>>>> observations with the values 49, 50, 51 etc. within Bidder-ID which are allowed only once
>>>>> within the same Apartment-ID.
>>>>>
>>>>> _n   Apartment-ID   Bidder-ID
>>>>>
>>>>> 1.   3345           49
>>>>> 2.   3345           49
>>>>> 3.   3345           50
>>>>> 4.   3345           51
>>>>> 5.   3345           50
>>>>> 6.   5780           49
>>>>> 7.   5780           50
>>>>> 8.   5780           49
>>>>>
>>>>> I would like the result to look something like the following:
>>>>>
>>>>> _n   Apartment-ID   Bidder-ID
>>>>> 1.   3345           49
>>>>> 2.   3345           50
>>>>> 3.   3345           51
>>>>> 4.   5780           49
>>>>> 5.   5780           50
>>>>>
>>>>> Also, I wish to rename the observations in Bidder-ID (49, 50, 51), which could also take on numbers
>>>>> such as 2234, 2244, 2255 (they symbolize one unique bidder), to take on values equal to when they first
>>>>> appeared in Apartment-ID. So, if Bidder-ID 49, 50, 51, 2234, 2244, 2255 exist for the same
>>>>> Apartment-ID, then 49=1, 50=2, 51=3, 2234=4 etc., not necessarily in that order (2234=2, 51=1, 49=4 …).
>>>>> Thus, it would look something like this:
>>>>>
>>>>> _n   Apartment-ID   Bidder-ID
>>>>> 1.   3345           1
>>>>> 2.   3345           2
>>>>> 3.   3345           3
>>>>> 4.   5780           1
>>>>> 5.   5780           2
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
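
For anyone who wants to try the two suggestions in this thread, here is a minimal sketch that types in the example data from Joel's posts and applies each one. Some choices here are assumptions, not something stated in the thread: the underscored variable names (Apartment_ID, Bidder_ID) stand in for the hyphenated names in the post, -duplicates drop- is called without a varlist because the toy data hold only the two variables, and Bidder_ID is added to the sort order so the renumbering is reproducible. The alternating pattern in the -replace- attempt is consistent with -replace- working through the observations in order: once one value has been set to missing, the next value no longer equals its predecessor and so is left alone.

```stata
* --- Part 1: drop duplicate bidders, then renumber within apartment ---
clear
input Apartment_ID Bidder_ID
3345 49
3345 49
3345 50
3345 51
3345 50
5780 49
5780 50
5780 49
end

* With no varlist, -duplicates drop- removes observations that are
* identical on every variable, so no other information can be lost.
duplicates drop

* Renumber bidders 1, 2, 3, ... within each apartment.
* Sorting on Bidder_ID within apartment is an assumption made here
* so that the numbering is reproducible.
bysort Apartment_ID (Bidder_ID) : replace Bidder_ID = _n

list, sepby(Apartment_ID)

* --- Part 2: keep the merged-in value only once per ObjektID ---
clear
input ObjektID day_of_week
3063 5
3066 3
3066 3
3066 3
3066 3
3066 3
3069 2
end

* The new variable gets the value only in the first observation of
* each ObjektID; all other observations are left missing.
bysort ObjektID (day_of_week) : gen day_of_week2 = day_of_week if _n == 1

list, sepby(ObjektID)
```

Note that -gen- creates a new variable, so if day_of_week2 already exists from an earlier attempt it would have to be dropped first or given a different name.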

**References**:
- **st: drop duplicates iff** *From:* Joel Jönsson <joensson.joel@gmail.com>
- **Re: st: drop duplicates iff** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: drop duplicates iff** *From:* Joel Jönsson <joensson.joel@gmail.com>
- **Re: st: drop duplicates iff** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: drop duplicates iff** *From:* Joel Jönsson <joensson.joel@gmail.com>
