
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: Droping rows in the other data set


From   Eilya Torshizian <e.torshizian@auckland.ac.nz>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Droping rows in the other data set
Date   Mon, 10 Feb 2014 19:30:25 +0000

Hi Nick,

Thanks for your help. I will stick to the -merge- command. You are right about $LIST and the scalar; I forgot to add them to the posted code.

Kind regards,
Eilya.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Tuesday, 11 February 2014 1:14 a.m.
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Droping rows in the other data set

I guess the -egen- route could be made to work for very small datasets. For large datasets, you are asking to fit lots of identifiers into an option argument. That's likely to hit some limit or another. -merge- really is the way to go.

Incidentally,

1. if you have a global macro LIST, references to it should be $LIST.
2. nothing in your code qualifies as a scalar in Stata terms.
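
To illustrate both points, here is a minimal sketch, assuming the file and variable names from the original post (First.dta, Second.dta, ID). The names levels, KEEP and n are only illustrative. It is practical only while the list of IDs stays short enough to fit in the values() option.

```stata
* Sketch only: file and variable names are taken from the original post
use "First.dta", clear
* store the distinct values of ID in a local macro named levels
levelsof ID, local(levels)
* copy them to a global macro, as in the original post
global LIST `levels'

use "Second.dta", clear
* a global macro is referenced with a leading dollar sign
egen KEEP = anymatch(ID), values($LIST)
drop if KEEP == 0

* by contrast, a scalar holds a single number or string
scalar n = 3
display scalar(n)
```

The local macro could be used directly in values() instead of the global, provided both steps run in the same do-file or interactive session.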

Nick
njcoxstata@gmail.com

On 9 February 2014 22:55, Eilya Torshizian <e.torshizian@auckland.ac.nz> wrote:

> Thanks for your prompt reply. I was thinking of using merge; however, my code didn't work well. Yours is perfect.
>
> For future reference, there is a typo in the code; the -merge- line should read
>
> ...
> merge m:1 id using id1
> ...
>
> BTW, I am curious whether the other strategy (i.e. using the 'egen' command) can be implemented. The main issue is changing the scalar's format.

Rich Goldstein

> if I understand you correctly the following appears to be a much 
> easier strategy
>
> use data1
> keep id
> duplicates drop
> sort id
> save id1
> use data2
> sort id
> merge m:1 is using id1
> drop if _merge==1
> save data2a
>
> of course you need to substitute your own data set names, etc.
>
> also, you were not clear about what you want to do with any id that is 
> in data1 but not in data2 (these will have _merge==2)
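
A possible shortcut, offered as a sketch rather than as part of the reply above: with the newer -merge- syntax (which the m:1 form already implies), the keep() option can do the dropping in one step. The file names are the same placeholders as above. Note that keep(match) also discards ids found only in data1 (_merge==2), which the code above leaves in place.

```stata
* Sketch only: data2, id1 and data2a are the placeholder names used above
use data2, clear
* keep only observations whose id appears in both files; do not create _merge
merge m:1 id using id1, keep(match) nogenerate
save data2a, replace
```

If the ids found only in data1 should be kept as extra rows, keep(match using) would retain them.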

Eilya Torshizian

>>I have two datasets. I would like to delete the rows in the Second 
>>data set that are not included in the First data set. The "ID" 
>>variable is included in both datasets with repeating values. However, 
>>some rows of the "ID" are dropped in the First data set. Let's assume 
>>that the "ID" variable in the First dataset is as follows,
>>First:
>> ID
>>11
>>11
>>13
>>15
>>While the "ID" variable does not include 12 and 14 values in the First data set, in the Second dataset 12 and 14 are included:
>>Second:
>>ID
>>11
>>12
>>12
>>13
>>14
>>14
>>15
>>
>>I need to delete the incompatible rows from the Second data set.
>>
>>I was thinking of using the 'egen - anymatch' command. To do so, I 
>>need the list of values from the First data set, which is derived from 
>>the following command,
>>
>>- use "First.dta", clear
>>- levelsof ID
>>- global LIST `r(levels)'
>>- clear
>>
>>Then I use the LIST scalar in the second data set:
>>
>>- use "Second.dta"
>>- egen KEEP = anymatch(ID), values(LIST)
>>- drop if KEEP == 0
>>
>>However, as LIST is a scalar, I am not able to do so. I would appreciate your comments.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


