Statalist



Re: st: different result each time


From   "Eva Poen" <eva.poen@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: different result each time
Date   Thu, 25 Sep 2008 14:40:39 +0100

Ah. Now things become clear.

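You don't want to do a many-to-many merge; as both Kit and Neil have
pointed out, you'll get bizarre results. Also, it doesn't do what you
want anyway: within each group, a many-to-many -merge- pairs observations
one-to-one in their current sort order instead of forming every
combination.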
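You can tell in advance whether a -merge- would be many-to-many by checking whether the key uniquely identifies observations in each dataset. Here is a minimal sketch using -isid-, which exits with an error when the key is not unique, and -duplicates report-. The toy data mirror the announcement file in Rajesh's message and are illustrative only.

```stata
* Sketch: check whether a key uniquely identifies observations
* (toy data, illustrative only)
clear
input co anndate
1 7
1 8
2 11
2 13
end

* co alone is not unique here (co 1 appears twice), so -isid- fails;
* -capture- keeps the do-file running and stores the return code in _rc
capture isid co
display "isid co: return code " _rc

* co and anndate together are unique, so this runs without error
isid co anndate

* tabulate how many copies of each co value there are
duplicates report co
```

If the merge key fails this check in both files, -merge- will not give you all combinations, whatever order the data are sorted in.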

-joinby- should do the trick. It forms all pairwise combinations of
observations within groups (here, within each co). See the example
below. Afterwards, you can drop all cases where dealdate does not meet
your criterion.

Hope this helps,
Eva

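******************
tempfile d1 d2

* announcement dates: one observation per co-anndate pair
clear
input co anndate
1 7
1 8
2 11
2 13
end

sort co anndate
list
save `d1'

* deal dates: one observation per co-dealdate pair
clear
input co dealdate
1 1
1 2
1 3
1 10
1 15
2 2
2 4
2 10
2 12
2 17
end

sort co dealdate
list
save `d2'

* form all pairwise combinations of deals and announcements within co
joinby co using `d1'
sort co anndate dealdate
list co anndate dealdate
**************************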
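The example stops before the filtering step. Here is a minimal, self-contained sketch of that step. It assumes a hypothetical window of x = 3 days on either side of anndate, and it rebuilds the same toy data so it runs on its own. The tempfile name ann is likewise just for illustration.

```stata
* Sketch: keep deals within x days of each announcement.
* ASSUMPTION: x = 3 is a hypothetical window; substitute your own value.
local x 3

tempfile ann
clear
input co anndate
1 7
1 8
2 11
2 13
end
save `ann'

clear
input co dealdate
1 1
1 2
1 3
1 10
1 15
2 2
2 4
2 10
2 12
2 17
end

* all pairwise combinations of deals and announcements within co
joinby co using `ann'

* keep deals falling within x days before or after the announcement;
* with Stata daily (%td) dates the difference is measured in days
keep if abs(dealdate - anndate) <= `x'

sort co anndate dealdate
list co anndate dealdate
```

The same deal can be kept for more than one announcement: in this toy data, deal date 10 for co 1 lies within 3 days of both anndate 7 and anndate 8. Any announcement with no deal inside the window drops out entirely.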


2008/9/25 Rajesh Tharyan <R.Tharyan@exeter.ac.uk>:
> Hi all,
>
> Thanks for the pointers. The merge is a nightmare.
>
>
> I have one dataset like this:
>
>
> Co1 anndate1
> Co1 anndate2
> Co2 anndate1
> Co3 anndate1
>
> In this dataset each co-anndate combination is unique.
>
> And the other like this:
>
> Co1 dealdate1
> Co1 dealdate2
> Co1 dealdate3
> Co1 dealdate4
> Co1 dealdate5
> Co1 dealdate6
> Co2 dealdate2
> Co2 dealdate3
> Co2 dealdate4
>
> In this dataset each co-dealdate combination is also unique, i.e. I don't
> have, for example, two observations with Co1 and dealdate1.
>
> The idea of the merge is that for each co-anndate combination I need to get
> all the dealdates within x days before and after the anndate. So I do a
> many-to-many merge and drop all deal dates that fall outside this range. I
> am not sure how to do this apart from a many-to-many merge. Is there another
> way?
>
>
> Rajesh Tharyan
> Tel: +44 (0)1392 262544
> Fax: +44 (0)1392 262475
> E-mail: r.tharyan@exeter.ac.uk
>
> Times Higher Education University of the Year 2007/08
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Eva Poen
> Sent: 25 September 2008 12:42
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: different result each time
>
> Wouldn't you need the latter of the cases you mentioned (the merge
> variables don't uniquely identify observations in either dataset) for
> this to happen? I think you'd get the same result every time if the
> observations are unique in at least one dataset. I made up a little
> example: the second merge produces different results in the two runs,
> while the first merge produces identical results.
>
> From what -help merge- says I gather that, in the case where
> -uniqusing- applies, any duplicates in the master dataset are filled
> in with the using information (which is unique), so the sort order of
> the master shouldn't matter. (The same applies for -uniqmaster- with
> the terms reversed.) Please correct me if I got this wrong.
>
> Eva
>
> ****************
> tempfile auto11 auto12 auto13 auto21 auto22 auto23 res11 res12 res21 res22
>
> forvalues i = 1/2 {
>    ** sample data 1
>    sysuse auto, clear
>    qui keep in 1/6
>    gen idt = _n
>    qui expand 2
>    sort idt rep78
>    gen id1 = _n
>
>    drop weight
>    qui save `auto`i'1'
>
>    ** sample data 2
>    sysuse auto, clear
>    qui keep in 1/5
>    gen idt = _n
>    keep rep78 weight idt
>    sort idt
>    qui save `auto`i'2'
>
>    ** merge 1 and 2:
>    use `auto`i'1'
>    * idt uniquely identifies obs in using
>    merge idt using `auto`i'2'
>    tab _merge
>    * save result
>    sort idt id1 make
>    qui save `res`i'1'
>
>    ** sample data 3
>    sysuse auto, clear
>    qui keep in 1/5
>    keep rep78 weight
>
>    qui expand 3
>    sort rep78
>    gen id3 = _n
>    qui save `auto`i'3'
>
>    ** merge 1 and 3:
>    use `auto`i'1'
>    sort rep78
>    * rep78 does not uniquely identify obs in either dataset
>    merge rep78 using `auto`i'3'
>    tab _merge
>    * save result
>    sort id1 id3 make
>    qui save `res`i'2'
>
> }
>
> ** compare both runs of first merge:
> use `res11', clear
> cf _all using `res21'
> * they are the same.
>
> ** compare both runs of second merge:
> * they differ.
> use `res12', clear
> cf _all using `res22'
>
> ****************
>
>
> 2008/9/25 Neil Shephard <nshephard@nhs.net>:
>> Your different results may be down to the -merge- you are performing if one
>> (or indeed both) of the datasets that you are merging does not have uniquely
>> identifiable observations based on the variable you are merging on, so check
>> the output after the merge very carefully.
>>
>>
>> Neil
>>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



© Copyright 1996–2014 StataCorp LP