
Re: st: merging datasets and getting different N in resulting dataset if I run several times

From	"Michael I. Lichter" <[email protected]>
To	[email protected]
Subject	Re: st: merging datasets and getting different N in resulting dataset if I run several times
Date	Fri, 14 Aug 2009 12:09:58 -0400

Woolton,

It would be helpful if you could specify what you want the resulting dataset to look like. Do you want a record for every matching pair of records, as you would get with -joinby-, or do you want something else? It would also be helpful if you could show your exact command syntax.


Michael
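
To make the -joinby- behaviour mentioned above concrete, here is a minimal sketch on made-up data. The variable names (id, x, y), the values, and the tempfile name are illustrative assumptions and are not taken from Woolton's files.

```stata
* Toy "using" dataset: id 1 appears twice, id 2 once
clear
input id x
1 10
1 11
2 20
end
tempfile toy_using
save `toy_using'

* Toy "master" dataset: id 1 appears three times, id 2 once
clear
input id y
1 100
1 101
1 102
2 200
end

* joinby forms every pairwise combination within each id group
joinby id using `toy_using'

* 3 x 2 pairs for id 1 plus 1 x 1 pair for id 2 gives 7 observations
count
```

On this toy input the count is the same on every run. If N differs between runs of a real program, that suggests the datasets being combined are not identical from run to run, and the steps before the join would be worth checking.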

Nick Cox wrote:

If, as you say, the id variables do not uniquely identify observations in either dataset, it does not seem at all surprising that you get results like this. The assertion that there is a bug, on no firm evidence whatsoever, is unconvincing in these circumstances.

Either way, better advice might be forthcoming if you showed (a portion of) data for which you get results that puzzle you.

Nick
[email protected]
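
One way to examine the uniqueness question raised here is sketched below. The file name master_data is a hypothetical placeholder; COCODE and year are the key variables named later in the thread. The same checks would be repeated on the using dataset.

```stata
* Hypothetical file name; substitute the actual master dataset
use master_data, clear

* Tabulate how many observations share each COCODE-year combination
duplicates report COCODE year

* isid exits with an error if the keys do not uniquely identify observations;
* capture noisily shows the message without stopping the do-file
capture noisily isid COCODE year
display "isid return code: " _rc

* Show one example observation from each group of duplicates
duplicates examples COCODE year
```

A nonzero return code from -isid- confirms that the keys are not unique in that file, which is the situation Nick describes.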

Woolton Lee

I am using 6 id variables to merge, all of which were character and I
have now converted them to numeric.  Even after I switch to stable
sorting and numeric id variables, the problem persists, though it is
smaller in magnitude: previously I was getting differences of 5-9 obs
in the resulting dataset between different runs; now it's more like 1.
The id variables do not uniquely identify observations in the master
or using dataset.  Having said that, it seems to me that this should
not occur at all even if there is duplication, and I am at a loss to
understand why it's occurring.  A colleague of mine suggested there is
a bug in the merge command; I am using Stata 9.

On Fri, Aug 14, 2009 at 10:44 AM, Austin Nichols<[email protected]> wrote:

Probably due to unstable sorting; without further info, hard to diagnose.
Do you have missing values in any of the merge vars?
This is a potentially very serious problem; see e.g.
#4 in http://www.princeton.edu/~jrothst/hoxby/rejoinder.pdf
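
The two checks suggested above (missing values in the merge variables, and unstable sorting) might look like the following sketch. The file name using_data and the helper variable obsno are hypothetical; COCODE and year are the keys named in the original question.

```stata
* Hypothetical file name; substitute the actual using dataset
use using_data, clear

* How many observations have a missing value in either merge key?
count if missing(COCODE, year)

* Record the incoming order, then sort with an explicit tie-breaker so that
* observations with equal keys always end up in the same relative order
generate long obsno = _n
sort COCODE year obsno
```

Adding obsno as the last sort key has the same effect as -sort COCODE year, stable-. Either approach is reproducible only if the order of the data before the sort is itself the same on every run.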

On Fri, Aug 14, 2009 at 10:28 AM, Woolton Lee<[email protected]> wrote:

Hi, I am having a problem where I am merging two datasets together and
the N in the resulting dataset can change if I rerun the program 2 or
more times.  I am merging by company code (COCODE) and year which do
not uniquely identify observations in the using dataset, but it seems
to me that that should not matter.  I get the same result if I use the
joinby command - the resulting N in the dataset changes if I rerun the
program.  I am trying to understand why this might happen and am
stumped at the moment.  Does anyone have any suggestions?


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


--
Michael I. Lichter, Ph.D. <[email protected]>
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 126 / Phone: 716-898-4751 / FAX: 716-898-3536


