Statalist



st: Re: unstable results with repeating the nearmrg command


From   John Hund <[email protected]>
To   [email protected]
Subject   st: Re: unstable results with repeating the nearmrg command
Date   Tue, 11 Aug 2009 07:11:55 -0500

Thanks Martin...

It does turn out to be the stable option. Right after the first append in the .ado file, the appended data can contain (and in this case does contain) duplicate values of the sort variables whenever there is an exact match, so the order that the sort command produces among those tied observations is ambiguous. Changing the lines:

append using `work'
sort `fullvars'

to

append using `work'
sort `fullvars', stable

fixes the problem. I'd encourage anyone who uses this to make the change!
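
For anyone who wants to see the effect in isolation, here is a minimal sketch. The toy data and variable values are hypothetical (they are not taken from nearmrg.ado or from the datasets below); the only assumption is that two observations tie on the sort key, as happens after the append when there is an exact match.

```stata
* Hypothetical toy data: ids 101 and 102 tie on the sort key (gender, age)
clear
input id gender age
101 1 12
102 1 12
103 1 25
104 2 18
end

* Report how many observations share a (gender, age) key;
* any surplus copies mean the sort order is not uniquely determined
duplicates report gender age

* Without stable, the relative order of 101 and 102 is not guaranteed
* and may differ from one run to the next
sort gender age

* With stable, tied observations keep the relative order they had before
* the sort, so putting the data in a known order first makes it repeatable
sort id
sort gender age, stable
list, sepby(gender)
```

Any later step that depends on the position of an observation within a tie (for example, referring to the neighbouring observation with _n-1 or _n+1) can only be reproducible if the sort is made stable in this way.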

Thanks again,
John
=================================
John Hund
Visiting Assistant Professor
Jones Graduate School of Business
Rice University
Houston, TX 77005



On Aug 10, 2009, at 4:02 PM, John Hund wrote:


I am having a very perplexing problem with the nearmrg command: it seems to give different results on subsequent runs with the same data. In addition, my co-author and I get different results on the same datasets, similarly sorted. An example of the problem is below, using a very small (5 observation) dataset. The two datasets are ageinfo1 and ageinfo2:

ageinfo1
     +----------------------------+
     | id   gender   age   income |
     |----------------------------|
  1. |  4        1    12       56 |
  2. |  3        1    25       21 |
  3. |  1        1    34       23 |
  4. |  5        2    18       75 |
  5. |  2        2    40       43 |
     +----------------------------+
Note that ageinfo1 is sorted by gender and age, and doesn't contain any duplicate values.

ageinfo2
     +-----------------------------+
     |  id   gender   income   age |
     |-----------------------------|
  1. | 415        1       12    12 |
  2. | 314        1       32    25 |
  3. | 516        2       65    18 |
  4. | 213        2       32    40 |
  5. |  12        2       12    34 |
     +-----------------------------+
ageinfo2 does not need to be sorted, but I sort it below to facilitate replication. Issuing the following commands in order gives:

. use ageinfo2

. sort gender age

. nearmrg gender using ageinfo1, nearvar(age) lower genmatch(newage)

. list

     +-----------------------------------------------+
     |  id   gender   income   age   _merge   newage |
     |-----------------------------------------------|
  1. | 415        1       12    12        3       12 |
  2. | 314        1       32    25        3       25 |
  3. | 516        2       65    18        3       18 |
  4. |  12        2       12    34        3       18 |
  5. | 213        2       32    40        3       40 |
     +-----------------------------------------------+

. clear

. use ageinfo2

. sort gender age

. nearmrg gender using ageinfo1, nearvar(age) lower genmatch(newage)

. list

     +-----------------------------------------------+
     |  id   gender   income   age   _merge   newage |
     |-----------------------------------------------|
  1. | 415        1       12    12        3       12 |
  2. | 314        1       32    25        3       12 |
  3. | 516        2       65    18        3       18 |
  4. |  12        2       12    34        3       18 |
  5. | 213        2       32    40        3       40 |
     +-----------------------------------------------+

. clear

. use ageinfo2

. sort gender age

. nearmrg gender using ageinfo1, nearvar(age) lower genmatch(newage)

. list

     +-----------------------------------------------+
     |  id   gender   income   age   _merge   newage |
     |-----------------------------------------------|
  1. | 314        1       32    25        3       25 |
  2. | 415        1       12    12        1        . |
  3. | 516        2       65    18        3       18 |
  4. |  12        2       12    34        3       18 |
  5. | 213        2       32    40        3       40 |
     +-----------------------------------------------+

The first outcome is correct, but subsequent runs give different (and incorrect) answers. My only guess at this point is that there is something going on with a temporary file which is not being cleared, but I don't know how that could happen. Has anyone else noticed a problem with this?

Thanks in advance,
John
=================================
John Hund
Visiting Assistant Professor
Jones Graduate School of Business
Rice University
Houston, TX 77005
