
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: find the corresponding values between two variables


From   Sara Neto Machado <[email protected]>
To   [email protected]
Subject   Re: st: find the corresponding values between two variables
Date   Thu, 17 Apr 2014 01:27:41 +0100

Dear Nick and Joe,

For Nick:
Initially I did something similar to what you proposed:
gen diff4 = w_est - w_trab if c_est != c_trab
to see how large the differences were for the cases that didn't
match. However, the result was a lot bigger than it was supposed to
be, and that is how I noticed a new problem: the mismatch that Joe
correctly understood from my earlier example. I know that I have 28
cases that do not match, but they are scattered all over the dataset
and I need to identify them.

Taking into account Joe's solution:
I had read about the -reshape- syntax before, but I thought it wouldn't
apply to my case because it doesn't let me isolate the mismatched
cases; instead I would have to eliminate them, which I do not want to
do. However, I tested all of your code and an error occurred (see the
result below, please) because of exactly the 28 mismatches that I
mentioned earlier. So I still don't know how to isolate them, since
the 28 missing values appear at the end, which is not correct because
they occur all over the dataset.

Isn't there some other syntax that does some kind of sorting with
-if- conditions so that I can identify these mismatches? I know that
we have -sort- and -by-, but they do no good for my problem either.

Stata result:
. reshape wide c_ w_, i(id) j(j) string
(note: j = emp est trab)
j not unique within id;
there are multiple observations at the same j within id.
Type "reshape error" for a listing of the problem observations.
. reshape error
(note: j = emp est trab)

i (id) indicates the top-level grouping such as subject id.
j (j) indicates the subgrouping such as time.
The data are in the long form;  j should be unique within i.

There are multiple observations on the same j within id.

The following 28 of 1049448 observations have repeated j values:

         +-----------+
         | id      j |
         |-----------|
1049421. |  .   trab |
1049422. |  .   trab |
1049423. |  .   trab |
1049424. |  .   trab |
1049425. |  .   trab |
         |-----------|
1049426. |  .   trab |
1049427. |  .   trab |
1049428. |  .   trab |
1049429. |  .   trab |
1049430. |  .   trab |
         |-----------|
1049431. |  .   trab |
1049432. |  .   trab |
1049433. |  .   trab |
1049434. |  .   trab |
1049435. |  .   trab |
         |-----------|
1049436. |  .   trab |
1049437. |  .   trab |
1049438. |  .   trab |
1049439. |  .   trab |
1049440. |  .   trab |
         |-----------|
1049441. |  .   trab |
1049442. |  .   trab |
1049443. |  .   trab |
1049444. |  .   trab |
1049445. |  .   trab |
         |-----------|
1049446. |  .   trab |
1049447. |  .   trab |
1049448. |  .   trab |
         +-----------+

(data now sorted by id j)
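
One possible reading of the listing above: all 28 flagged observations have id missing and j equal to trab. That is what would happen if c_trab simply has 28 fewer nonmissing values than the other code columns, so that -replace id=c_- copies a missing code into id. If so, these rows are padding rather than the mismatches themselves, and dropping them before the second -reshape- may let Joe's approach run. A minimal sketch under that assumption, on toy data modeled on the sample from the first post (the rows with codes 27 and 29 are hypothetical, added only to reproduce missing c_trab values, and the variable name unmatched is made up for illustration):

```stata
* Sketch only: toy data; the last two rows are hypothetical padding rows
clear
input c_est w_est c_trab w_trab
10 10 10 11
11  3 11  3
13  4 17  5
17  5 18  7
18 10 23  3
23  5 25  6
27  2  .  .
29  4  .  .
end

gen double id = _n
reshape long w_ c_, i(id) j(j) string

* Rows without a code cannot be matched to anything, so drop them
drop if missing(c_)

* Stops with an error if a code is repeated within est or within trab
isid c_ j

* Regroup by code, as in Joe's suggestion
replace id = c_
reshape wide c_ w_, i(id) j(j) string

* A code present on only one side has a missing partner
gen byte unmatched = missing(c_est) | missing(c_trab)
list id w_est w_trab if unmatched

* Differences only where both sides carry the same code
gen diff = w_est - w_trab if !unmatched
```

If -isid- fails on the real data, the codes are not unique within a source and this sketch does not apply as written, which is the caveat Joe gives in his message.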

thanks in advance!
Kind regards,
Sara
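
An alternative that avoids -reshape- is to treat the (c_est, w_est) and (c_trab, w_trab) pairs as two separate datasets and match them on the code with -merge-. The sketch below is only an illustration on the six sample rows from the first post. It assumes each code occurs at most once within c_est and at most once within c_trab; the variable name code and the tempfile name trab are made up for the example.

```stata
* Sketch only: sample rows from the first post
clear
input c_est w_est c_trab w_trab
10 10 10 11
11  3 11  3
13  4 17  5
17  5 18  7
18 10 23  3
23  5 25  6
end

* Save the trab pair as its own dataset, keyed on a common code variable
preserve
keep c_trab w_trab
drop if missing(c_trab)
rename c_trab code
tempfile trab
save `trab'
restore

* Keep the est pair and match it to the trab pair on the code
keep c_est w_est
drop if missing(c_est)
rename c_est code
merge 1:1 code using `trab'

* _merge: 1 = code only in est, 2 = code only in trab, 3 = code in both
sort code
list code w_est w_trab _merge if _merge != 3

* Differences only for codes found on both sides
gen diff = w_est - w_trab if _merge == 3
```

On the sample rows this flags codes 13 (only in est) and 25 (only in trab); each keeps its own line, with the partner's w_ value missing. -merge 1:1- exits with an error if a code is duplicated on either side, so it also serves as a check on the uniqueness assumption.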

2014-04-16 20:00 GMT+01:00 Joe Canner <[email protected]>:
> The problem as I understand it (which may not be correct) is that the entry with c_est==13 is not just a problem for that observation but its presence makes all of the rest of the observations mismatched as well.
>
> What might help here is to -reshape- the dataset so that all (c_est,c_trab) pairs can be identified and all unmatched cases eliminated.  Or, if you'd rather, just reshape again based on the c_ variables and make the mismatches have a missing partner. Something like:
>
> gen id=_n
> reshape long w_ c_, i(id) j(j) string
> replace id=c_
> reshape wide c_ w_, i(id) j(j) string
>
> This assumes that your c_ variables are unique.  If not, you will have to modify this or do something else entirely.
>
> Regards,
> Joe Canner
> Johns Hopkins University School of Medicine
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Wednesday, April 16, 2014 2:38 PM
> To: [email protected]
> Subject: Re: st: find the corresponding values between two variables
>
> Not clear what you want (or what you tried: you show no code) but does
>
> gen diff = w_est - w_trab if c_est == c_trab
>
> or
>
> gen OK = c_est == c_trab
>
> edit if OK
>
> help?
> Nick
> [email protected]
>
>
> On 16 April 2014 19:31, Sara Neto Machado <[email protected]> wrote:
>> Dear all,
>>
>> c_est   w_est   c_trab   w_trab
>>    10      10       10       11
>>    11       3       11        3
>>    13       4       17        5
>>    17       5       18        7
>>    18      10       23        3
>>    23       5       25        6
>>
>> My aim is to compute the differences between w_est and w_trab for the
>> same values of c_est and c_trab. However, throughout the dataset there
>> are also values that do not coincide in c_trab and c_est (e.g., 13 in
>> the sample), which "ruins" the alignment between those columns. I want
>> the 13 to appear with missing values on the same line for c_trab. I
>> have been searching for syntax that suits my purpose and nothing
>> relevant has come up. Maybe there are other alternatives that I am not
>> seeing now.
>>
>> Can anyone help me? Much appreciated!
>>
>> regards,
>> Sara
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2018 StataCorp LLC