
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: pairing unpaired data [was: Re: st: any idea?]


From   Fernando Rios Avila <[email protected]>
To   [email protected]
Subject   Re: pairing unpaired data [was: Re: st: any idea?]
Date   Tue, 7 Jan 2014 14:37:56 -0500

Rodrigo,
One direction you could follow is a nearest-match method.
Since the data can be split into two datasets (left and right), you
could do that and then "merge" them using the user-written program
-nearmrg-.
That will give you a starting point for matching up your data, but you
may need further revisions to ensure that there are no duplicate
matches.
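
As a rough illustration of the idea, here is a minimal sketch that does
not use -nearmrg- itself (its syntax is not reproduced here). It swaps in
the official command -joinby- to form every left-right pair within bone
type and then flags the closest right bone for each left bone. The six
femurs are taken from the example data further down the thread; the
variable names id_left, id_right, length_left, length_right, diff, nearest
and n_claims are made up for this sketch.

```stata
* Sketch only: all left-right pairs within bone type, then nearest match.
clear
input id str7 type str5 side length
 4 femur left  130
 5 femur left  131.2
10 femur left  200
14 femur right 126
16 femur right 128
20 femur right 200
end

* Save the right-side bones under new names (hypothetical names).
preserve
keep if side == "right"
rename (id length) (id_right length_right)
drop side
tempfile right
save `right'
restore

* Keep the left-side bones and pair each with every right bone of its type.
keep if side == "left"
rename (id length) (id_left length_left)
drop side
joinby type using `right'

* Absolute difference in length; values near zero are candidate matches.
gen diff = abs(length_left - length_right)

* Flag the closest right bone for each left bone (ties are all flagged).
bysort id_left (diff): gen byte nearest = diff == diff[1]

* Count how many left bones claim each right bone as their nearest.
bysort id_right: egen n_claims = total(nearest)

sort id_left diff
list id_left length_left id_right length_right diff n_claims if nearest, ///
    noobs sepby(id_left)
```

With these six bones, right femur 16 (length 128) is the nearest match for
both left femur 4 and left femur 5, so n_claims is 2 for it. Those are the
duplicate cases that would still need to be resolved by hand or with
further variables such as width and height.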
Best

On Tue, Jan 7, 2014 at 2:27 PM, Nick Cox <[email protected]> wrote:
> Thanks for the details of your problem. I can't see that you have a
> method that is translatable into Stata code: your procedure is too
> vaguely specified. That need not stop other people suggesting methods.
> Nick
> [email protected]
>
>
> On 7 January 2014 19:20, Y.R.E. Retamal <[email protected]> wrote:
>> Dear Nick
>>
>> Thanks a lot for your quick response. The method is no more than what I
>> showed. I still have to add other variables, such as width and height, for
>> the same bone. So, if all three variables match, both bones are probably
>> from the same skeleton. I expect that many bones will not match each
>> other, so I could rule them out as being from the same skeleton. Problems
>> would arise if, e.g., a right bone matches more than one left bone. But at
>> least I could simplify the work and then focus on the problematic cases.
>>
>> Rodrigo
>>
>>
>>
>>
>>
>>
>>
>> On 2014-01-07 18:49, Nick Cox wrote:
>>>
>>> I changed the thread title, which was not informative.
>>>
>>> You need a method. Some predictable pitfalls are that for some bones
>>> there is no acceptable match and that for others there could be two or
>>> more acceptable matches. I don't think there is a canned solution
>>> independent of your spelling out what the method is.
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 7 January 2014 18:20, Y.R.E. Retamal <[email protected]> wrote:
>>>>
>>>> Thank you very much, Eric and Nick, for the advice.
>>>>
>>>> I will try to give a clearer idea of what I want to do.
>>>> For example, I have the following dataset of human bones. I removed
>>>> missing values of length to make it easier to follow:
>>>>
>>>> id      type    side    length          id      type    side    length
>>>> 1       femur   left    18              21      humerus left    13
>>>> 2       femur   left    65.85           22      humerus left    56
>>>> 3       femur   left    69.1            23      humerus left    92
>>>> 4       femur   left    130             24      humerus left    126
>>>> 5       femur   left    131.2           25      humerus left    154
>>>> 6       femur   left    143             26      humerus left    170
>>>> 7       femur   left    145             27      humerus left    198
>>>> 8       femur   left    160             28      humerus left    228
>>>> 9       femur   left    183             29      humerus left    230
>>>> 10      femur   left    200             30      humerus left    232
>>>> 11      femur   right   28              31      humerus right   238
>>>> 12      femur   right   80              32      humerus right   10
>>>> 13      femur   right   96.5            33      humerus right   66
>>>> 14      femur   right   126             34      humerus right   123
>>>> 15      femur   right   127             35      humerus right   128
>>>> 16      femur   right   128             36      humerus right   143
>>>> 17      femur   right   138             37      humerus right   200
>>>> 18      femur   right   146             38      humerus right   228
>>>> 19      femur   right   148             39      humerus right   230
>>>> 20      femur   right   200             40      humerus right   241
>>>>
>>>> These data belong to a commingled skeletal collection, and some right
>>>> bones (femurs and humeri, respectively) should match a left bone, but I
>>>> do not know which bones match. Following the idea that a right bone
>>>> should have approximately the same length as the left bone from the
>>>> same skeleton, I want to compute the difference in length between each
>>>> right femur and each left femur, with the aim of finding which right
>>>> femur matches which left femur, i.e. has the same or almost the same
>>>> length, so that the difference is zero or near zero.
>>>> The same procedure would apply to the humeri (and other bones).
>>>>
>>>> If you have any ideas on how to do this, please let me know.
>>>>
>>>> Best wishes
>>>>
>>>> Rodrigo
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 2014-01-05 23:54, Nick Cox wrote:
>>>>>
>>>>>
>>>>>
>>>>> Eric Booth gives very good advice.
>>>>>
>>>>> Your problem with the link to the Stata Journal file I directed you
>>>>> to may be just that you didn't step past the standard material
>>>>> bundled with every reprint file.
>>>>>
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>>
>>>>> On 5 January 2014 21:03, Eric Booth <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> The Stata Journal link you mention that Nick sent you works for me.
>>>>>> The title of the article is “Stata tip 71: The problem of split
>>>>>> identity, or how to group dyads” by Nick J. Cox, so maybe you can
>>>>>> google that title if your browser isn’t navigating to it properly.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Your example dataset doesn’t align with your desired dataset.
>>>>>>
>>>>>> How do we know what is x and what is j in the first 20 obs of your
>>>>>> example data (see below) (also note the Statalist FAQ about not sending
>>>>>> attachments) ?
>>>>>>
>>>>>> You need some kind of identifier that ties, for example, obs or id 1
>>>>>> (even though it’s missing) to the other right side femur observation of
>>>>>> interest (is it id 7 or id 9 or ??).
>>>>>>
>>>>>>
>>>>>> **your example data:
>>>>>>
>>>>>> id      type    side    length
>>>>>> 1       femur   right
>>>>>> 2       femur   left
>>>>>> 3       femur   right
>>>>>> 4       femur   left
>>>>>> 5       femur   right   373
>>>>>> 6       femur   left    416
>>>>>> 7       femur   right   138
>>>>>> 8       femur   left
>>>>>> 9       femur   right   270
>>>>>> 10      femur   left
>>>>>> 11      femur   left
>>>>>> 12      femur   right
>>>>>> 13      femur   left
>>>>>> 14      femur   right
>>>>>> 15      femur   left    281
>>>>>> 16      femur   right
>>>>>> 17      femur   left    160
>>>>>> 18      femur   left
>>>>>> 19      femur   right
>>>>>> 20      femur   left
>>>>>>
>>>>>>
>>>>>> We can’t just sort by ‘type’ and ‘side’ to get a dataset of the same
>>>>>> structure as you presented initially, so I think you need to provide
>>>>>> more information about this.  (Also, if the rule is, as you imply, to
>>>>>> sort by type and side and then subtract every third observation from
>>>>>> each other, then what do we do with missing ‘length’ and missing
>>>>>> ‘side’?)
>>>>>>
>>>>>> If the rule is that id 1 and id 2 are a pair, then why does the
>>>>>> left/right ordering suddenly change starting around id 17?
>>>>>>
>>>>>> - Eric
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jan 5, 2014, at 2:46 PM, Y.R.E. Retamal <[email protected]> wrote:
>>>>>>
>>>>>>> Dear Guys
>>>>>>>
>>>>>>> Some weeks ago, Red Owl and Nick helped me with some loops for my
>>>>>>> work. I have tried to run their suggestions on my dataset, but I had
>>>>>>> some difficulties.
>>>>>>> Here is the basic structure of my dataset and my question:
>>>>>>>
>>>>>>> I want to create some new variables containing the difference between
>>>>>>> the length of two individuals from different groups:
>>>>>>>
>>>>>>> id     side     length      newvar1       newvar2      newvar3
>>>>>>> 1      right      x           x-j           x-k          x-l
>>>>>>> 2      right      y           y-j           y-k          y-l
>>>>>>> 3      right      z           z-j           z-k          z-l
>>>>>>> 4      left       j           j-x           j-y          j-z
>>>>>>> 5      left       k           k-x           k-y          k-z
>>>>>>> 6      left       l           l-x           l-y          l-z
>>>>>>>
>>>>>>> Red Owl suggested that I follow this example:
>>>>>>>
>>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>>> * Build demo data set.
>>>>>>>>>> clear
>>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>>> input id str5(side) Length
>>>>>>>>>> 1 right 10
>>>>>>>>>> 2 right 15
>>>>>>>>>> 3 right 11
>>>>>>>>>> 4 left  13
>>>>>>>>>> 5 left  10
>>>>>>>>>> 6 left  12
>>>>>>>>>> end
>>>>>>>>>> gen byte newvar1 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar2 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar3 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>>>  }
>>>>>>>>>> list, noobs sep(0)
>>>>>>>>>> *** END CODE ***
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> However, my dataset is much longer, and this approach is difficult
>>>>>>> to apply to it. I hope you can help by giving me more ideas.
>>>>>>> I am sending you an extract of my dataset in .xlsx format.
>>>>>>> Also, the webpage suggested by Nick for reviewing the discussion of
>>>>>>> the topic (http://www.stata-journal.com/sjpdf.html?articlenum=dm0043)
>>>>>>> redirects me to a nonsense file to download. Please give me the
>>>>>>> issue number of the journal so that I can read the discussion.
>>>>>>>
>>>>>>> Happy new year to all of you
>>>>>>>
>>>>>>> Rodrigo
>>>>>>>
>>>>>>>
>>>>>>> On 2013-12-15 22:39, Y.R.E. Retamal wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Dear Red Owl and Nick
>>>>>>>> Thank you very much for your response. The code works perfectly, just
>>>>>>>> as I need.
>>>>>>>> Best wishes
>>>>>>>> Rodrigo
>>>>>>>> On 2013-12-14 22:31, Nick Cox wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In addition to Red's helpful suggestions, note that technique for
>>>>>>>>> such paired data was discussed in
>>>>>>>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0043
>>>>>>>>> which is publicly accessible. The problem is that the identifiers in
>>>>>>>>> Rodrigo's example appear to make little sense. How is Stata expected
>>>>>>>>> to know that 1 and 4, 2 and 5, 3 and 6 are paired? Perhaps the
>>>>>>>>> structure of the dataset is clearer in practice. If so, basic
>>>>>>>>> calculations are just a couple of lines or so.
>>>>>>>>> Nick
>>>>>>>>> [email protected]
>>>>>>>>> On 14 December 2013 15:33, Red Owl <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Rodrigo,
>>>>>>>>>> The following code demonstrates an approach with basic loops.
>>>>>>>>>> It could be made more efficient with a different loop
>>>>>>>>>> structure, but this approach may be more informative.
>>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>>> * Build demo data set.
>>>>>>>>>> clear
>>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>>> input id str5(side) Length
>>>>>>>>>> 1 right 10
>>>>>>>>>> 2 right 15
>>>>>>>>>> 3 right 11
>>>>>>>>>> 4 left  13
>>>>>>>>>> 5 left  10
>>>>>>>>>> 6 left  12
>>>>>>>>>> end
>>>>>>>>>> gen byte newvar1 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar2 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar3 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>>>  }
>>>>>>>>>> list, noobs sep(0)
>>>>>>>>>> *** END CODE ***
>>>>>>>>>> Good luck.
>>>>>>>>>> Red Owl
>>>>>>>>>> [email protected]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> "Y.R.E. Retamal" <[email protected]> Sat, 14 Dec 2013 12:08:42:
>>>>>>>>>>> Dear list
>>>>>>>>>>> I am having great difficulty performing an analysis in Stata and
>>>>>>>>>>> I cannot find a way. Maybe you could help me. I want to create
>>>>>>>>>>> some new variables containing the difference between the length
>>>>>>>>>>> of two individuals from different groups:
>>>>>>>>>>>
>>>>>>>>>>> id     side     length      newvar1       newvar2      newvar3
>>>>>>>>>>> 1      right      x           x-j           x-k          x-l
>>>>>>>>>>> 2      right      y           y-j           y-k          y-l
>>>>>>>>>>> 3      right      z           z-j           z-k          z-l
>>>>>>>>>>> 4      left       j           j-x           j-y          j-z
>>>>>>>>>>> 5      left       k           k-x           k-y          k-z
>>>>>>>>>>> 6      left       l           l-x           l-y          l-z
>>>>>>>>>>>
>>>>>>>>>>> I do not know if I am explaining myself clearly. The individuals
>>>>>>>>>>> are bones (clavicles, for example), so it is possible that some
>>>>>>>>>>> right clavicles pair with left clavicles, following the idea
>>>>>>>>>>> that an individual has bones of similar length.
>>>>>>>>>>>
>>>>>>>>>>> Any help would shed some light!
>>>>>>>>>>> Best wishes
>>>>>>>>>>> Rodrigo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *
>>>>>>>>>> *   For searches and help try:
>>>>>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> <example.xlsx>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>



© Copyright 1996–2018 StataCorp LLC