Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
pairing unpaired data [was: Re: st: any idea?]

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	pairing unpaired data [was: Re: st: any idea?]
Date	Tue, 7 Jan 2014 18:49:08 +0000
I changed the thread title, which was not informative.

You need a method. Some predictable pitfalls are that for some bones
there is no acceptable match and that for others there could be two or
more acceptable matches. I don't think there is a canned solution
independent of your spelling out what the method is.
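
As a minimal sketch of what spelling out a method might involve, and not
a recommendation: suppose the rule were "a left and a right bone of the
same type are candidate matches if their lengths differ by at most some
tolerance". The tolerance of 3, the variable names id_left, id_right,
length_left, length_right, diff and n_candidates, and the choice of
-joinby- to form all pairs are assumptions for illustration only; the
lengths are a few of the femur values from the listing below.

```stata
* Sketch only. The tolerance of 3 is an assumption made for illustration.
clear
input id str7 type str5 side length
1  femur left  18
4  femur left  130
7  femur left  145
10 femur left  200
16 femur right 128
18 femur right 146
19 femur right 148
20 femur right 200
end

* Save the right bones separately, renamed so they can sit beside the left.
preserve
keep if side == "right"
drop side
rename (id length) (id_right length_right)
tempfile right
save `right'
restore

* Keep the left bones and form every left-right pair within bone type.
keep if side == "left"
drop side
rename (id length) (id_left length_left)
joinby type using `right'

* Signed difference; keep only pairs within the assumed tolerance.
gen diff = length_left - length_right
local tol 3
keep if abs(diff) <= `tol'

* Count candidates per left bone: more than 1 flags an ambiguous pairing.
bysort id_left (id_right): gen n_candidates = _N
list id_left length_left id_right length_right diff n_candidates, sepby(id_left) noobs
```

Both pitfalls show up even here: a left bone with no right bone within
the tolerance drops out of the listing entirely, and a left bone with
two or more candidates appears more than once. What to do in either
case is exactly what the method has to specify.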

Nick
[email protected]


On 7 January 2014 18:20, Y.R.E. Retamal <[email protected]> wrote:
> Thank you very much, Eric and Nick, for the advice.
>
> I will try to give a clearer idea of what I want to do.
> For example, I have the following dataset of human bones. I removed missing
> values of length for clarity:
>
> id      type    side    length          id      type    side    length
> 1       femur   left    18              21      humerus left    13
> 2       femur   left    65.85           22      humerus left    56
> 3       femur   left    69.1            23      humerus left    92
> 4       femur   left    130             24      humerus left    126
> 5       femur   left    131.2           25      humerus left    154
> 6       femur   left    143             26      humerus left    170
> 7       femur   left    145             27      humerus left    198
> 8       femur   left    160             28      humerus left    228
> 9       femur   left    183             29      humerus left    230
> 10      femur   left    200             30      humerus left    232
> 11      femur   right   28              31      humerus right   238
> 12      femur   right   80              32      humerus right   10
> 13      femur   right   96.5            33      humerus right   66
> 14      femur   right   126             34      humerus right   123
> 15      femur   right   127             35      humerus right   128
> 16      femur   right   128             36      humerus right   143
> 17      femur   right   138             37      humerus right   200
> 18      femur   right   146             38      humerus right   228
> 19      femur   right   148             39      humerus right   230
> 20      femur   right   200             40      humerus right   241
>
> These data belong to a commingled skeletal collection, and some right bones
> (femurs and humeri respectively) should match a left bone, but I do
> not know which bones match. On the assumption that a right bone should have
> approximately the same length as the left bone from the same skeleton,
> I want to subtract the length of each right femur from that of each left
> femur, to find which right femur matches which left femur, i.e. has the same
> or almost the same length, so that the difference is zero or near zero.
> The same procedure applies to the humeri (and other bones).
>
> If you have any idea how to do this, please let me know.
>
> Best wishes
>
> Rodrigo
>
>
>
>
>
> On 2014-01-05 23:54, Nick Cox wrote:
>>
>> <>
>>
>> Eric Booth gives very good advice.
>>
>> Your problem with the link to the Stata Journal file you were directed
>> to may be just that you didn't step past the standard material
>> bundled with every reprint file.
>>
>> Nick
>> [email protected]
>>
>>
>> On 5 January 2014 21:03, Eric Booth <[email protected]> wrote:
>>>
>>> <>
>>>
>>> The Stata Journal link you mention that Nick sent you works for me.  The
>>> title of the article is “Stata tip 71: The problem of split identity, or how
>>> to group dyads” by Nick J. Cox, so maybe you can google that title if your
>>> browser isn’t navigating to it properly.
>>>
>>>
>>>
>>> Your example dataset doesn’t align with your desired dataset.
>>>
>>> How do we know what is x and what is j in the first 20 obs of your
>>> example data (see below) (also note the Statalist FAQ about not sending
>>> attachments) ?
>>>
>>> You need some kind of identifier that ties, for example, obs or id 1
>>> (even though it’s missing) to the other right side femur observation of
>>> interest (is it id 7 or id 9 or ??).
>>>
>>>
>>> **your example data:
>>>
>>> id      type    side    length
>>> 1       femur   right
>>> 2       femur   left
>>> 3       femur   right
>>> 4       femur   left
>>> 5       femur   right   373
>>> 6       femur   left    416
>>> 7       femur   right   138
>>> 8       femur   left
>>> 9       femur   right   270
>>> 10      femur   left
>>> 11      femur   left
>>> 12      femur   right
>>> 13      femur   left
>>> 14      femur   right
>>> 15      femur   left    281
>>> 16      femur   right
>>> 17      femur   left    160
>>> 18      femur   left
>>> 19      femur   right
>>> 20      femur   left
>>>
>>>
>>> We can’t just sort by ‘type’ and ‘side’ to get a dataset of the same
>>> structure as you presented initially, so I think you need to provide more
>>> information about this.  (also, if the rule is, as you imply, to sort by
>>> type and side and then subtract every third observation from each other then
>>> what do we do with missing 'length' and missing ‘side’?)
>>>
>>> If the rule is that id 1 and id 2 are a pair then why does the
>>> left/right ordering suddenly change starting around id 17?
>>>
>>> - Eric
>>>
>>>
>>>
>>>
>>> On Jan 5, 2014, at 2:46 PM, Y.R.E. Retamal <[email protected]> wrote:
>>>
>>>> Dear Guys
>>>>
>>>> Some weeks ago, Red Owl and Nick helped me with some loops for my work.
>>>> I have tried to run some of the suggestions on my dataset, but I had some
>>>> difficulties.
>>>> Here is the basic structure of my dataset, followed by my question:
>>>>
>>>> I want to create some new variables containing the difference between
>>>> the length of two individuals from different groups:
>>>>
>>>> id     side     length      newvar1       newvar2      newvar3
>>>> 1      right      x           x-j           x-k          x-l
>>>> 2      right      y           y-j           y-k          y-l
>>>> 3      right      z           z-j           z-k          z-l
>>>> 4      left       j           j-x           j-y          j-z
>>>> 5      left       k           k-x           k-y          k-z
>>>> 6      left       l           l-x           l-y          l-z
>>>>
>>>> Red Owl suggested following this example:
>>>>
>>>>>>> *** BEGIN CODE ***
>>>>>>> * Build demo data set.
>>>>>>> clear
>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>> input id str5(side) Length
>>>>>>> 1 right 10
>>>>>>> 2 right 15
>>>>>>> 3 right 11
>>>>>>> 4 left  13
>>>>>>> 5 left  10
>>>>>>> 6 left  12
>>>>>>> end
>>>>>>> gen byte newvar1 = .
>>>>>>> forval i = 1/3 {
>>>>>>>  replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>  }
>>>>>>> forval i = 4/6 {
>>>>>>>  replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>  }
>>>>>>> gen byte newvar2 = .
>>>>>>> forval i = 1/3 {
>>>>>>>  replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>  }
>>>>>>> forval i = 4/6 {
>>>>>>>  replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>  }
>>>>>>> gen byte newvar3 = .
>>>>>>> forval i = 1/3 {
>>>>>>>  replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>  }
>>>>>>> forval i = 4/6 {
>>>>>>>  replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>  }
>>>>>>> list, noobs sep(0)
>>>>>>> *** END CODE ***
>>>>
>>>>
>>>> However, my dataset is much longer, so this approach is difficult to apply.
>>>> I hope you can help by giving me more ideas.
>>>> I am sending you an extract of my dataset in .xlsx format.
>>>> Also, the webpage suggested by Nick to review the discussion about the
>>>> topic (http://www.stata-journal.com/sjpdf.html?articlenum=dm0043) redirects
>>>> me to a nonsense file to download. Please give me the issue number of the
>>>> journal so that I can read the discussion.
>>>>
>>>> Happy new year to all of you
>>>>
>>>> Rodrigo
>>>>
>>>>
>>>> On 2013-12-15 22:39, Y.R.E. Retamal wrote:
>>>>>
>>>>> Dear Red Owl and Nick
>>>>> Thank you very much for your response. The code works perfectly, just
>>>>> as I need.
>>>>> Best wishes
>>>>> Rodrigo
>>>>> On 2013-12-14 22:31, Nick Cox wrote:
>>>>>>
>>>>>> In addition to Red's helpful suggestions, note that technique for such
>>>>>> paired data was discussed in
>>>>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0043
>>>>>> which is publicly accessible. The problem is that the identifiers in
>>>>>> Rodrigo's example appear to make little sense. How is Stata expected
>>>>>> to know that 1 and 4, 2 and 5, 3 and 6 are paired? Perhaps the
>>>>>> structure of the dataset is clearer in practice. If so, basic
>>>>>> calculations are just a couple of lines or so.
>>>>>> Nick
>>>>>> [email protected]
>>>>>> On 14 December 2013 15:33, Red Owl <[email protected]> wrote:
>>>>>>>
>>>>>>> Rodrigo,
>>>>>>> The following code demonstrates an approach with basic loops.
>>>>>>> It could be made more efficient with a different loop
>>>>>>> structure, but this approach may be more informative.
>>>>>>> *** BEGIN CODE ***
>>>>>>> * Build demo data set.
>>>>>>> clear
>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>> input id str5(side) Length
>>>>>>> 1 right 10
>>>>>>> 2 right 15
>>>>>>> 3 right 11
>>>>>>> 4 left  13
>>>>>>> 5 left  10
>>>>>>> 6 left  12
>>>>>>> end
>>>>>>> gen byte newvar1 = .
>>>>>>> forval i = 1/3 {
>>>>>>>  replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>  }
>>>>>>> forval i = 4/6 {
>>>>>>>  replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>  }
>>>>>>> gen byte newvar2 = .
>>>>>>> forval i = 1/3 {
>>>>>>>  replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>  }
>>>>>>> forval i = 4/6 {
>>>>>>>  replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>  }
>>>>>>> gen byte newvar3 = .
>>>>>>> forval i = 1/3 {
>>>>>>>  replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>  }
>>>>>>> forval i = 4/6 {
>>>>>>>  replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>  }
>>>>>>> list, noobs sep(0)
>>>>>>> *** END CODE ***
>>>>>>> Good luck.
>>>>>>> Red Owl
>>>>>>> [email protected]
>>>>>>>>
>>>>>>>> "Y.R.E. Retamal" <[email protected]> Sat, 14 Dec 2013 12:08:42:
>>>>>>>> Dear list
>>>>>>>> I am having great difficulty performing an analysis in Stata and I
>>>>>>>> cannot find the way. Maybe you could help me. I want to create some
>>>>>>>> new variables containing the difference between the length of two
>>>>>>>> individuals from different groups:
>>>>>>>>
>>>>>>>> id     side     length      newvar1       newvar2      newvar3
>>>>>>>> 1      right      x           x-j           x-k          x-l
>>>>>>>> 2      right      y           y-j           y-k          y-l
>>>>>>>> 3      right      z           z-j           z-k          z-l
>>>>>>>> 4      left       j           j-x           j-y          j-z
>>>>>>>> 5      left       k           k-x           k-y          k-z
>>>>>>>> 6      left       l           l-x           l-y          l-z
>>>>>>>> I do not know if I am explaining myself clearly. The individuals are
>>>>>>>> bones (clavicles, for example), so it is possible that some right
>>>>>>>> clavicles pair-match with left clavicles, following the idea that an
>>>>>>>> individual has bones of similar length.
>>>>>>>>
>>>>>>>> Any help could bring me a light!
>>>>>>>> Best wishes
>>>>>>>> Rodrigo
>>>>>>>
>>>>>>> *
>>>>>>> *   For searches and help try:
>>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>>>
>>>>>
>>>>
>>>> <example.xlsx>
>>>
>>>
>>>
>>
>>
>
>
