Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: pairing unpaired data [was: Re: st: any idea?]
Date: Tue, 7 Jan 2014 18:49:08 +0000

I changed the thread title, which was not informative.

You need a method. Some predictable pitfalls are that for some bones
there is no acceptable match and that for others there could be two or
more acceptable matches.

I don't think there is a canned solution independent of your spelling
out what the method is.

Nick
[email protected]

On 7 January 2014 18:20, Y.R.E. Retamal <[email protected]> wrote:
> Thank you very much Eric and Nick for the advice.
>
> I will try to give a clearer idea of what I want to do:
> For example I have the following database of human bones. I removed missing
> values of length for a better understanding:
>
> id  type   side   length    id  type     side   length
> 1   femur  left   18        21  humerus  left   13
> 2   femur  left   65.85     22  humerus  left   56
> 3   femur  left   69.1      23  humerus  left   92
> 4   femur  left   130       24  humerus  left   126
> 5   femur  left   131.2     25  humerus  left   154
> 6   femur  left   143       26  humerus  left   170
> 7   femur  left   145       27  humerus  left   198
> 8   femur  left   160       28  humerus  left   228
> 9   femur  left   183       29  humerus  left   230
> 10  femur  left   200       30  humerus  left   232
> 11  femur  right  28        31  humerus  right  238
> 12  femur  right  80        32  humerus  right  10
> 13  femur  right  96.5      33  humerus  right  66
> 14  femur  right  126       34  humerus  right  123
> 15  femur  right  127       35  humerus  right  128
> 16  femur  right  128       36  humerus  right  143
> 17  femur  right  138       37  humerus  right  200
> 18  femur  right  146       38  humerus  right  228
> 19  femur  right  148       39  humerus  right  230
> 20  femur  right  200       40  humerus  right  241
>
> These data belong to a commingled skeletal collection and some right bones
> (femurs and humeri respectively) should match with a left bone, but I do
> not know which bones match. Following the idea that a right bone from the
> same skeleton should have (approximately) the same length as its respective
> left bone, I want to subtract each right femur from each left femur, with
> the aim of finding which right femur matches with a left femur, i.e. has
> the same or almost the same length, so the subtraction would be zero or
> near zero. The same procedure applies to the humerus (and other bones).
>
> If you have any idea to perform this, please let me know.
>
> Rodrigo
>
> Best wishes
>
> Rodrigo
>
>
> On 2014-01-05 23:54, Nick Cox wrote:
>>
>> <>
>>
>> Eric Booth gives very good advice.
>>
>> Your problem with the link to the Stata Journal file you were directed
>> to may be just that you didn't step past the standard material
>> bundled with every reprint file.
>>
>> Nick
>> [email protected]
>>
>>
>> On 5 January 2014 21:03, Eric Booth <[email protected]> wrote:
>>>
>>> <>
>>>
>>> The Stata Journal link you mention that Nick sent you works for me. The
>>> title of the article is "Stata tip 71: The problem of split identity, or how
>>> to group dyads” by Nick J. Cox, so maybe you can google that title if your
>>> browser isn’t navigating to it properly.
>>>
>>> Your example dataset doesn’t align with your desired dataset.
>>>
>>> How do we know what is x and what is j in the first 20 obs of your
>>> example data (see below) (also note the Statalist FAQ about not sending
>>> attachments) ?
>>>
>>> You need some kind of identifier that ties, for example, obs or id 1
>>> (even though it’s missing) to the other right side femur observation of
>>> interest (is it id 7 or id 9 or ??).
>>>
>>>
>>> **your example data:
>>>
>>> id  type   side   length
>>> 1   femur  right
>>> 2   femur  left
>>> 3   femur  right
>>> 4   femur  left
>>> 5   femur  right  373
>>> 6   femur  left   416
>>> 7   femur  right  138
>>> 8   femur  left
>>> 9   femur  right  270
>>> 10  femur  left
>>> 11  femur  left
>>> 12  femur  right
>>> 13  femur  left
>>> 14  femur  right
>>> 15  femur  left   281
>>> 16  femur  right
>>> 17  femur  left   160
>>> 18  femur  left
>>> 19  femur  right
>>> 20  femur  left
>>>
>>>
>>> We can’t just sort by ‘type’ and ‘side’ to get a dataset of the same
>>> structure as you presented initially, so I think you need to provide more
>>> information about this. (also, if the rule is, as you imply, to sort by
>>> type and side and then subtract every third observation from each other then
>>> what do we do with missing 'length' and missing ‘side’?)
>>>
>>> If the rule is that id 1 and id 2 are a pair then why does the
>>> left/right ordering suddenly change starting around id 17?
>>>
>>> - Eric
>>>
>>>
>>> On Jan 5, 2014, at 2:46 PM, Y.R.E. Retamal <[email protected]> wrote:
>>>
>>>> Dear Guys
>>>>
>>>> Some weeks ago, Red Owl and Nick helped me with some loops for my work.
>>>> I have tried to run some of the suggestions on my dataset, but I had some
>>>> difficulties.
>>>> I give you the basic structure of my dataset and my question:
>>>>
>>>> I want to create some new variables containing the difference between
>>>> the length of two individuals from different groups:
>>>>
>>>> id  side   length  newvar1  newvar2  newvar3
>>>> 1   right  x       x-j      x-k      x-l
>>>> 2   right  y       y-j      y-k      y-l
>>>> 3   right  z       z-j      z-k      z-l
>>>> 4   left   j       j-x      j-y      j-z
>>>> 5   left   k       k-x      k-y      k-z
>>>> 6   left   l       l-x      l-y      l-z
>>>>
>>>> Red Owl suggested that I follow this example:
>>>>
>>>>>>> *** BEGIN CODE ***
>>>>>>> * Build demo data set.
>>>>>>> clear
>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>> input id str5(side) Length
>>>>>>> 1 right 10
>>>>>>> 2 right 15
>>>>>>> 3 right 11
>>>>>>> 4 left 13
>>>>>>> 5 left 10
>>>>>>> 6 left 12
>>>>>>> end
>>>>>>> gen byte newvar1 = .
>>>>>>> forval i = 1/3 {
>>>>>>>     replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>> }
>>>>>>> forval i = 4/6 {
>>>>>>>     replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>> }
>>>>>>> gen byte newvar2 = .
>>>>>>> forval i = 1/3 {
>>>>>>>     replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>> }
>>>>>>> forval i = 4/6 {
>>>>>>>     replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>> }
>>>>>>> gen byte newvar3 = .
>>>>>>> forval i = 1/3 {
>>>>>>>     replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>> }
>>>>>>> forval i = 4/6 {
>>>>>>>     replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>> }
>>>>>>> list, noobs sep(0)
>>>>>>> *** END CODE ***
>>>>
>>>>
>>>> However, my dataset is much longer and it is difficult to apply this
>>>> approach to it. I hope you can help me by giving me more ideas.
>>>> I send you an extract of my dataset in .xlsx format
>>>> Also, the webpage suggested by Nick to review the discussion about the
>>>> topic (http://www.stata-journal.com/sjpdf.html?articlenum=dm0043) redirects
>>>> me to a nonsense file to download. Please give me the number of the journal
>>>> to read the discussion.
>>>>
>>>> Happy new year to all of you
>>>>
>>>> Rodrigo
>>>>
>>>>
>>>> On 2013-12-15 22:39, Y.R.E. Retamal wrote:
>>>>>
>>>>> Dear Red Owl and Nick
>>>>> Thank you very much for your response. The code works perfectly, just
>>>>> as I need.
>>>>> Best wishes
>>>>> Rodrigo
>>>>> On 2013-12-14 22:31, Nick Cox wrote:
>>>>>>
>>>>>> In addition to Red's helpful suggestions, note that technique for such
>>>>>> paired data was discussed in
>>>>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0043
>>>>>> which is publicly accessible. The problem is that the identifiers in
>>>>>> Rodrigo's example appear to make little sense. How is Stata expected
>>>>>> to know that 1 and 4, 2 and 5, 3 and 6 are paired? Perhaps the
>>>>>> structure of the dataset is clearer in practice. If so, basic
>>>>>> calculations are just a couple of lines or so.
>>>>>> Nick
>>>>>> [email protected]
>>>>>> On 14 December 2013 15:33, Red Owl <[email protected]> wrote:
>>>>>>>
>>>>>>> Rodrigo,
>>>>>>> The following code demonstrates an approach with basic loops.
>>>>>>> It could be made more efficient with a different loop
>>>>>>> structure, but this approach may be more informative.
>>>>>>> *** BEGIN CODE ***
>>>>>>> * Build demo data set.
>>>>>>> clear
>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>> input id str5(side) Length
>>>>>>> 1 right 10
>>>>>>> 2 right 15
>>>>>>> 3 right 11
>>>>>>> 4 left 13
>>>>>>> 5 left 10
>>>>>>> 6 left 12
>>>>>>> end
>>>>>>> gen byte newvar1 = .
>>>>>>> forval i = 1/3 {
>>>>>>>     replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>> }
>>>>>>> forval i = 4/6 {
>>>>>>>     replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>> }
>>>>>>> gen byte newvar2 = .
>>>>>>> forval i = 1/3 {
>>>>>>>     replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>> }
>>>>>>> forval i = 4/6 {
>>>>>>>     replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>> }
>>>>>>> gen byte newvar3 = .
>>>>>>> forval i = 1/3 {
>>>>>>>     replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>> }
>>>>>>> forval i = 4/6 {
>>>>>>>     replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>> }
>>>>>>> list, noobs sep(0)
>>>>>>> *** END CODE ***
>>>>>>> Good luck.
>>>>>>> Red Owl
>>>>>>> [email protected]
>>>>>>>>
>>>>>>>> "Y.R.E. Retamal" <[email protected]> Sat, 14 Dec 2013 12:08:42:
>>>>>>>> Dear list
>>>>>>>> I am having great difficulty trying to perform an analysis using
>>>>>>>> Stata and I cannot find the way. Maybe you could help me. I want
>>>>>>>> to create some new variables containing the difference between
>>>>>>>> the length of two individuals from different groups:
>>>>>>>>
>>>>>>>> id  side   length  newvar1  newvar2  newvar3
>>>>>>>> 1   right  x       x-j      x-k      x-l
>>>>>>>> 2   right  y       y-j      y-k      y-l
>>>>>>>> 3   right  z       z-j      z-k      z-l
>>>>>>>> 4   left   j       j-x      j-y      j-z
>>>>>>>> 5   left   k       k-x      k-y      k-z
>>>>>>>> 6   left   l       l-x      l-y      l-z
>>>>>>>>
>>>>>>>> I do not know if I am explaining myself clearly. The individuals
>>>>>>>> are bones (clavicles, for example), so it is possible that some
>>>>>>>> right clavicles pair-match with left clavicles, following the
>>>>>>>> idea that an individual has bones of similar length.
>>>>>>>>
>>>>>>>> Any help could bring me a light!
>>>>>>>> Best wishes
>>>>>>>> Rodrigo
>>>>>>>
>>>>>>> *
>>>>>>> * For searches and help try:
>>>>>>> * http://www.stata.com/help.cgi?search
>>>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>
>>>> <example.xlsx>
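
For readers of the archive, here is a minimal sketch of the all-pairs subtraction Rodrigo describes. It is not a method anyone in the thread endorsed. It uses a few femur rows from his 7 January table. The 5-unit tolerance, the use of joinby, and every variable name other than id, type, side and length are assumptions made only for illustration. The counts at the end are meant to expose the two pitfalls Nick mentions: bones with no acceptable match and bones with two or more.

```stata
* Sketch only: tolerance and new variable names are assumptions.
clear
input id str7 type str5 side length
1  femur left  18
4  femur left  130
7  femur left  145
10 femur left  200
14 femur right 126
18 femur right 146
19 femur right 148
20 femur right 200
end

* Save the right-side bones under their own names.
preserve
keep if side == "right"
rename (id length) (id_right length_right)
drop side
tempfile right
save `right'
restore

* Keep the left-side bones in memory under matching names.
keep if side == "left"
rename (id length) (id_left length_left)
drop side

* joinby forms every left-right combination within each bone type.
joinby type using `right'

* Difference for each pair, and a flag for pairs within the assumed tolerance.
gen diff = length_left - length_right
gen byte candidate = abs(diff) <= 5

* Candidates per bone: 0 means no acceptable match, 2 or more means ambiguity.
bysort type id_left: egen n_for_left = total(candidate)
bysort type id_right: egen n_for_right = total(candidate)

sort type id_left id_right
list type id_left length_left id_right length_right diff ///
    n_for_left n_for_right if candidate, noobs sepby(id_left)
```

Run as a single do-file (the tempfile is a local macro). With these rows, left femur 7 has two candidates (right femurs 18 and 19) and left femur 1 has none, so some explicit rule for resolving such cases is still needed, as Nick says.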

Follow-Ups: Re: pairing unpaired data [was: Re: st: any idea?], from "Y.R.E. Retamal" <[email protected]>
