Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: pairing unpaired data [was: Re: st: any idea?]

From	"Y.R.E. Retamal" <[email protected]>
To	[email protected]
Subject	Re: pairing unpaired data [was: Re: st: any idea?]
Date	Sat, 11 Jan 2014 12:52:16 +0000

Dear all

Many thanks to Nick(s), Sarah and Fernando for your responses, and sorry for the delay in replying; I was away last week. I have checked your suggestions, and Sarah's advice seems to be the most useful so far. I cannot find the teffects command in Stata 11, so I could not check it.

The method of osteometric sorting that I want to perform is described by John Byrd in Chapter 10, "Models and Methods for Osteometric Sorting", of the book "Recovery, Analysis, and Identification of Commingled Human Remains", Bradley J. Adams and John E. Byrd, eds., Humana Press, 2008. It can be found online relatively easily.

In this chapter, Byrd explains the methods of osteometric sorting:

"The basic principle underlying osteometric sorting is that two bonesthat are of sizes more disparate than observed in most humans are likelyto be commingled".Models for comparison of right and left paired bones were developed thatemphasize shape, taking the general form D = Σ(ai − bi), where a is theright side bone measurement i, and b is the left side bone measurementi for each of the measurements included in the comparison. The nullhypothesis of no difference is tested by comparing the value of Dagainst “0” (no difference) and using the reference data standarddeviation of D. The deviation from “0” divided by the reference datastandard deviation is evaluated against the t-distribution with twotails to obtain a p-value. A low p-value provides a measure of thestrength of evidence against the null, which can also be taken asevidence for how atypical the case specimens are assuming they originatein the same individual.

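The calculation quoted above could be sketched in Stata roughly as follows. This is only an illustration: the measurement names (r_len, l_len and so on), the bone measurements, the reference standard deviation (sd_ref) and the reference sample size (n_ref) are all made-up placeholders, not values from Byrd's chapter, and taking the degrees of freedom as n_ref - 1 is an assumption.

```stata
* One hypothetical right/left pair with three measurements each (mm).
clear
input r_len l_len r_wid l_wid r_hgt l_hgt
452 449 81 80 47 47
end

* D = sum over measurements of (right - left)
gen D = (r_len - l_len) + (r_wid - l_wid) + (r_hgt - l_hgt)

* Placeholder reference values: NOT taken from Byrd's chapter.
scalar sd_ref = 3.2
scalar n_ref  = 100

* Deviation of D from 0 in units of the reference SD,
* then a two-tailed p-value from the t distribution.
* Assumption: degrees of freedom = n_ref - 1.
gen t = D / scalar(sd_ref)
gen p = 2 * ttail(scalar(n_ref) - 1, abs(t))

list D t p, clean
```

With real data the same two gen lines could be run on a dataset holding one row per candidate right/left pair, so that every pair gets its own D, t and p.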

Best wishes

Rodrigo


On 2014-01-08 16:39, Nick Winter wrote:

Could nearest-neighbor matching from the land of treatment effects
estimation be repurposed here?

Using the data input below, something like this:

encode side, gen(nside)
gen junkoutcome = uniform()

teffects nnmatch (junkoutcome length) (nside), generate(match)

list id type side length match*, clean


       id    type    side   length   match1
  1.    1   femur    left       18       11
  2.    2   femur    left    65.85       12
  3.    3   femur    left     69.1       12
  4.    4   femur    left      130       16
  5.    5   femur    left    131.2       16
  6.    6   femur    left      143       18
  7.    7   femur    left      145       18
  8.    8   femur    left      160       19
  9.    9   femur    left      183       20
 10.   10   femur    left      200       20
 11.   11   femur   right       28        1
 12.   12   femur   right       80        3
 13.   13   femur   right     96.5        3
 14.   14   femur   right      126        4
 15.   15   femur   right      127        4
 16.   16   femur   right      128        4
 17.   17   femur   right      138        6
 18.   18   femur   right      146        7
 19.   19   femur   right      148        7
 20.   20   femur   right      200       10


Nick Winter


On 1/7/2014 3:36 PM, Sarah Edgington wrote:

Rodrigo,

This is a complicated problem because it requires doing a calculation for
each possible pair of left/right bones. Depending on how many bones you
have, this could turn out to be quite cumbersome.

The near matching method Fernando suggests could work, but the fact that
you'll ultimately need to match on more than one dimension seems like it
might create problems.

How many bones of each type do you actually have? If it's a relatively
small number (for example a few hundred each of left and right for each type
of bone) you may be able to just use a brute force method by creating a
dataset with each possible combination of left and right bones. You'd want
to do this separately by bone type.

For example, you might create a dataset of left femur measurements and a
dataset of right femur measurements. You could then use joinby to create
all the possible combinations between the two.

This might look something like the code below (note that I've only input the
femur data here, but this code assumes you have other types as well). Keep
in mind that this creates a dataset that has N_right x N_left observations. For
large datasets this likely won't be possible.


clear
input id str10 type str5 side length
  1 femur left 18
  2 femur left 65.85
  3 femur left 69.1
  4 femur left 130
  5 femur left 131.2
  6 femur left 143
  7 femur left 145
  8 femur left 160
  9 femur left 183
  10 femur left 200
  11 femur right 28
  12 femur right 80
  13 femur right 96.5
  14 femur right 126
  15 femur right 127
  16 femur right 128
  17 femur right 138
  18 femur right 146
  19 femur right 148
  20 femur right 200
  end


  keep if type=="femur"
  preserve
  keep if side=="left"
  rename length left_length
  rename id left_id
  drop side
  tempfile leftfemur
  save `leftfemur'

  restore
  keep if side=="right"
  rename length right_length
  rename id right_id
  drop side

  joinby type using `leftfemur'

  **you now have every possible pair of measurements
  gen lengthdiff=abs(right_length-left_length)

At this point you'll need very exact rules about what constitutes a match.
Once you've done that, that is still not the end of the task. From there
you'll have to see how often you have bones that match multiple other bones.
Again, to do this you'll need to specify the exact rules about what is
"close enough" to consider it a possible match. Then you'll need to come up
with rules for disambiguation.

This is not an elegant solution and if you have a lot of data it may not
work. However, if you have few enough cases for this to work it has the
advantage of making it pretty easy to specify matching rules for multiple
measurements.
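
As a rough illustration of what such rules might look like, here is a minimal sketch (to be run as a do-file) on a small made-up set of femurs. The 3-unit tolerance and the variable names candidate, n_for_right, n_for_left and unique_match are assumptions chosen for the example, not rules anyone in this thread has specified.

```stata
* Small made-up example: 3 left and 3 right femurs.
clear
input id str5 side length
1 left  130
2 left  143
3 left  200
4 right 128
5 right 146
6 right 127
end

* Build every left/right pair.
preserve
keep if side == "left"
rename id left_id
rename length left_length
drop side
tempfile left
save `left'
restore
keep if side == "right"
rename id right_id
rename length right_length
drop side
cross using `left'

gen lengthdiff = abs(right_length - left_length)

* Hypothetical rule: a pair is a candidate if lengths differ by <= 3.
local tol = 3
gen byte candidate = lengthdiff <= `tol'

* How many candidate partners does each bone have?
bysort right_id: egen n_for_right = total(candidate)
bysort left_id:  egen n_for_left  = total(candidate)

* Unambiguous pairs: each bone is the other's only candidate.
gen byte unique_match = candidate & n_for_right == 1 & n_for_left == 1

list right_id left_id lengthdiff candidate unique_match if candidate, clean
```

Pairs flagged as candidate but not as unique_match are the ambiguous cases that would still need a disambiguation rule or a look by hand. With several measurements, the candidate rule could require every difference to be within its own tolerance.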

-Sarah

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Fernando Rios Avila
Sent: Tuesday, January 07, 2014 11:38 AM
To: [email protected]
Subject: Re: pairing unpaired data [was: Re: st: any idea?]

Rodrigo,

Perhaps a direction you could follow is to use a near matching method.
Since you can separate the information into two datasets (namely left and
right), you can do so, and then "merge" them using the user-written program
-nearmrg-.

That will give you a starting point to match up your data, but you might need
to make further revisions to ensure that there are no duplicate matches.

Best

On Tue, Jan 7, 2014 at 2:27 PM, Nick Cox <[email protected]> wrote:

Thanks for the details of your problem. I can't see that you have a
method that is translatable into Stata code: your procedure is too
vaguely specified. That need not stop other people suggesting methods.

Nick
[email protected]


On 7 January 2014 19:20, Y.R.E. Retamal <[email protected]> wrote:

Dear Nick

Thanks a lot for your prompt response. The method is no more than what
I showed. I have to add other variables like width and height for the
same bone. So, if three variables match, both bones would probably be
from the same skeleton.

I would expect that many bones would not match each other, so I
could rule them out as being from the same skeleton. Problems would
appear if e.g. a right bone matches more than one left bone. But
at least I could simplify the work, and afterwards I could focus on
problematic cases.


Rodrigo







On 2014-01-07 18:49, Nick Cox wrote:


I changed the thread title, which was not informative.

You need a method. Some predictable pitfalls are that for some bones
there is no acceptable match and that for others there could be two or
more acceptable matches. I don't think there is a canned solution
independent of your spelling out what the method is.

Nick
[email protected]


On 7 January 2014 18:20, Y.R.E. Retamal <[email protected]> wrote:


Thank you very much Eric and Nick for the advice.

I will try to give a clearer idea of what I want to do:

For example, I have the following database of human bones. I removed
missing values of length for a better understanding:

id      type    side    length          id      type    side    length

1       femur   left    18              21      humerus left    13
2       femur   left    65.85           22      humerus left    56
3       femur   left    69.1            23      humerus left    92
4       femur   left    130             24      humerus left    126
5       femur   left    131.2           25      humerus left    154
6       femur   left    143             26      humerus left    170
7       femur   left    145             27      humerus left    198
8       femur   left    160             28      humerus left    228
9       femur   left    183             29      humerus left    230
10      femur   left    200             30      humerus left    232
11      femur   right   28              31      humerus right   238
12      femur   right   80              32      humerus right   10
13      femur   right   96.5            33      humerus right   66
14      femur   right   126             34      humerus right   123
15      femur   right   127             35      humerus right   128
16      femur   right   128             36      humerus right   143
17      femur   right   138             37      humerus right   200
18      femur   right   146             38      humerus right   228
19      femur   right   148             39      humerus right   230
20      femur   right   200             40      humerus right   241


These data belong to a commingled skeletal collection and some
right bones (femora and humeri) should match a
left bone, but I do not know which bones match. Following the idea
that a right bone should have approximately the same length as the
left bone from the same skeleton, I want to take the difference
between each right femur and each left femur, with the aim of finding
which right femur matches which left femur, i.e. has the same or almost
the same length, so that the difference would be zero or near zero.
The same procedure applies to the humerus (and other bones).

If you have any idea to perform this, please let me know.

Rodrigo



Best wishes

Rodrigo





On 2014-01-05 23:54, Nick Cox wrote:



<>

Eric Booth gives very good advice.

Your problem with the link to the Stata Journal file that I
directed you to may be just that you didn't step past the standard
material bundled with every reprint file.

Nick
[email protected]

On 5 January 2014 21:03, Eric Booth <[email protected]> wrote:

<>

The Stata Journal link you mention that Nick sent you works for me.
The title of the article is "Stata tip 71: The problem of split
identity, or how to group dyads" by Nick J. Cox, so maybe you can
google that title if your browser isn't navigating to it
properly.



Your example dataset doesn't align with your desired dataset.

How do we know what is x and what is j in the first 20 obs of
your example data (see below) (also note the Statalist FAQ about
not sending
attachments) ?

You need some kind of identifier that ties, for example, obs or
id 1 (even though it's missing) to the other right side femur
observation of interest (is it id 7 or id 9 or ??).


**your example data:

id      type    side    length
1       femur   right
2       femur   left
3       femur   right
4       femur   left
5       femur   right   373
6       femur   left    416
7       femur   right   138
8       femur   left
9       femur   right   270
10      femur   left
11      femur   left
12      femur   right
13      femur   left
14      femur   right
15      femur   left    281
16      femur   right
17      femur   left    160
18      femur   left
19      femur   right
20      femur   left


We can't just sort by 'type' and 'side' to get a dataset of the
same structure as you presented initially, so I think you need to
provide more information about this.  (Also, if the rule is, as
you imply, to sort by type and side and then subtract every third
observation from each other, then what do we do with missing
'length' and missing 'side'?)

If the rule is that id 1 and id 2 are a pair then why does the
left/right ordering suddenly change starting around id 17?

- Eric

On Jan 5, 2014, at 2:46 PM, Y.R.E. Retamal <[email protected]> wrote:

Dear Guys

Some weeks ago, Red Owl and Nick helped me with some loops for
my work.
I have tried to run their suggestions on my dataset, but I had
some difficulties.
Here is the basic structure of my dataset and my question:

I want to create some new variables containing the difference
between the length of two individuals from different groups:

id     side     length      newvar1       newvar2      newvar3
1      right      x           x-j           x-k          x-l
2      right      y           y-j           y-k          y-l
3      right      z           z-j           z-k          z-l
4      left       j           j-x           j-y          j-z
5      left       k           k-x           k-y          k-z
6      left       l           l-x           l-y          l-z

Red Owl suggested following this example:

*** BEGIN CODE ***
* Build demo data set.
clear
* Length is capitalized to distinguish from length().
input id str5(side) Length
1 right 10
2 right 15
3 right 11
4 left  13
5 left  10
6 left  12
end
gen byte newvar1 = .
forval i = 1/3 {
  replace newvar1 = Length[`i'] - Length[4] in `i'
  }
forval i = 4/6 {
  replace newvar1 = Length[`i'] - Length[1] in `i'
  }
gen byte newvar2 = .
forval i = 1/3 {
  replace newvar2 = Length[`i'] - Length[5] in `i'
  }
forval i = 4/6 {
  replace newvar2 = Length[`i'] - Length[2] in `i'
  }
gen byte newvar3 = .
forval i = 1/3 {
  replace newvar3 = Length[`i'] - Length[6] in `i'
  }
forval i = 4/6 {
  replace newvar3 = Length[`i'] - Length[3] in `i'
  }
list, noobs sep(0)
*** END CODE ***




However, my dataset is much longer, so this is difficult to
do by hand.
I hope you can help by giving me more ideas.
I am sending you an extract of my dataset in .xlsx format. Also, the
webpage suggested by Nick for reviewing the discussion of the
topic
(http://www.stata-journal.com/sjpdf.html?articlenum=dm0043)
redirects me to a nonsense file to download. Please give me the issue
number of the journal so that I can read the discussion.

Happy new year to all of you

Rodrigo


On 2013-12-15 22:39, Y.R.E. Retamal wrote:



Dear Red Owl and Nick
Thank you very much for your response. The code works
perfectly, just as I need.
Best wishes
Rodrigo
On 2013-12-14 22:31, Nick Cox wrote:



In addition to Red's helpful suggestions, note that technique
for such paired data was discussed in
http://www.stata-journal.com/sjpdf.html?articlenum=dm0043
which is publicly accessible. The problem is that the
identifiers in Rodrigo's example appear to make little sense.
How is Stata expected to know that 1 and 4, 2 and 5, 3 and 6
are paired? Perhaps the structure of the dataset is clearer in
practice. If so, basic calculations are just a couple of lines or
so.

Nick
[email protected]
On 14 December 2013 15:33, Red Owl <[email protected]> wrote:



Rodrigo,

The following code demonstrates an approach with basic loops.
It could be made more efficient with a different loop
structure, but this approach may be more informative.
*** BEGIN CODE ***
* Build demo data set.
clear
* Length is capitalized to distinguish from length().
input id str5(side) Length
1 right 10
2 right 15
3 right 11
4 left  13
5 left  10
6 left  12
end
gen byte newvar1 = .
forval i = 1/3 {
  replace newvar1 = Length[`i'] - Length[4] in `i'
  }
forval i = 4/6 {
  replace newvar1 = Length[`i'] - Length[1] in `i'
  }
gen byte newvar2 = .
forval i = 1/3 {
  replace newvar2 = Length[`i'] - Length[5] in `i'
  }
forval i = 4/6 {
  replace newvar2 = Length[`i'] - Length[2] in `i'
  }
gen byte newvar3 = .
forval i = 1/3 {
  replace newvar3 = Length[`i'] - Length[6] in `i'
  }
forval i = 4/6 {
  replace newvar3 = Length[`i'] - Length[3] in `i'
  }
list, noobs sep(0)
*** END CODE ***
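
One possible "different loop structure" is sketched below: a single loop over k builds newvar1, newvar2, ... without writing each block out by hand. It is only a sketch and rests on the same layout as the demo data, which is an assumption: all right bones come first, followed by an equal number of left bones. The local macros n, first and last are names introduced here for illustration.

```stata
* Same demo data as above.
clear
input id str5(side) Length
1 right 10
2 right 15
3 right 11
4 left  13
5 left  10
6 left  12
end

* Assumption: rows 1..n are right bones, rows n+1..2n are left bones.
count if side == "right"
local n = r(N)
local first = `n' + 1
local last  = 2 * `n'

forval k = 1/`n' {
    gen newvar`k' = .
    * right bones minus the k-th left bone
    replace newvar`k' = Length - Length[`n' + `k'] in 1/`n'
    * left bones minus the k-th right bone
    replace newvar`k' = Length - Length[`k'] in `first'/`last'
}
list, noobs sep(0)
```

This produces the same newvar1 to newvar3 as the explicit loops, and the number of new variables follows the number of right bones automatically.
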
Good luck.
Red Owl
[email protected]

"Y.R.E. Retamal" <[email protected]> Sat, 14 Dec 2013 12:08:42:
Dear list
I am having great difficulty performing an analysis in Stata and
cannot find a way to do it. Maybe you could help me. I want to create
some new variables containing the difference between the lengths of
two individuals from different groups:

id     side     length      newvar1       newvar2      newvar3
1      right      x           x-j           x-k          x-l
2      right      y           y-j           y-k          y-l
3      right      z           z-j           z-k          z-l
4      left       j           j-x           j-y          j-z
5      left       k           k-x           k-y          k-z
6      left       l           l-x           l-y          l-z

I do not know if I am explaining myself clearly. The individuals are
bones (clavicles, for example), so it is possible that some right
clavicles pair-match with left clavicles, following the idea that an
individual has bones of similar length.
Any help could shed some light!
Best wishes
Rodrigo
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
