


Re: st: Calculating Euclidean Distance


From   Anthony Laverty <anthonylav@googlemail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Calculating Euclidean Distance
Date   Fri, 11 Jun 2010 09:43:29 +0100

Fair enough, I didn't really give much away. After the
matching I am planning on running a difference-in-differences analysis
to assess the effect of policy changes on patient numbers, using
the matches as a comparison group. Mahalanobis distance may in fact be
an improvement, so I will look that up.

Many thanks

On Thu, Jun 10, 2010 at 4:50 PM, Austin Nichols <austinnichols@gmail.com> wrote:
> Anthony Laverty <anthonylav@googlemail.com> :
> You didn't give more detail on your problem--what are you going to use
> the matches for?  Why use the sum of squared differences in each
> month, as opposed to, say, the Mahalanobis distance over all months
> (-reshape- to have T variables measuring # of patients in each month,
> and find the closest 15 obs in the standard deviation metric)?  That
> would match not only on levels but on seasonal patterns, for example.
> Is there a regression you plan to run after matching?  You may want to
> -findit nnmatch- in that case.
>
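The -reshape- step suggested above can be sketched as follows, using the toy data from the thread. This is only a minimal illustration under stated assumptions: each month's difference from hospital A is divided by that month's standard deviation across hospitals, which is a simplified "standard deviation metric" and not the full Mahalanobis distance (it ignores correlation between months). The variable names d2 and dist are made up for the example.

```stata
clear
input str1 hospital time patients
A 1 456
A 2 759
A 3 236
B 1 214
B 2 854
B 3 325
C 1 250
C 2 321
C 3 852
end

* One row per hospital, one column per month: patients1 patients2 patients3
reshape wide patients, i(hospital) j(time)

* Accumulate the scaled squared difference from hospital A, month by month
gen double d2 = 0
foreach v of varlist patients* {
    * standard deviation of this month across all hospitals
    quietly summarize `v'
    local sd = r(sd)
    * value for hospital A in this month
    quietly summarize `v' if hospital == "A", meanonly
    quietly replace d2 = d2 + ((`v' - r(mean)) / `sd')^2
}
gen double dist = sqrt(d2)

* Hospital A itself sorts first with distance 0, then the closest matches
sort dist hospital
list hospital dist, noobs
```

With one row per hospital, picking the closest 15 amounts to keeping the first 15 rows after hospital A once the data are sorted by dist.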
> On Thu, Jun 10, 2010 at 11:30 AM, Anthony Laverty
> <anthonylav@googlemail.com> wrote:
>> Hi Austin
>>
>> That's helpful, thanks, and good points about my memory considerations
>> and perhaps using a log scale
>>
>> Unfortunately, what I really want to be able to do is choose a group
>> of hospitals (say 15) which are closest in Euclidean distance terms to
>> hospital A over all months, rather than just the one closest hospital.
>> I was planning to aggregate these for the whole of the time period at
>> the end, if that makes things any easier.
>>
>> In terms of more detail, I'm not sure if it helps to say that this was
>> relatively easy to work out in Excel, using a different column for
>> each time period, a row for each hospital, and the number of patients
>> for each time period in the cells. Then it was quite easy to
>> work out the distances with a formula subtracting different
>> hospitals' numbers from each other, using IF statements to match on
>> time. The new data I have is too big for Excel to do this, which is
>> why I have turned to Stata (and Statalist).
>>
>> Thanks for your consideration
>>
>> Anthony
>>
>>
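Choosing a group of hospitals closest to hospital A over all months, as described in the message above, can be sketched without building a full matrix of pairwise distances. The following is a rough illustration on the toy data from the thread, not a tested solution for the real data. The locals ref and k and the variables refpat, sqdiff, ssd, nmonths and dist are hypothetical names chosen for the example.

```stata
clear
input str1 hospital time patients
A 1 456
A 2 759
A 3 236
B 1 214
B 2 854
B 3 325
C 1 250
C 2 321
C 3 852
end

* Hypothetical settings: reference hospital and number of matches wanted
local ref "A"
local k 1

* Copy the reference hospital's count onto every row of the same month
* (missing values sort last, so the first row within time is the reference)
gen double refpat = patients if hospital == "`ref'"
bysort time (refpat): replace refpat = refpat[1]

* Squared difference per month, then summed over months within hospital
gen double sqdiff = (patients - refpat)^2
collapse (sum) ssd = sqdiff (count) nmonths = sqdiff, by(hospital)
gen double dist = sqrt(ssd)

* Drop the reference itself and keep the k closest hospitals
drop if hospital == "`ref'"
sort dist hospital
keep if _n <= `k'
list, noobs
```

Setting k to 15 would give the group of 15 described above. Because the data stay in long form and only one distance per hospital is kept, memory use grows with the number of observations rather than with the number of hospital pairs. The nmonths variable shows how many months contributed to each distance, which matters if some hospitals have missing months.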
>> On Thu, Jun 10, 2010 at 2:59 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>>> Anthony Laverty <anthonylav@googlemail.com> :
>>> If you have N hospitals at T points in time, then you will have NTxN
>>> squared distances in your variables, and if they are doubles you may
>>> well run out of memory long before that, but if all you want is the
>>> nearest hospital, then you want one variable per hospital giving the
>>> identity of the nearest (over all months, you seem to suggest). You
>>> might also want to compute distance on a log scale, or some other
>>> metric. With more detail on your problem, you may get a better answer.
>>> Nevertheless, this is like what you asked for, I think:
>>>
>>> clear
>>> input str1 hospital time patients
>>>  A 1 456
>>>  A 2 759
>>>  A 3 236
>>>  B 1 214
>>>  B 2 854
>>>  B 3 325
>>>  C 1 250
>>>  C 2 321
>>>  C 3 852
>>> end
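>>> * number the hospitals 1 to N
>>> egen g=group(hospital)
>>> su g, mean
>>> loc N=r(max)
>>> * one variable per hospital to hold squared differences from that hospital
>>> forv i=1/`N' {
>>>  g double d`i'=.
>>> }
>>> levelsof time, loc(ts)
>>> * balance the panel so every period has N rows in hospital order
>>> fillin time g
>>> sort time g
>>> g long obs=_n
>>> qui foreach t of loc ts {
>>>  * first and last observation number for this period
>>>  su obs if time==`t', mean
>>>  loc n0=r(min)
>>>  loc n1=r(max)
>>>  forv i=`n0'/`n1' {
>>>   loc n=`i'-`n0'+1
>>>   replace d`n'=(patients-patients[`i'])^2 if inrange(_n,`n0',`n1')
>>>  }
>>> }
>>> l, sepby(time) noo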
>>>
>>> On Thu, Jun 10, 2010 at 5:08 AM, Anthony Laverty
>>> <anthonylav@googlemail.com> wrote:
>>>> Dear Statalist
>>>>
>>>>
>>>>
>>>> I have data on patient numbers at various hospitals and am trying to
>>>> calculate a new variable which is the Euclidean distance between one
>>>> specific hospital (say A) and all of the others, so that I can select
>>>> which hospitals had the most similar number of patients across all
>>>> months.  The data is more or less arranged like this (although it has
>>>> a few more columns not of direct interest to this question):
>>>>
>>>> Hospital   Time   Patients
>>>> A          1      456
>>>> A          2      759
>>>> A          3      236
>>>> B          1      214
>>>> B          2      854
>>>> B          3      325
>>>> C          1      250
>>>> C          2      321
>>>> C          3      852
>>>>
>>>>
>>>>
>>>> So, I want to cycle through each time period and calculate the
>>>> difference squared between hospital A and all of the other hospitals
>>>> individually as one new variable.
>>>>
>>>>
>>>>
>>>> Any suggestions greatly appreciated
>>>>
>>>>
>>>>
>>>> Anthony Laverty
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


