Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Calculating Euclidean Distance

From	Anthony Laverty <[email protected]>
To	[email protected]
Subject	Re: st: Calculating Euclidean Distance
Date	Thu, 10 Jun 2010 16:30:58 +0100

Hi Austin

That's helpful, thanks, and good points about memory use and about
perhaps using a log scale.

Unfortunately, what I really want to do is choose a group of
hospitals (say 15) that are closest to hospital A in Euclidean
distance over all months, rather than just the single closest
hospital. I was planning to aggregate the distances over the whole
time period at the end, if that makes things any easier.
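
For reference, a minimal Stata sketch of that selection, assuming the
long layout from the original question (one observation per
hospital-month, with variables hospital, time and patients) and "A" as
the reference hospital. The distance from A to hospital h is taken as
the square root of the sum, over months, of the squared differences in
patient numbers. Unlike the loop in the reply quoted below, it does not
create one variable per hospital: it adds a couple of variables to the
NT observations and then collapses to one row per hospital. The cutoff
of 15 is just the example number above; refpat, sqdiff, sumsq, months
and dist are illustrative names.

```stata
* Sketch: rank hospitals by Euclidean distance to hospital A over all
* months. Assumes long data with one obs per hospital-month.
clear
input str1 hospital time patients
 A 1 456
 A 2 759
 A 3 236
 B 1 214
 B 2 854
 B 3 325
 C 1 250
 C 2 321
 C 3 852
end

* Copy hospital A's count for each month onto every obs in that month
* (missing if A has no value in that month)
bysort time: egen double refpat = max(cond(hospital == "A", patients, .))

* Squared difference from A in each month
generate double sqdiff = (patients - refpat)^2

* Sum over months (collapse skips missing values) and count the months
* actually compared, then take the square root
collapse (sum) sumsq = sqdiff (count) months = sqdiff, by(hospital)
generate double dist = sqrt(sumsq)

* Drop the reference hospital, sort by distance, keep the 15 nearest
drop if hospital == "A"
sort dist hospital
keep if _n <= 15
list hospital months dist, noobs
```

With the example data, B (sum of squares 75510) ranks ahead of C
(613736). Because the sum skips missing months, check months before
trusting the ranking: a hospital with gaps can look artificially close.
For the log-scale variant suggested in the reply, use ln(patients) in
place of patients in the egen and generate lines.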

In case more detail helps: this was relatively easy to work out in
Excel, with one row per hospital, one column per time period, and the
number of patients in each cell. It was then straightforward to work
out the distances by subtracting different hospitals' numbers from
each other, using IF statements to match on time. My new data set is
too big for Excel to handle this, which is why I have turned to Stata
(and Statalist).
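
The spreadsheet layout can be reproduced in Stata with reshape, after
which each month is its own column and no IF matching on time is
needed. A rough sketch, again assuming the variables hospital, time
(integer month codes) and patients from the original question; sumsq
and dist are illustrative names.

```stata
* Sketch: mimic the spreadsheet (one row per hospital, one column per
* month) and compute each hospital's Euclidean distance to hospital A.
clear
input str1 hospital time patients
 A 1 456
 A 2 759
 A 3 236
 B 1 214
 B 2 854
 B 3 325
 C 1 250
 C 2 321
 C 3 852
end

* One row per hospital; creates patients1, patients2, ... (one per month)
reshape wide patients, i(hospital) j(time)

* Accumulate squared differences from A, one month column at a time
generate double sumsq = 0
foreach v of varlist patients* {
    * Hospital A's value in this month is left in r(mean)
    summarize `v' if hospital == "A", meanonly
    replace sumsq = sumsq + (`v' - r(mean))^2
}
generate double dist = sqrt(sumsq)

* List the other hospitals from nearest to farthest
drop if hospital == "A"
sort dist hospital
list hospital dist, noobs
```

If hospital A is missing in any month, r(mean) is missing and every
hospital's sumsq becomes missing, so gaps need handling before the
loop.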

Thanks for your consideration

Anthony


On Thu, Jun 10, 2010 at 2:59 PM, Austin Nichols <[email protected]> wrote:
> Anthony Laverty <[email protected]> :
> If you have N hospitals at T points in time, then you will have NT x N
> squared distances in your variables, and if they are doubles you may
> well run out of memory long before that, but if all you want is the
> nearest hospital, then you want one variable per hospital giving the
> identity of the nearest (over all months, you seem to suggest). You
> might also want to compute distance on a log scale, or some other
> metric. With more detail on your problem, you may get a better answer.
> Nevertheless, this is like what you asked for, I think:
>
> clear
> input str1 hospital time patients
>  A 1 456
>  A 2 759
>  A 3 236
>  B 1 214
>  B 2 854
>  B 3 325
>  C 1 250
>  C 2 321
>  C 3 852
> end
> egen g=group(hospital)
> su g, mean
> loc N=r(max)
> forv i=1/`N' {
>  g double d`i'=.
> }
> levelsof time, loc(ts)
> fillin time g
> sort time g
> g long obs=_n
> qui foreach t of loc ts {
>  su obs if time==`t', mean
>  loc n0=r(min)
>  loc n1=r(max)
>  forv i=`n0'/`n1' {
>   loc n=`i'-`n0'+1
>   replace d`n'=(patients-patients[`i'])^2 if inrange(_n,`n0',`n1')
>  }
> }
> l, sepby(time) noo
>
> On Thu, Jun 10, 2010 at 5:08 AM, Anthony Laverty
> <[email protected]> wrote:
>> Dear Statalist
>>
>> I have data on patient numbers at various hospitals and am trying to
>> calculate a new variable which is the Euclidean distance between one
>> specific hospital (say A) and all of the others, so that I can select
>> which hospitals had the most similar number of patients across all
>> months.  The data is more or less arranged like this (although it has
>> a few more columns not of direct interest to this question):
>>
>> Hospital   Time   Patients
>> A          1      456
>> A          2      759
>> A          3      236
>> B          1      214
>> B          2      854
>> B          3      325
>> C          1      250
>> C          2      321
>> C          3      852
>>
>> So, I want to cycle through each time period and calculate the
>> squared difference between hospital A and each of the other hospitals
>> individually, as one new variable.
>>
>> Any suggestions greatly appreciated
>>
>> Anthony Laverty
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

