

From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Calculating Euclidean Distance
Date: Thu, 10 Jun 2010 11:50:44 -0400

Anthony Laverty <anthonylav@googlemail.com> :
You didn't give more detail on your problem--what are you going to use
the matches for? Why use the sum of squared differences in each month,
as opposed to, say, the Mahalanobis distance over all months (-reshape-
to have T variables measuring # of patients in each month, and find the
closest 15 obs in the standard deviation metric)? That would match not
only on levels but on seasonal patterns, for example. Is there a
regression you plan to run after matching? You may want to
-findit nnmatch- in that case.

On Thu, Jun 10, 2010 at 11:30 AM, Anthony Laverty
<anthonylav@googlemail.com> wrote:
> Hi Austin
>
> That's helpful, thanks, and good points about my memory considerations
> and perhaps using a log scale
>
> Unfortunately, what I really want to be able to do is choose a group
> of hospitals (say 15) which are closest in Euclidean distance terms to
> hospital A over all months, rather than just the one closest hospital.
> I was planning to aggregate these for the whole of the time period at
> the end, if that makes things any easier.
>
> In terms of more detail I'm not sure if it helps to say that this was
> relatively easy to work out in Excel, using a different column for
> each time period; a row for each hospital and the number of patients
> for each time period in a table like this. Then, it was quite easy to
> work out the distances with the equation subtracting different
> hospitals' numbers from each other, using if statements to match on
> time. The new data I have is too big for Excel to do this, which is
> why I have turned to Stata (and Statalist)
>
> Thanks for your consideration
>
> Anthony
>
>
> On Thu, Jun 10, 2010 at 2:59 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Anthony Laverty <anthonylav@googlemail.com> :
>> If you have N hospitals at T points in time, then you will have NTxN
>> squared distances in your variables, and if they are doubles you may
>> well run out of memory long before that, but if all you want is the
>> nearest hospital, then you want one variable per hospital giving the
>> identity of the nearest (over all months, you seem to suggest). You
>> might also want to compute distance on a log scale, or some other
>> metric. With more detail on your problem, you may get a better answer.
>> Nevertheless, this is like what you asked for, I think:
>>
>> clear
>> input str1 hospital time patients
>> A 1 456
>> A 2 759
>> A 3 236
>> B 1 214
>> B 2 854
>> B 3 325
>> C 1 250
>> C 2 321
>> C 3 852
>> end
>> egen g=group(hospital)
>> su g, mean
>> loc N=r(max)
>> forv i=1/`N' {
>>  g double d`i'=.
>> }
>> levelsof time, loc(ts)
>> fillin time g
>> sort time g
>> g long obs=_n
>> qui foreach t of loc ts {
>>  su obs if time==`t', mean
>>  loc n0=r(min)
>>  loc n1=r(max)
>>  forv i=`n0'/`n1' {
>>   loc n=`i'-`n0'+1
>>   replace d`n'=(patients-patients[`i'])^2 if inrange(_n,`n0',`n1')
>>  }
>> }
>> l, sepby(time) noo
>>
>> On Thu, Jun 10, 2010 at 5:08 AM, Anthony Laverty
>> <anthonylav@googlemail.com> wrote:
>>> Dear Statalist
>>>
>>> I have data on patient numbers at various hospitals and am trying to
>>> calculate a new variable which is the Euclidean distance between one
>>> specific hospital (say A) and all of the others, so that I can select
>>> which hospitals had the most similar number of patients across all
>>> months. The data is more or less arranged like this (although it has
>>> a few more columns not of direct interest to this question):
>>>
>>> Hospital Time Patients
>>> A        1    456
>>> A        2    759
>>> A        3    236
>>> B        1    214
>>> B        2    854
>>> B        3    325
>>> C        1    250
>>> C        2    321
>>> C        3    852
>>>
>>> So, I want to cycle through each time period and calculate the
>>> difference squared between hospital A and all of the other hospitals
>>> individually as one new variable.
>>>
>>> Any suggestions greatly appreciated
>>>
>>> Anthony Laverty

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
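Editorial sketch (not part of the original message): the thread asks for the k hospitals closest to hospital A in Euclidean distance over all months, and the reply suggests -reshape- to one variable per month. The lines below are one possible way to do that on the toy data from the thread, not the posters' own solution. The names d2, dist, and the local k are illustrative assumptions, and k is set to 2 only because the example has two other hospitals (the thread mentions 15).

```stata
clear
input str1 hospital time patients
A 1 456
A 2 759
A 3 236
B 1 214
B 2 854
B 3 325
C 1 250
C 2 321
C 3 852
end

* keep only the variables needed, so -reshape- is not blocked by other columns
keep hospital time patients

* one row per hospital, one variable per month: patients1 patients2 patients3
reshape wide patients, i(hospital) j(time)

* accumulate squared differences from hospital A, month by month
g double d2 = 0
foreach v of varlist patients* {
    su `v' if hospital=="A", meanonly
    replace d2 = d2 + (`v' - r(mean))^2
}
g double dist = sqrt(d2)

* drop A itself, then list the k nearest hospitals (k is an assumed value)
drop if hospital=="A"
sort dist
loc k 2
l hospital dist in 1/`k', noobs
```

On this toy data B is listed before C, since B's monthly counts are closer to A's. This stores one distance per hospital instead of NTxN squared differences, so it should scale to data too large for a spreadsheet. A hospital with a missing month gets a missing dist and sorts last.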

**Follow-Ups**:
- **Re: st: Calculating Euclidean Distance** *From:* Anthony Laverty <anthonylav@googlemail.com>

**References**:
- **st: Calculating Euclidean Distance** *From:* Anthony Laverty <anthonylav@googlemail.com>
- **Re: st: Calculating Euclidean Distance** *From:* Austin Nichols <austinnichols@gmail.com>
- **Re: st: Calculating Euclidean Distance** *From:* Anthony Laverty <anthonylav@googlemail.com>
