

From: Anthony Laverty <anthonylav@googlemail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Calculating Euclidean Distance
Date: Fri, 11 Jun 2010 09:43:29 +0100

Fair enough, I didn't really give too much more away. After the
matching I am planning on running a difference-in-differences analysis
to assess the effect of policy changes on patient numbers, using the
matches as a comparison group. Mahalanobis distance may in fact be an
improvement, so I will look that up.

Many thanks

On Thu, Jun 10, 2010 at 4:50 PM, Austin Nichols <austinnichols@gmail.com> wrote:
> Anthony Laverty <anthonylav@googlemail.com> :
> You didn't give more detail on your problem--what are you going to use
> the matches for? Why use the sum of squared differences in each
> month, as opposed to, say the Mahalanobis distance over all months
> (-reshape- to have T variables measuring # of patients in each month,
> and find the closest 15 obs in the standard deviation metric)? That
> would match not only on levels but on seasonal patterns, for example.
> Is there a regression you plan to run after matching? You may want to
> -findit nnmatch- in that case.
>
> On Thu, Jun 10, 2010 at 11:30 AM, Anthony Laverty
> <anthonylav@googlemail.com> wrote:
>> Hi Austin
>>
>> That's helpful, thanks, and good points about my memory considerations
>> and perhaps using a log scale
>>
>> Unfortunately, what i really want to be able to do is choose a group
>> of hospitals (say 15) which are closest in Euclidean distance terms to
>> hospital A over all months, rather than just the one closest hospital.
>> I was planning to aggregate these for the whole of the time period at
>> the end, if that makes things any easier.
>>
>> In terms of more detail i'm not sure if it helps to say that this was
>> relatively easy to work out in excel, using a different column for
>> each time period; a row for each hospital and the number of patients
>> for each time period in a table like this. Then, it was quite easy to
>> work out the distances with the equation subtracting different
>> hospitals' numbers from each other, using if statements to match on
>> time. The new data i have is too big for Excel to do this, which is
>> why i have turned to stata (and statalist)
>>
>> Thanks for your consideration
>>
>> Anthony
>>
>>
>> On Thu, Jun 10, 2010 at 2:59 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>>> Anthony Laverty <anthonylav@googlemail.com> :
>>> If you have N hospitals at T points in time, then you will have NTxN
>>> squared distances in your variables, and if they are doubles you may
>>> well run out of memory long before that, but if all you want is the
>>> nearest hospital, then you want one variable per hospital giving the
>>> identity of the nearest (over all months, you seem to suggest). You
>>> might also want to compute distance on a log scale, or some other
>>> metric. With more detail on your problem, you may get a better answer.
>>> Nevertheless, this is like what you asked for, I think:
>>>
>>> clear
>>> input str1 hospital time patients
>>> A 1 456
>>> A 2 759
>>> A 3 236
>>> B 1 214
>>> B 2 854
>>> B 3 325
>>> C 1 250
>>> C 2 321
>>> C 3 852
>>> end
>>> egen g=group(hospital)
>>> su g, mean
>>> loc N=r(max)
>>> forv i=1/`N' {
>>>  g double d`i'=.
>>> }
>>> levelsof time, loc(ts)
>>> fillin time g
>>> sort time g
>>> g long obs=_n
>>> qui foreach t of loc ts {
>>>  su obs if time==`t', mean
>>>  loc n0=r(min)
>>>  loc n1=r(max)
>>>  forv i=`n0'/`n1' {
>>>   loc n=`i'-`n0'+1
>>>   replace d`n'=(patients-patients[`i'])^2 if inrange(_n,`n0',`n1')
>>>  }
>>> }
>>> l, sepby(time) noo
>>>
>>> On Thu, Jun 10, 2010 at 5:08 AM, Anthony Laverty
>>> <anthonylav@googlemail.com> wrote:
>>>> Dear Statalist
>>>>
>>>> I have data on patient numbers at various hospitals and am trying to
>>>> calculate a new variable which is the Euclidean distance between one
>>>> specific hospital (say A) and all of the others, so that i can select
>>>> which hospitals had the most similar number of patients across all
>>>> months. The data is more or less arranged like this (although it has
>>>> a few more columns not of direct interest to this question):
>>>>
>>>> Hospital  Time  Patients
>>>> A         1     456
>>>> A         2     759
>>>> A         3     236
>>>> B         1     214
>>>> B         2     854
>>>> B         3     325
>>>> C         1     250
>>>> C         2     321
>>>> C         3     852
>>>>
>>>> So, i want to cycle through each time period and calculate the
>>>> difference squared between hospital A and all of the other hospitals
>>>> individually as one new variable.
>>>>
>>>> Any suggestions greatly appreciated
>>>>
>>>> Anthony Laverty
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
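
For the question as first posed in the thread (the distance from hospital A to each other hospital, summed over all months, then keeping the closest group), a minimal sketch in long format might look like the following. It is not code posted by anyone in the thread. The variable names refpat, sqdiff, sqdist and dist are made up for illustration, and the sketch assumes every hospital is observed in every month with no missing patient counts.

```stata
* Toy data from the thread: one row per hospital-month
clear
input str1 hospital time patients
A 1 456
A 2 759
A 3 236
B 1 214
B 2 854
B 3 325
C 1 250
C 2 321
C 3 852
end

* Copy hospital A's count for the same month onto every row.
* cond() is missing for non-A rows, and egen's max() ignores missings,
* so refpat (hypothetical name) is A's value within each month.
bysort time: egen double refpat = max(cond(hospital == "A", patients, .))

* Squared difference from A in each month
gen double sqdiff = (patients - refpat)^2

* Sum over months: one row per hospital with its squared Euclidean distance
collapse (sum) sqdist = sqdiff, by(hospital)
gen double dist = sqrt(sqdist)

* Drop A itself, sort by distance, keep the k nearest
* (k = 15 as in the thread; the toy data has only two candidates)
drop if hospital == "A"
sort dist
local k = 15
keep if _n <= `k'
list, noobs
```

On the toy data this ranks B (squared distance 75510) ahead of C (613736). Because the distances are collapsed to one row per hospital, memory use grows with N rather than with N x N x T. With an unbalanced panel the sums would cover different numbers of months per hospital, so they would need rescaling or filtering before being compared.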
