

From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Calculating Euclidean Distance
Date: Fri, 11 Jun 2010 10:29:26 -0400

Anthony Laverty <anthonylav@googlemail.com> :
Well, you certainly don't want to match on your outcome variable, so I assume you are matching on patient volumes from the pre period, before any policy changes, and maybe you have a dummy t measuring whether a particular policy was instituted, and you have an outcome y which is patient volume at some later date. Then define x1 to x12 for months 1 to 12 of the pre period (or whatever months are in the pre period), and use -nnmatch- (remembering that you can get x1 to x12 from the data structure you outlined via -reshape- to wide form). See also -help xtdpd- and related manual entries, if you want to compare to a regression taking account of the lagged dep var on the RHS. But compare some other approaches:

set seed 1234
clear
input str1 hospital time patients
A 1 456
A 2 759
A 3 236
B 1 214
B 2 854
B 3 325
C 1 250
C 2 321
C 3 852
end
* make more fake data
expand 100
ren patients x
bys time (hospital): g g=_n
drop hospital
replace x=ceil(uniform()*x)
reshape wide x, i(g) j(time)
*make a fake treatment corr with observed x
g byte t=(uniform()<x2/500)
g y=ceil(x1^2+x2^2/2+x3^2/3+t+rnormal()*10)
* estimate effect of treatment t with nnmatch or reg
nnmatch y t x1-x3, met(maha) bias(bias) robust(4)
reg y t
reg y t c.x1##c.x1 c.x2##c.x2 c.x3##c.x3
*now parametric propensity score reweighting
qui logit t c.x1##c.x1 c.x2##c.x2 c.x3##c.x3
predict p
g pw=cond(t,1/p,1/(1-p))
reg y t [pw=pw]
reg y t c.x1##c.x1 c.x2##c.x2 c.x3##c.x3 [pw=pw]
*now nonparametric propensity score reweighting
forv i=1/3 {
 xtile z`i'=x`i', nq(4)
}
egen np=mean(t), by(z1 z2 z3)
g npw=cond(t,1/np,1/(1-np))
reg y t [pw=npw]
reg y t c.x1##c.x1 c.x2##c.x2 c.x3##c.x3 [pw=npw]

The last, a double-robust approach with nonparametric propensity score reweighting, has a variety of proven advantages over alternatives. None has sufficient power, but some think they do...
you may want to design a simulation based on your data and some hypothesized treatment effects, to see what seems to have the lowest bias or MSE in your design. Or just estimate 10 different ways, and hope you get similar answers!

On Fri, Jun 11, 2010 at 4:43 AM, Anthony Laverty <anthonylav@googlemail.com> wrote:
> Fair enough, i didnt really give too much more away. After the
> matching i am planning on running a difference in difference analysis
> to assess for the effect of policy changes on patient numbers, using
> the matches as a comparison group. Mahalanobis distance may in fact be
> an improvement, so i will look that up
>
> Many thanks
>
> On Thu, Jun 10, 2010 at 4:50 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Anthony Laverty <anthonylav@googlemail.com> :
>> You didn't give more detail on your problem--what are you going to use
>> the matches for? Why use the sum of squared differences in each
>> month, as opposed to, say the Mahalanobis distance over all months
>> (-reshape- to have T variables measuring # of patients in each month,
>> and find the closest 15 obs in the standard deviation metric)? That
>> would match not only on levels but on seasonal patterns, for example.
>> Is there a regression you plan to run after matching? You may want to
>> -findit nnmatch- in that case.
>>
>> On Thu, Jun 10, 2010 at 11:30 AM, Anthony Laverty
>> <anthonylav@googlemail.com> wrote:
>>> Hi Austin
>>>
>>> That's helpful, thanks, and good points about my memory considerations
>>> and perhaps using a log scale
>>>
>>> Unfortunately, what i really want to be able to do is choose a group
>>> of hospitals (say 15) which are closest in Euclidean distance terms to
>>> hospital A over all months, rather than just the one closest hospital.
>>> I was planning to aggregate these for the whole of the time period at
>>> the end, if that makes things any easier.
>>>
>>> In terms of more detail i'm not sure if it helps to say that this was
>>> relatively easy to work out in excel, using a different column for
>>> each time period; a row for each hospital and the number of patients
>>> for each time period in a table like this. Then, it was quite easy to
>>> work out the distances with the equation subtracting different
>>> hospitals' numbers from each other, using if statements to match on
>>> time. The new data i have is too big for Excel to do this, which is
>>> why i have turned to stata (and statalist)
>>>
>>> Thanks for your consideration
>>>
>>> Anthony
>>>
>>>
>>> On Thu, Jun 10, 2010 at 2:59 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>> Anthony Laverty <anthonylav@googlemail.com> :
>>>> If you have N hospitals at T points in time, then you will have NTxN
>>>> squared distances in your variables, and if they are doubles you may
>>>> well run out of memory long before that, but if all you want is the
>>>> nearest hospital, then you want one variable per hospital giving the
>>>> identity of the nearest (over all months, you seem to suggest). You
>>>> might also want to compute distance on a log scale, or some other
>>>> metric. With more detail on your problem, you may get a better answer.
>>>> Nevertheless, this is like what you asked for, I think:
>>>>
>>>> clear
>>>> input str1 hospital time patients
>>>> A 1 456
>>>> A 2 759
>>>> A 3 236
>>>> B 1 214
>>>> B 2 854
>>>> B 3 325
>>>> C 1 250
>>>> C 2 321
>>>> C 3 852
>>>> end
>>>> egen g=group(hospital)
>>>> su g, mean
>>>> loc N=r(max)
>>>> forv i=1/`N' {
>>>> g double d`i'=.
>>>> }
>>>> levelsof time, loc(ts)
>>>> fillin time g
>>>> sort time g
>>>> g long obs=_n
>>>> qui foreach t of loc ts {
>>>> su obs if time==`t', mean
>>>> loc n0=r(min)
>>>> loc n1=r(max)
>>>> forv i=`n0'/`n1' {
>>>> loc n=`i'-`n0'+1
>>>> replace d`n'=(patients-patients[`i'])^2 if inrange(_n,`n0',`n1')
>>>> }
>>>> }
>>>> l, sepby(time) noo
>>>>
>>>> On Thu, Jun 10, 2010 at 5:08 AM, Anthony Laverty
>>>> <anthonylav@googlemail.com> wrote:
>>>>> Dear Statalist
>>>>>
>>>>> I have data on patient numbers at various hospitals and am trying to
>>>>> calculate a new variable which is the Euclidean distance between one
>>>>> specific hospital (say A) and all of the others, so that i can select
>>>>> which hospitals had the most similar number of patients across all
>>>>> months. The data is more or less arranged like this (although it has
>>>>> a few more columns not of direct interest to this question):
>>>>>
>>>>> Hospital Time Patients
>>>>> A 1 456
>>>>> A 2 759
>>>>> A 3 236
>>>>> B 1 214
>>>>> B 2 854
>>>>> B 3 325
>>>>> C 1 250
>>>>> C 2 321
>>>>> C 3 852
>>>>>
>>>>> So, i want to cycle through each time period and calculate the
>>>>> difference squared between hospital A and all of the other hospitals
>>>>> individually as one new variable.
>>>>>
>>>>> Any suggestions greatly appreciated
>>>>>
>>>>> Anthony Laverty

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
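
The calculation asked for in the original question (the Euclidean distance from one reference hospital to every other hospital over all months, followed by picking the closest few) could be sketched roughly as below. This is not code from the thread. The locals ref and k and the variables refpat, sqdiff and dist are names assumed here purely for illustration, and the sketch assumes the reference hospital has exactly one non-missing observation in every month.

```stata
* Sketch only, under the assumptions stated above
clear
input str1 hospital time patients
A 1 456
A 2 759
A 3 236
B 1 214
B 2 854
B 3 325
C 1 250
C 2 321
C 3 852
end

* hypothetical settings: reference hospital and number of matches wanted
local ref "A"
local k 2

* copy the reference hospital's count onto every row of the same month
* (missing values sort last, so refpat[1] is the reference value)
gen double refpat = patients if hospital == "`ref'"
bysort time (refpat): replace refpat = refpat[1]

* squared difference per month, summed over months within hospital
gen double sqdiff = (patients - refpat)^2
collapse (sum) sqdiff, by(hospital)
gen double dist = sqrt(sqdiff)

* drop the reference itself and list the k nearest hospitals
drop if hospital == "`ref'"
sort dist
list hospital dist in 1/`k', noobs
```

On the toy data this ranks B ahead of C as the closer match to A. With real data k would be set to 15 as in the question; note that -list ... in 1/`k'- fails if fewer than k hospitals remain, and that -collapse (sum)- treats a missing squared difference as zero, so months with missing counts need handling first.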

