Statalist The Stata Listserver



Re: st: Distance from home to hospital [originally no subject]


From   David Kantor <[email protected]>
To   [email protected]
Subject   Re: st: Distance from home to hospital [originally no subject]
Date   Wed, 03 Jan 2007 23:50:11 -0500

Austin--

Okay, but what are the match variables for this supposed merge? Kelly indicated that there is no identifier linking person to hospital. (And the point of this task is, effectively, to create a link that didn't exist before.) I still think that the structure suggests a -cross-, which is like a -merge- but joins every person with every hospital; maybe that's what you had in mind anyway. My suggested code was meant to avoid an absurdly long dataset. But then again, maybe it's not too absurd, if the set of variables is small. You would get 5,000,000 observations -- not impossible with a small set of variables. It's a big dataset, but the code is simpler:

use personfile
cross using hospitalfile
-- compute dist --
sort person dist
by person: keep if _n<=5 // or whatever small number you want

Again, I've left out the details of "compute dist". Maybe that's where you envision a set of nested loops. Maybe you need loops if the distance computation is complex. But I envisioned some -gen- formulation; in that case, no looping is needed.
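For what it's worth, here is a minimal sketch of what a -gen- formulation of "compute dist" might look like, assuming (and this is only an assumption; Kelly's data may look quite different) that each file carries latitude and longitude in decimal degrees. The variable names plat, plon, hlat, hlon and the toy coordinates are hypothetical, and the great-circle (haversine) formula is just one reasonable choice of distance:

```stata
* Toy person file; plat/plon are hypothetical names (decimal degrees)
clear
input person plat plon
1 42.36 -71.06
2 40.71 -74.01
end
tempfile persons
save `persons'

* Toy hospital file; hlat/hlon are hypothetical names (decimal degrees)
clear
input hospital hlat hlon
1 42.34 -71.10
2 40.74 -73.97
end
tempfile hospitals
save `hospitals'

* Pair every person with every hospital
use `persons', clear
cross using `hospitals'

* Haversine distance in km (Earth radius taken as 6371 km); no loop needed
gen double a = sin((hlat - plat)*_pi/360)^2 + ///
    cos(plat*_pi/180)*cos(hlat*_pi/180)*sin((hlon - plon)*_pi/360)^2
gen double dist = 2*6371*asin(min(1, sqrt(a)))
drop a

* Keep the nearest hospital for each person
sort person dist
by person: keep if _n == 1
list person hospital dist
```

The min(1, ...) guard only protects asin() against rounding slightly above 1. If distances are short and a rough answer is enough, a simpler planar approximation could be substituted in the same two -gen- lines.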

By the way, pardon my ignorance, but I haven't yet figured out what YMMV means.
--David

P.S., you might be able to cut down the joining operation if it can be partitioned -- say by state:
use personfile
joinby state using hospitalfile
etc...

It will substantially reduce the size of the joined dataset, but it will also eliminate combinations where a person lives near a boundary and the hospital is on the other side of the boundary -- a fairly common situation. But maybe there is some other attribute that can be used, though I can't think of any.

And another matter: Kelly wanted to find the nearest VA hospital and the nearest non-VA hospital. That will take some more work. Perhaps:
use personfile
cross using hospitalfile
-- compute dist --
sort person va dist
by person va: keep if _n==1

That retains two records per person: one for va, one for non-va. You can then -reshape wide- if you want it in one-record-per-person form.
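A sketch of the -reshape wide- step, assuming va is a numeric 0/1 indicator and that dist has already been computed; the toy data and the variable name hospital are hypothetical. Any other variable that varies by hospital would need to be dropped or added to the reshape list first, or -reshape- will complain that it is not constant within person:

```stata
* Toy joined data: person x hospital, with a 0/1 va flag and a distance
clear
input person hospital va dist
1 10 0 12.5
1 11 0  3.2
1 20 1  8.0
1 21 1 30.1
2 10 0  4.4
2 11 0  9.9
2 20 1 25.0
2 21 1  7.5
end

* Nearest hospital within each person-by-va group
sort person va dist
by person va: keep if _n == 1

* One record per person: hospital0/dist0 (non-va), hospital1/dist1 (va)
reshape wide hospital dist, i(person) j(va)
list
```

The suffixes 0 and 1 come straight from the values of va, so you may want to -rename- the resulting variables to something more readable afterwards.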

HTH
--David

At 10:43 PM 1/3/2007, you wrote:

David--
As may be inferred from my post (by someone with superhuman insight),
I think it is much easier to -merge- and then compute within nested
loops, one across all i persons, and one across all j hospitals, as
Nick does in -nearest- but as always, YMMV.

On 1/3/07, David Kantor <[email protected]> wrote:
In response to Kelly Richardson's question about the distance between
home and hospital:

The structure of this situation suggests a -cross- operation on the
two datasets (persons and hospitals) -- at least in theory.
This would yield a _very_ long dataset. But this is impractical.
You might want to loop through one person at a time, joining hospital
data; then select the nearest (or, say the nearest 5 hospitals); then
somehow output just these (maybe using -post-).