# Re: st: RE: how to choose between geographical identifiers??

From: "Austin Nichols"; To: statalist@hsphsun2.harvard.edu; Subject: Re: st: RE: how to choose between geographical identifiers??; Date: Wed, 8 Mar 2006 20:56:57 -0500

```
I'm not a geographer, but I think this is an interesting question.
You could just regress wage on a full set of dummies twice, once for
LAD and once for TTWA, and compare the R-squared values, though that
is unlikely to convince you or anyone else that one division is more
useful than another.  I guess I would start by calculating the mean and
standard deviation of log wage for each LAD and TTWA, and the population
of each, and then I would make two graphs of the standard deviations
against the means, with marker size given by population, just to get a
sense of what kind of variation in wages the divisions capture.
Sometimes a picture gives you a better sense of the data than numerous
tabular results.

I think your criterion is really a kind of entropy-minimizing one,
since you don't want geocode categories to 8 decimal places (one
category for each worker gives very little variation within cells, but
a lot of categories) or a single country identifier (one cell with a
lot of variation within it).  So the size of the grid, in terms of the
population in each LAD/TTWA, is important, not just how homogeneous
wages are within each area.

I'll be interested in what others with more experience in this area
have to say on how they would approach this problem.  Nick--how would
you measure minimal structure in residuals here?

On 3/8/06, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I am a geographer but I don't know much about (what is
> usually called human) geography. I regarded it as my main field
> of interest between 1968 and 1969, but no longer. There aren't
> many geographers on this list, I think.
>
> However, your question is not really geographical. I guess
> from this that you are using lots of dummies in each case,
> and for once the answer is whichever set of dummies gives
> you a better model, according to your criteria of model
> excellence (my favourite criterion is usually minimal
> structure in residuals).
>
> as both spring from an idea of an area functioning together
> rather than formal similarity of anything. So knowing the
> area might not help enormously in predicting wage. But
> whichever spatial subdivision has a finer mesh should
> prove better.
>
> Nick
> n.j.cox@durham.ac.uk
>
>
> > I have a bunch of wage observations and all the observations are
> > attached with two geographical identifiers - local authority districts
> > (LADs) and travel to work areas (TTWAs).  I want to find out how wages
> > vary across different areas in the UK.
> >
> > Now I can run wage estimations using either of the two categorical
> > variables as the explanatory variable.  However, I would like to find out
> > which categorical variable fits the data better.  How do I compare the
> > two sets of results given that the explanatory variables are quite
> > different?
> >
> > Could you recommend what kind of tests I should use?  And if you are a
> > geographer, could you tell me whether there are any criteria that
> > geographers use to choose between different definitions of geographies
> > (regions, as opposed to LADs, as opposed to TTWAs, etc.)?
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

```
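Austin's first suggestion, regressing wage on a full set of area dummies once per classification, can be sketched outside Stata as follows. This is a minimal Python illustration. It assumes log wage as the outcome and uses hypothetical names (`dummy_fit`, `lnwage`, `lad`, `ttwa`) with made-up toy data. With one dummy per area and nothing else, the OLS fitted values are just the area means, so the fit can be computed from group means directly. Raw R-squared tends to reward whichever classification has more categories, and it can never fall when one grid refines another. For that reason the sketch also reports adjusted R-squared and BIC, which penalize the number of areas. These penalized criteria are an added assumption: one possible way to formalize the trade-off between cell size and within-cell variation discussed in the thread, not something the posters proposed.

```python
# Minimal sketch (hypothetical names and toy data): compare two area
# classifications by regressing log wage on a full set of area dummies.
# With dummies for every area and no other regressors, OLS fitted values
# equal the area means, so fit statistics follow from group means directly.
import math
from collections import defaultdict


def dummy_fit(y, groups):
    """Fit statistics for y regressed on one dummy per distinct group.

    k (number of parameters) equals the number of distinct groups, because
    the full set of dummies absorbs the intercept.
    """
    n = len(y)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, g in zip(y, groups):
        sums[g] += value
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    k = len(means)
    if n <= k:
        raise ValueError("need more observations than areas")
    grand_mean = sum(y) / n
    tss = sum((v - grand_mean) ** 2 for v in y)  # total sum of squares
    rss = sum((v - means[g]) ** 2 for v, g in zip(y, groups))  # within-area SS
    if rss == 0 or tss == 0:
        raise ValueError("degenerate data: no within-area or total variation")
    r2 = 1.0 - rss / tss
    # Adjusted R^2 penalizes the extra dummies that a finer grid brings.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    # Gaussian BIC up to an additive constant; smaller is better.
    bic = n * math.log(rss / n) + k * math.log(n)
    return {"n": n, "k": k, "r2": r2, "adj_r2": adj_r2, "bic": bic}


# Hypothetical toy data: log wages with made-up LAD and TTWA codes.
lnwage = [2.0, 2.2, 2.1, 2.9, 3.1, 3.0, 2.5, 2.6]
lad = ["A", "A", "B", "C", "C", "D", "E", "E"]
ttwa = ["X", "X", "X", "Y", "Y", "Y", "Z", "Z"]

fit_lad = dummy_fit(lnwage, lad)
fit_ttwa = dummy_fit(lnwage, ttwa)
for name, fit in [("LAD", fit_lad), ("TTWA", fit_ttwa)]:
    print(f"{name}: k={fit['k']} R2={fit['r2']:.3f} "
          f"adjR2={fit['adj_r2']:.3f} BIC={fit['bic']:.2f}")
```

In this toy data every hypothetical LAD lies inside a single TTWA, and the LADs that split a TTWA have identical means. Both fits therefore have the same R-squared, while adjusted R-squared and BIC prefer the coarser TTWA grid. Real LADs and TTWAs are not nested in this way, so the comparison has to be run on the actual data.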
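The per-area summary Austin describes (mean and standard deviation of log wage, plus population, for each LAD and TTWA) might be computed as in this Python sketch. It uses hypothetical names (`area_summary`, `lnwage`, `ttwa`) and made-up toy data. It also takes the number of wage observations in each area as a stand-in for population. That substitution is an assumption; true area populations would have to come from another source.

```python
# Minimal sketch (hypothetical names and toy data): per-area mean and
# standard deviation of log wage, with the number of wage observations
# used as a stand-in for area population.
import statistics
from collections import defaultdict


def area_summary(lnwage, areas):
    """Return (area, n_obs, mean, sd) rows sorted by area code.

    sd is the sample standard deviation; it is NaN for areas with a
    single observation, where it is undefined.
    """
    by_area = defaultdict(list)
    for w, a in zip(lnwage, areas):
        by_area[a].append(w)
    rows = []
    for a in sorted(by_area):
        ws = by_area[a]
        sd = statistics.stdev(ws) if len(ws) > 1 else float("nan")
        rows.append((a, len(ws), statistics.mean(ws), sd))
    return rows


# Hypothetical toy data with made-up TTWA codes.
lnwage = [2.0, 2.2, 2.1, 2.9, 3.1, 3.0, 2.5, 2.6]
ttwa = ["X", "X", "X", "Y", "Y", "Y", "Z", "Z"]

rows = area_summary(lnwage, ttwa)
for area, n_obs, mean, sd in rows:
    print(f"{area}: n={n_obs} mean={mean:.3f} sd={sd:.3f}")
```

Each row supplies one marker for the graph of standard deviation against mean, with marker size proportional to `n_obs` or to population. In Stata, a weighted `scatter` (for example with `[aweight=pop]`, where `pop` is a hypothetical population variable) scales marker size by the weight.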