Statalist The Stata Listserver


Re: st:what if no data is available at all

From   Maarten Buis
Subject   Re: st:what if no data is available at all
Date   Thu, 20 Jul 2006 07:17:45 +0100 (BST)

--- wrote:
> The problem here is that I have no data for x1(firm level),thus I
> instead, tried to regress x1 on related variables using "industry
> level" data and use the fitted values to generate x1.
> More specifically,
>   1st step: run X1=b0+b1*Z1+b2*Z2+... and find out fitted values of
> all b's (X1 and all Z's are industry level data, but not annually )
>  2nd step: using fitted value b's and firm level data,
> find out x1 such that x1=b0hat + b1hat*z1+b2hat*z2
> (all z's are firm level annual data,  )

The problem with this approach is called the ecological fallacy. The
classic (at least in sociology) paper on this is Robinson (1950). The
nicest example he gives is: in the US in the 1930s, states with a high
proportion of immigrants also had a high literacy rate (in the English
language), while immigrants were on average less literate than
non-immigrants. Regressing state level literacy rate on state level
proportion of immigrants would thus give a completely wrong picture
about the relationship between individual immigrant status and
literacy. At first glance this may look like a case of omitted variable
bias: immigrants go to places where the work is, there is work in places
that are economically well off, and places that are economically well
off have a high literacy rate. But average literacy in a state is really
a different variable, with a different meaning, from individual literacy.
The average literacy forms one aspect of the context within which
individuals act. For instance, an illiterate person might want to go to
a place where he faces little competition from "native" illiterates,
who would compete for the same kind of work as he does but have the
advantage of knowing local customs. Here both the individual level
literacy and the state mean literacy should be part of your model. This
is quite common in multilevel modeling; a nice example of this can be
found at
Among other things, it deals with the paradox that "For decades, the Democrats
have been viewed as the party of the poor, with the Republicans
representing the rich. Recent presidential elections, however, have
shown a reverse pattern, with Democrats performing well in the richer
“blue” states in the northeast and west coast, and Republicans
dominating in the “red” states in the middle of the country."
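
To make the point concrete, here is a minimal sketch in Python (standard
library only). All numbers are made up for illustration; they are not
Robinson's data, and the function names are my own. It builds five
hypothetical states in which immigrants are always 10 percentage points
less literate than natives in the same state, while states with more
immigrants have higher literacy overall. It then compares the state-level
slope, the pooled individual-level slope, and a regression that includes
both individual immigrant status and the state's proportion of immigrants.

```python
# Illustration only: all numbers below are made up, not Robinson's data.
N_STATES = 5
POP = 1000  # persons per hypothetical state


def build_people():
    """Return one (state_share, immigrant, literate) tuple per person."""
    people = []
    for s in range(N_STATES):
        n_imm = 100 * (s + 1)          # immigrants: 100, 200, ..., 500
        share = n_imm / POP            # state-level proportion of immigrants
        native_rate = 0.60 + 0.08 * s  # natives' literacy rises across states
        imm_rate = native_rate - 0.10  # immigrants always 10 points lower
        for imm, n, rate in ((0, POP - n_imm, native_rate),
                             (1, n_imm, imm_rate)):
            n_lit = round(n * rate)
            people += [(share, imm, 1)] * n_lit
            people += [(share, imm, 0)] * (n - n_lit)
    return people


def mean(v):
    return sum(v) / len(v)


def simple_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_predictor_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 (intercept included)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


people = build_people()
share = [p[0] for p in people]
imm = [p[1] for p in people]
lit = [p[2] for p in people]

# Aggregate analysis: state literacy rate on state proportion of immigrants.
state_share = sorted(set(share))
state_lit = [mean([l for sh, l in zip(share, lit) if sh == s])
             for s in state_share]
state_slope = simple_slope(state_share, state_lit)

# Individual analysis, ignoring context: literacy on immigrant status.
pooled_slope = simple_slope(imm, lit)

# Individual status and state context in the same model.
b_imm, b_share = two_predictor_slopes(imm, share, lit)

print(f"state-level slope:       {state_slope:.3f}")
print(f"pooled individual slope: {pooled_slope:.3f}")
print(f"with both terms: immigrant {b_imm:.3f}, state share {b_share:.3f}")
```

With these made-up numbers the state-level slope is about 0.7, the pooled
individual-level slope is about -0.02, and the model with both terms gives
about -0.10 for immigrant status and about 0.8 for the state share. The
state-level coefficient has the opposite sign from the individual-level
effect, so applying it to individuals would give the wrong picture.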

Some weeks ago Scott announced that there is an ecological inference
program for Stata available. You might be able to use that to create


Robinson, W. S. (1950), "Ecological correlations and the behavior of
individuals". American Sociological Review 15: 351–57.

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214

+31 20 5986715
