
st: iv in heckman model

From   Peteke Feijten
Subject   st: iv in heckman model
Date   Wed, 18 Jun 2008 18:03:52 +0100

Dear Statalist users,

I want to study a continuous occupational status score (measured on the CAMSIS scale). There is a potential selection problem, because we do not observe occupational scores for people who don’t work. Studying occupational status scores without taking selection into account would probably lead to upwardly biased estimates.
We are therefore trying to fit a Heckman two-step model, with the CAMSIS score as the outcome of the main equation and ‘working’ as the binary outcome of the selection equation. Our instrumental variable is the number of children in the household.

As it turns out, example 1 under the ‘heckman’ entry in the Stata manual is very similar to our case. In that example, the number of children is also used as the IV, but the main outcome of interest is wage, whereas ours is occupational status score.
However, if my understanding of the Heckman method is right, one condition is that the instrumental variable must have a significant impact on the selection outcome, but NOT on the main outcome. This condition is neither addressed nor tested in the example in the Stata manual. I therefore downloaded the example dataset ‘womenwk’, first ran the Heckman model used in the manual, and got identical results (p. 556, Reference Manual A-H, Release 10).
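A minimal sketch of the estimation step in Stata, using the manual's example data as a stand-in for the CAMSIS data. It assumes `womenwk` can be loaded with `webuse` (this needs internet access). By default `heckman` fits by maximum likelihood; the `twostep` option requests the Heckman two-step estimator. Note that `children` appears only inside `select()`, which is the exclusion restriction discussed above.

```stata
* Sketch: reproduce the manual's heckman example on womenwk
* (assumes internet access for webuse)
webuse womenwk, clear

* Maximum-likelihood fit (the default); children enters only the
* selection equation, so it is the excluded variable
heckman wage education age, select(married children education age)

* Heckman two-step estimator with the same specification
heckman wage education age, select(married children education age) twostep
```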

However, a simple regression of wage on the exogenous variables and the IV ‘children’, run to see whether the IV affects wage, shows that it does (see below). So, while we would agree that the number of children is a good theoretical choice, it does not seem to meet the empirical requirements of the Heckman approach. Or am I wrong? Or am I testing this in the wrong way?
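Both conditions can be checked along the lines described above. This sketch assumes a selection indicator named `wage_obs` (a hypothetical name, not a variable in the dataset) that equals 1 when wage is observed; `test children` after each fit reports a test that the coefficient on `children` is zero.

```stata
* Sketch: does children predict selection, and does it enter the wage regression?
webuse womenwk, clear

* Hypothetical selection indicator: 1 if wage is observed, 0 otherwise
generate byte wage_obs = !missing(wage)

* Condition 1: children should be a significant predictor of selection
probit wage_obs married children education age
test children

* The check described in the post: regress observed wage on the
* exogenous variables and children
regress wage education age children married
test children
```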

The reason I ask is that we have found the same problem in our own analysis: the number of children is significant in the selection model, but it is also significantly related to the occupational score in a standard regression model.

Any help would be appreciated.

reg wage education age children married

      Source |          SS    df          MS           Number of obs =   1343
-------------+------------------------------           F(  4,  1338) = 128.55
       Model |  14812.5356     4   3703.1339           Prob > F      = 0.0000
    Residual |  38542.3591  1338  28.8059485           R-squared     = 0.2776
-------------+------------------------------           Adj R-squared = 0.2755
       Total |  53354.8946  1342  39.7577456           Root MSE      = 5.3671

------------------------------------------------------------------------------
        wage |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   education |   .8750694    .050243    17.42   0.000     .7765057     .973633
         age |   .1514818   .0192717     7.86   0.000     .1136757    .1892879
    children |  -.6862982   .1032256    -6.65   0.000    -.8887997   -.4837966
     married |  -.5395024   .3574519    -1.51   0.131     -1.24073    .1617247
       _cons |   7.934369   .9264515     8.56   0.000     6.116914    9.751825
------------------------------------------------------------------------------

Peteke Feijten
Research and user support for the Scottish Longitudinal Study (
University of St Andrews
School of Geography & Geosciences
North Street, St Andrews, KY16 9AL
Phone: 01334 463951
The University of St Andrews is a charity registered in Scotland: No SC013532
