Re: st: GMM minimization of regional errors imputed from hhd level model

From   Vladimír Hlásny <>
Subject   Re: st: GMM minimization of regional errors imputed from hhd level model
Date   Sat, 29 Jun 2013 11:08:47 +0900

Thanks for responding, Austin.

The full reference is: Korinek, Mistiaen and Ravallion (2007), An
econometric method of correcting for unit nonresponse bias in surveys,
J. of Econometrics 136.

My sample includes 12000 responding households. I know their income,
and which of 2500 regions they come from. In addition, for each
region, I know the number of non-responding households. I find the
coefficient on income by fitting estimated regional population to
actual population:

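P_i = 1 / (1 + exp(-f(income_i, theta)))      (response probability of household i)
actual_j = responding_j + nonresponding_j     (known population of region j)
theta = argmin sum_j [ sum_{i in j} (1/P_i) - actual_j ]^2   (possibly weighted)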

Response probability need not be monotonic in income, since the index f
inside the logit can itself be a non-monotonic function of income.
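
For readers following along, the criterion above can be sketched in a few
lines. This is a minimal illustration in Python rather than Stata, and it is
not the KMR estimator itself: the quadratic-in-log-income index, the
unweighted sum of squared regional residuals, the function names, and the toy
data are all assumptions made for the example.

```python
import math
from collections import defaultdict

def response_prob(income, theta):
    # Assumed index: quadratic in log income, so P need not be monotonic in income.
    a, b, c = theta
    x = math.log(income)
    return 1.0 / (1.0 + math.exp(-(a + b * x + c * x * x)))

def estimated_population(households, theta):
    # Each responding household stands for 1/P_i households in its region.
    est = defaultdict(float)
    for region, income in households:
        est[region] += 1.0 / response_prob(income, theta)
    return dict(est)

def objective(households, actual, theta):
    # Unweighted sum over regions of squared (estimated - actual) population.
    est = estimated_population(households, theta)
    return sum((est.get(j, 0.0) - actual[j]) ** 2 for j in actual)

# Toy data (hypothetical): (region, income) for responding households only.
households = [("A", 10.0), ("A", 100.0), ("B", 50.0), ("B", 50.0), ("B", 200.0)]
theta_true = (2.0, -0.3, 0.0)
# Build "actual" populations that are exactly consistent with theta_true.
actual = estimated_population(households, theta_true)

print(objective(households, actual, theta_true))        # zero by construction
print(objective(households, actual, (2.0, 0.0, 0.0)) > 0)
```

Because the household-level terms are summed inside the criterion, there is
one residual per region, and any numerical minimizer could in principle be
applied to objective() over theta.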

Thanks for any thoughts on how to estimate this in Stata, or how to
make my 'trick' (setting 12000-2500 hhd-level residuals manually to
zero) work better.


On Sat, Jun 29, 2013 at 1:49 AM, Austin Nichols <> wrote:
> Vladimír Hlásny <>:
> As the FAQ hints, if you don't provide full references, don't expect
> good answers.
> I don't understand your description--how are you running a logit of
> response on income, when you only have income for responders?  Can you
> give a sense of what the data looks like?
> On another topic, why would anyone expect response probability to be
> monotonic in income?
> On Fri, Jun 28, 2013 at 10:05 AM, Vladimír Hlásny <> wrote:
>> Hi,
>> I am using a method by Korinek, Mistiaen and Ravallion (2007) to
>> correct for unit-nonresponse bias. That involves estimating
>> response-probability for each household,  inferring regional
>> population from these probabilities, and fitting against actual
>> regional populations. I must use household-level data and region-level
>> data simultaneously, because coefficients in the household-level model
>> are adjusted based on fit of the regional-level populations.
>> I used a trick - manually resetting the residuals of all but one
>> household per region - but this trick doesn't produce perfect
>> results. Please find the details, remaining problems, as well as the
>> Stata code described below. Any thoughts on this?
>> Thank you for any suggestions!
>> Vladimir Hlasny
>> Ewha Womans University
>> Seoul, Korea
>> Details:
>> I am estimating households' probability of responding to a survey as a
>> function of their income. For each responding household (12000), I
>> have data on income. Also, at the level of region (3000), I know the
>> number of responding and non-responding households.
>> I declare a logit equation of response-probability as a function of
>> income, to estimate it for all responding households.
>> The identification is provided by fitting of population in each
>> region. For each responding household, I estimate their true mass as
>> the inverse of their response probability. Then I sum these
>> estimated masses over all households in a region, and fit the sum
>> against the true population.
>> Stata problem:
>> I am estimating GMM at the regional level. But, to obtain the
>> population estimate in each region, I calculate response-probabilities
>> at the household level and sum them up in a region. This region-level
>> fitting and response-probability estimation occurs
>> simultaneously/iteratively -- as logit-coefficients are adjusted to
>> minimize region-level residuals, households' response probabilities
>> change.