
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: GMM minimization of regional errors imputed from hhd level model

From   Austin Nichols <>
Subject   Re: st: GMM minimization of regional errors imputed from hhd level model
Date   Sun, 30 Jun 2013 22:10:43 -0400

Vladimír Hlásny <>,

If you're not familiar with optimize(), start with the help file. Or
just follow the link I sent.

You don't seem to take my point about your trick: if you put all the
weight of the optimization on one residual per group, and -gmm- is trying
to make that one residual orthogonal to an instrument x1=income, but
you (unluckily) have x1=0 in each of those cases, then how could -gmm-
possibly improve on residual times zero, which equals zero? An unlucky
case, but possible given your syntax, I think.
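A small Python sketch (not Stata) of the degenerate case described above, using hypothetical data. The residual is kept only for the first household in each region, and income, used as the instrument, happens to be zero for exactly those households. The sample moment, the sum of income times residual, is then zero for every value of (b0, b1), so that moment cannot tell parameter values apart. The data, the helper names, and the treatment of the moment as a plain sum are illustrative assumptions, not a description of what -gmm- computes internally.

```python
import math

# Hypothetical household data: (region, income). The first household listed
# in each region has income 0: the "unlucky" case.
households = [(1, 0.0), (1, 2.0), (1, 3.5), (2, 0.0), (2, 1.5), (3, 0.0), (3, 4.0)]
pop = {1: 8.0, 2: 5.0, 3: 6.0}  # hypothetical actual regional populations

def inv_prob(b0, b1, income):
    """Inverse logistic response probability, (1 + e^xb) / e^xb."""
    xb = b0 + b1 * income
    return (1.0 + math.exp(xb)) / math.exp(xb)

def income_moment(b0, b1):
    """Sum over households of income * residual, where the residual
    pop_j - pophat_j is kept only for the first household in each region
    and set to zero for all others (the 'oneiffirst' trick)."""
    pophat = {}
    for region, income in households:
        pophat[region] = pophat.get(region, 0.0) + inv_prob(b0, b1, income)
    total, seen = 0.0, set()
    for region, income in households:
        one_if_first = 1.0 if region not in seen else 0.0
        seen.add(region)
        residual = (pop[region] - pophat[region]) * one_if_first
        total += income * residual  # instrument times residual
    return total

# Whatever the coefficients, this moment stays at zero.
for b0, b1 in [(0.0, 0.0), (1.0, -0.5), (-2.0, 3.0)]:
    print(b0, b1, income_moment(b0, b1))
```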

On Sun, Jun 30, 2013 at 10:02 PM, Vladimír Hlásny <> wrote:
> Dear Austin:
> I am computing the "one-per-region residuals" as the difference
> between each region's actual population and its predicted population
> (the sum of household inverse probabilities). So my trick doesn't depend
> on luck: the residuals contain information on all households within a region.
> In the code that I pasted in my original email, notice the summation
> across households:
> egen double `pophat' = sum( (1+exp(b0+income*b1)) / exp(b0+income*b1)) `if', by(`region')
> replace residual = (pop - `pophat') * oneiffirst
> The 'oneiffirst' variable is a binary indicator that keeps one residual
> per region; that is my trick. By using it, I ensure that only one
> region-level residual is considered per region. I would have preferred
> to use an 'if' qualifier (such as 'if oneiffirst'), so that Stata would
> know that there are only 2500 (region-level) observations, but Stata
> doesn't allow it. Is there another way to restrict the sample inside
> the function-evaluator program (the sample in which the moments are
> evaluated) after -gmm- is called on a hhd-level dataset?
> I am not familiar with 'optimize()'. Will that let me declare samples
> so that I estimate a region-level regression in which moments are
> computed from a hhd-level equation?
> Thank you.
> Vladimir
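A minimal Python sketch, under stated assumptions, of computing the residuals directly at the region level, so that the residual vector has one entry per region and nothing has to be zeroed out. It uses the same inverse-probability mass as the -egen- line above, rewritten as (1 + e^xb)/e^xb = 1 + e^(-xb). The function names and toy data are hypothetical; a vector like this is roughly what one might build inside an evaluator for Mata's optimize(), but the sketch is not Stata code.

```python
import math
from collections import defaultdict

def inverse_response_prob(b0, b1, income):
    """Estimated mass 1/P of a responding household, where
    P = exp(xb) / (1 + exp(xb)); note (1 + exp(xb)) / exp(xb) = 1 + exp(-xb)."""
    return 1.0 + math.exp(-(b0 + b1 * income))

def region_residuals(b0, b1, households, actual_pop):
    """One residual per region: actual_pop[j] - pophat_j, where pophat_j sums
    the inverse response probabilities of region j's responding households.
    `households` is a list of (region, income) pairs."""
    pophat = defaultdict(float)
    for region, income in households:
        pophat[region] += inverse_response_prob(b0, b1, income)
    return {j: actual_pop[j] - pophat[j] for j in sorted(actual_pop)}

# Hypothetical toy data: two regions with 2 and 3 responding households.
households = [(1, 0.5), (1, 2.0), (2, 1.0), (2, 1.5), (2, 3.0)]
actual_pop = {1: 5.0, 2: 7.0}
print(region_residuals(0.0, 0.0, households, actual_pop))
```

With both coefficients at zero every household has P = 0.5 and so counts as two households, which is why both regions come out with a residual of 1.0.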
> On Mon, Jul 1, 2013 at 1:17 AM, Austin Nichols <> wrote:
>> Vladimír Hlásny <>:
>> My question is: why try to trick -gmm- into doing an optimization it's
>> not designed for? You are trying to make the first residual within
>> each group orthogonal to income; what if you got unlucky and the first
>> case in each group had zero income? It would be hard to see how you
>> could improve the objective function, right?
>> Instead, start with Mata's optimize(), which can be used to roll your
>> own GMM and much else besides: see e.g.
>> On Sat, Jun 29, 2013 at 10:10 PM, Vladimír Hlásny <> wrote:
>>> Dear Austin:
>>> The model is definitely identified. Matlab runs the model well,
>>> because I can use household-level and region-level variables
>>> simultaneously. My trick in Stata also works, except that it produces
>>> imprecise results and occasionally fails to converge. (My current
>>> trick is to make Stata think that the model is at the household level
>>> and to manually set all but one hhd-level residual per region to
>>> zero.)
>>> Incomes of the responding households are my instrument.
>>> Essentially, because each region has a different survey response rate
>>> and a different distribution of incomes among responding households,
>>> GMM estimates the relationship between households' response probability
>>> and their income (subject to assumptions on the representativeness of
>>> responding households).
>>> In sum:
>>> I need Stata to use region-level and household-level variables (or
>>> matrices) simultaneously. Specifically, Stata must minimize
>>> region-level residuals computed from a household-level logistic
>>> equation. E.g., if I feed household-level data into the GMM
>>> function-evaluator program, can I instruct the GMM to use only one
>>> residual per region?
>>> Vladimir
>>> On Sat, Jun 29, 2013 at 10:27 PM, Austin Nichols
>>> <> wrote:
>>>> Vladimír Hlásny <>:
>>>> I have not read the ref. But you do not really have instruments. That
>>>> is, you are not setting E(Z'e) to zero, with e a residual from some
>>>> equation and Z your instrument; you do not have moments of that type.
>>>> Seems you should start with optimize() instead of -gmm-, as you are
>>>> just minimizing the sum of squared deviations from targets at the
>>>> region level. Or am I still misunderstanding this exercise?
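To illustrate the point that this is a plain least-squares problem at the region level, here is a Python sketch that minimizes the sum over regions j of (actual_j - sum_{i in j} 1/P_i)^2 by Gauss-Newton with step halving, for the two-parameter logit 1/P_i = 1 + exp(-(b0 + b1*income_i)). Gauss-Newton is a swapped-in technique chosen for brevity; it is not the Korinek, Mistiaen and Ravallion estimator and not Mata's optimize(). The data are hypothetical and noise-free, generated from b0 = 0.5 and b1 = 0.8, so the minimizer should land close to those values. With real data the regions need to differ in their income distributions, or the 2x2 normal equations become singular.

```python
import math

def residuals_and_jacobian(theta, households, actual_pop):
    """Region residuals r_j = actual_j - sum_{i in j} (1 + exp(-(b0 + b1*x_i)))
    and their derivatives with respect to (b0, b1)."""
    b0, b1 = theta
    regions = sorted(actual_pop)
    pophat = dict.fromkeys(regions, 0.0)
    d0 = dict.fromkeys(regions, 0.0)
    d1 = dict.fromkeys(regions, 0.0)
    for j, x in households:
        e = math.exp(-(b0 + b1 * x))
        pophat[j] += 1.0 + e  # 1/P_i = 1 + exp(-xb)
        d0[j] += e            # dr_j/db0
        d1[j] += x * e        # dr_j/db1
    r = [actual_pop[j] - pophat[j] for j in regions]
    jac = [(d0[j], d1[j]) for j in regions]
    return r, jac

def ssr(theta, households, actual_pop):
    """Sum of squared region-level residuals (inf if exp overflows)."""
    try:
        r, _ = residuals_and_jacobian(theta, households, actual_pop)
    except OverflowError:
        return math.inf
    return sum(v * v for v in r)

def gauss_newton(theta, households, actual_pop, tol=1e-12, max_iter=200):
    """Minimise ssr() over theta = (b0, b1) by Gauss-Newton with step halving."""
    for _ in range(max_iter):
        r, jac = residuals_and_jacobian(theta, households, actual_pop)
        current = sum(v * v for v in r)
        # Normal equations (J'J) step = -J'r, solved explicitly for 2 parameters.
        a = sum(j0 * j0 for j0, _ in jac)
        b = sum(j0 * j1 for j0, j1 in jac)
        c = sum(j1 * j1 for _, j1 in jac)
        g0 = sum(j0 * rj for (j0, _), rj in zip(jac, r))
        g1 = sum(j1 * rj for (_, j1), rj in zip(jac, r))
        det = a * c - b * b
        step = ((b * g1 - c * g0) / det, (b * g0 - a * g1) / det)
        t = 1.0
        while t > 1e-10:
            trial = (theta[0] + t * step[0], theta[1] + t * step[1])
            if ssr(trial, households, actual_pop) < current:
                break
            t /= 2.0
        else:
            return theta  # no improving step found: treat as converged
        theta = trial
        if max(abs(t * s) for s in step) < tol:
            break
    return theta

# Hypothetical noise-free data: regional populations generated from
# b0 = 0.5, b1 = 0.8, so the fit should recover roughly those values.
households = [(1, 0.0), (1, 1.0), (2, 2.0), (2, 3.0), (3, 1.0), (3, 2.0)]
true_theta = (0.5, 0.8)
actual_pop = {}
for j, x in households:
    actual_pop[j] = actual_pop.get(j, 0.0) + 1.0 + math.exp(-(true_theta[0] + true_theta[1] * x))
print(gauss_newton((0.0, 0.0), households, actual_pop))
```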
>>>> On Fri, Jun 28, 2013 at 10:08 PM, Vladimír Hlásny <> wrote:
>>>>> Thanks for responding, Austin.
>>>>> The full reference is: Korinek, Mistiaen and Ravallion (2007), An
>>>>> econometric method of correcting for unit nonresponse bias in surveys,
>>>>> J. of Econometrics 136.
>>>>> My sample includes 12000 responding households. I know their income,
>>>>> and which of 2500 regions they come from. In addition, for each
>>>>> region, I know the number of non-responding households. I find the
>>>>> coefficient on income by fitting estimated regional population to
>>>>> actual population:
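>>>>> P_i = exp(f(income_i,theta)) / (1 + exp(f(income_i,theta)))   (logistic response probability of household i)
>>>>> actual_j = responding_j + nonresponding_j   (actual population of region j)
>>>>> theta = argmin over theta of sum_j [ sum_{i in j} 1/P_i - actual_j ]^2
>>>>> Response probability may not be monotonic in income, so the index
>>>>> f inside the logit may be a non-monotonic function of income.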
>>>>> Thanks for any thoughts on how to estimate this in Stata, or how to
>>>>> make my 'trick' (manually setting 12000 - 2500 = 9500 hhd-level
>>>>> residuals to zero) work better.
>>>>> Vladimir
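One way to let a logit-based response probability be non-monotonic in income, sketched in Python: make the index f a quadratic in income. The quadratic form and the coefficient values are assumptions chosen for illustration, not the specification used by Korinek, Mistiaen and Ravallion. Because the logistic function is increasing in f, with b2 < 0 the probability peaks where the index does, at income = -b1/(2*b2), and declines on either side; with the values below the peak is at income 3.

```python
import math

def response_prob(income, b0, b1, b2):
    """Logistic response probability with a quadratic index
    f = b0 + b1*income + b2*income**2 (an illustrative assumption)."""
    f = b0 + b1 * income + b2 * income ** 2
    return 1.0 / (1.0 + math.exp(-f))

b0, b1, b2 = -1.0, 1.2, -0.2   # hypothetical coefficients
peak = -b1 / (2 * b2)          # the index, and hence P, is maximised here
for income in [0.0, 1.5, peak, 4.5, 6.0]:
    print(f"income={income:4.1f}  P={response_prob(income, b0, b1, b2):.3f}")
```

The same index could replace b0 + b1*income in the region-level residuals; it only adds one parameter to the search.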
>>>>> On Sat, Jun 29, 2013 at 1:49 AM, Austin Nichols <> wrote:
>>>>>> Vladimír Hlásny <>:
>>>>>> As the FAQ hints, if you don't provide full references, don't expect
>>>>>> good answers.
>>>>>> I don't understand your description--how are you running a logit of
>>>>>> response on income, when you only have income for responders?  Can you
>>>>>> give a sense of what the data looks like?
>>>>>> On another topic, why would anyone expect response probability to be
>>>>>> monotonic in income?
>>>>>> On Fri, Jun 28, 2013 at 10:05 AM, Vladimír Hlásny <> wrote:
>>>>>>> Hi,
>>>>>>> I am using a method by Korinek, Mistiaen and Ravallion (2007) to
>>>>>>> correct for unit-nonresponse bias. That involves estimating the
>>>>>>> response probability for each household, inferring regional
>>>>>>> population from these probabilities, and fitting against actual
>>>>>>> regional populations. I must use household-level data and region-level
>>>>>>> data simultaneously, because coefficients in the household-level model
>>>>>>> are adjusted based on the fit of the region-level populations.
>>>>>>> I used a trick (manually resetting the residuals of all but one
>>>>>>> household per region to zero), but this trick doesn't produce perfect
>>>>>>> results. Please find the details, the remaining problems, and the
>>>>>>> Stata code described below. Any thoughts on this?
>>>>>>> Thank you for any suggestions!
>>>>>>> Vladimir Hlasny
>>>>>>> Ewha Womans University
>>>>>>> Seoul, Korea
>>>>>>> Details:
>>>>>>> I am estimating households' probability of responding to a survey as a
>>>>>>> function of their income. For each responding household (12000), I
>>>>>>> have data on income. Also, at the level of region (3000), I know the
>>>>>>> number of responding and non-responding households.
>>>>>>> I specify a logit equation for the response probability as a function
>>>>>>> of income and evaluate it for all responding households.
>>>>>>> Identification comes from fitting the population of each region. For
>>>>>>> each responding household, I estimate its true mass as the inverse of
>>>>>>> its response probability. Then I sum these inverse response
>>>>>>> probabilities over all households in a region and fit the sum
>>>>>>> against the true population.
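The weighting step above can be written out as follows, assuming the logistic form implied by the -egen- line quoted earlier in the thread (with x_i'θ = b0 + b1*income_i and N_j the actual population of region j) and, as one possible reading of "fit", a sum of squared regional deviations. For example, a household with P_i = 0.25 stands for 1/0.25 = 4 households in its region.

```latex
P_i(\theta) = \frac{e^{x_i'\theta}}{1 + e^{x_i'\theta}}, \qquad
\frac{1}{P_i(\theta)} = \frac{1 + e^{x_i'\theta}}{e^{x_i'\theta}} = 1 + e^{-x_i'\theta},
\qquad
\hat N_j(\theta) = \sum_{i \in j} \frac{1}{P_i(\theta)}, \qquad
\hat\theta = \arg\min_{\theta} \sum_j \bigl( N_j - \hat N_j(\theta) \bigr)^2
```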
>>>>>>> Stata problem:
>>>>>>> I am estimating GMM at the regional level. But to obtain the
>>>>>>> population estimate for each region, I calculate response
>>>>>>> probabilities at the household level and sum their inverses within
>>>>>>> the region. This region-level fitting and the response-probability
>>>>>>> estimation occur simultaneously/iteratively: as the logit coefficients
>>>>>>> are adjusted to minimize region-level residuals, households' response
>>>>>>> probabilities change.


© Copyright 1996–2017 StataCorp LLC