Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: GMM minimization of regional errors imputed from hhd level model

From: Vladimír Hlásny
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: GMM minimization of regional errors imputed from hhd level model
Date: Mon, 1 Jul 2013 11:02:24 +0900

```
Dear Austin:
I am computing the "one-per-region residuals" as the difference
between regional actual population and predicted population (sum of
household-inverse-probabilities). So my trick doesn't depend on luck -
the residuals contain information on all households within a region.

In the code that I pasted in my original email, notice the summation
across households:
    egen double `pophat' = sum( (1+exp(b0+income*b1)) / exp(b0+income*b1)) ///
        `if', by(`region')
    replace residual = (pop - `pophat') * oneiffirst

The 'oneiffirst' is a binary indicator for one residual per region, my
trick. By using that, I ensure that only one region-level residual is
considered per region. Instead, I would have liked to use an 'if'
statement (such as 'if oneiffirst'), so that Stata would know that
there are only 2500 (region-level) observations. But Stata doesn't
allow it. Is there another way to essentially restrict the sample
inside of the function evaluator program - the sample in which the
moments are evaluated - after GMM is called in a hhd-level dataset?

I am not familiar with 'optimize()'. Will that let me declare samples
so that I estimate a region-level regression in which moments are
computed from a hhd-level equation?
Thank you.

On Mon, Jul 1, 2013 at 1:17 AM, Austin Nichols <austinnichols@gmail.com> wrote:
> My question is: why try to trick -gmm- into doing an optimization it's
> not designed for? You are trying to make the first residual within
> group orthogonal to income; what if you got unlucky and the first case
> in each group had zero income--hard to see how you could improve the
> objective function, right?
>
> Mata's optimize() lets you program your own GMM and much else besides: see e.g.
> http://www.stata.com/meeting/snasug08/nichols_gmm.pdf
>
> On Sat, Jun 29, 2013 at 10:10 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>> Dear Austin:
>> The model is definitely identified. Matlab runs the model well,
>> because I can use household-level and region-level variables
>> simultaneously. My trick in Stata also works, except that it produces
>> imprecise results and occasionally fails to converge. (My current
>> trick is to make Stata think that the model is at the household level,
>> and manually set all-but-one-per-region hhd-level residuals to
>> zero.)
>>
>> Incomes of the responding households are my instrument.
>> Essentially, because each region has a different survey-response-rate
>> and different distribution of incomes of responding households, GMM
>> estimates the relationship between households' response-probability
>> and their income (subject to assumptions on representativeness of
>> responding households).
>>
>> In sum:
>> I need Stata to use region-level and household-level variables (or
>> matrices) simultaneously. Specifically, Stata must minimize
>> region-level residuals computed from a household-level logistic
>> equation. E.g., if I feed household-level data into the GMM
>> function-evaluator program, can I instruct the GMM to use only one
>> residual per region?
>>
>>
>> On Sat, Jun 29, 2013 at 10:27 PM, Austin Nichols
>> <austinnichols@gmail.com> wrote:
>>> I have not read the ref.  But you do not really have instruments. That
>>> is, you are not setting E(Ze) to zero with e a residual from some
>>> equation and Z your instrument; you do not have moments of that type.
>>> You are just minimizing the sum of squared deviations from targets at the
>>> region level. Or am I still misunderstanding this exercise?
>>>
>>> On Fri, Jun 28, 2013 at 10:08 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>> Thanks for responding, Austin.
>>>>
>>>> The full reference is: Korinek, Mistiaen and Ravallion (2007), An
>>>> econometric method of correcting for unit nonresponse bias in surveys,
>>>> J. of Econometrics 136.
>>>>
>>>> My sample includes 12000 responding households. I know their income,
>>>> and which of 2500 regions they come from. In addition, for each
>>>> region, I know the number of non-responding households. I find the
>>>> coefficient on income by fitting estimated regional population to
>>>> actual population:
>>>>
>>>> P_i = logit f(income_i,theta)
>>>> actual_j = responding_j + nonresponding_j
>>>> theta = argmin {sum(1/P_i) - actual_j}
>>>>
>>>> Response probability may not be monotonic in income. The logit may be
>>>> a non-monotonic function of income.
>>>>
>>>> Thanks for any thoughts on how to estimate this in Stata, or how to
>>>> make my 'trick' (setting 12000-2500 hhd-level residuals manually to
>>>> zero) work better.
>>>>
>>>>
>>>> On Sat, Jun 29, 2013 at 1:49 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>>> As the FAQ hints, if you don't provide full references, don't expect
>>>>>
>>>>> I don't understand your description--how are you running a logit of
>>>>> response on income, when you only have income for responders?  Can you
>>>>> give a sense of what the data looks like?
>>>>>
>>>>> On another topic, why would anyone expect response probability to be
>>>>> monotonic in income?
>>>>>
>>>>> On Fri, Jun 28, 2013 at 10:05 AM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>> Hi,
>>>>>> I am using a method by Korinek, Mistiaen and Ravallion (2007) to
>>>>>> correct for unit-nonresponse bias. That involves estimating
>>>>>> response-probability for each household,  inferring regional
>>>>>> population from these probabilities, and fitting against actual
>>>>>> regional populations. I must use household-level data and region-level
>>>>>> data simultaneously, because coefficients in the household-level model
>>>>>> are adjusted based on fit of the regional-level populations.
>>>>>>
>>>>>> I used a trick - manually resetting residuals of all but
>>>>>> one-per-region household - but this trick doesn't produce perfect
>>>>>> results. Please find the details, remaining problems, as well as the
>>>>>> Stata code described below. Any thoughts on this?
>>>>>>
>>>>>> Thank you for any suggestions!
>>>>>>
>>>>>> Ewha Womans University
>>>>>> Seoul, Korea
>>>>>>
>>>>>> Details:
>>>>>> I am estimating households' probability to respond to a survey as a
>>>>>> function of their income. For each responding household (12000), I
>>>>>> have data on income. Also, at the level of region (3000), I know the
>>>>>> number of responding and non-responding households.
>>>>>>
>>>>>> I declare a logit equation of response-probability as a function of
>>>>>> income, to estimate it for all responding households.
>>>>>>
>>>>>> The identification is provided by fitting of population in each
>>>>>> region. For each responding household, I estimate their true mass as
>>>>>> the inverse of their response probability. Then I sum the
>>>>>> inverse response-probabilities for all households in a region, and
>>>>>> fit the sum against the true population.
>>>>>>
>>>>>> Stata problem:
>>>>>> I am estimating GMM at the regional level. But, to obtain the
>>>>>> population estimate in each region, I calculate response-probabilities
>>>>>> at the household level and sum their inverses within a region. This
>>>>>> region-level fitting and response-probability estimation occur
>>>>>> simultaneously/iteratively -- as logit-coefficients are adjusted to
>>>>>> minimize region-level residuals, households' response-probabilities
>>>>>> change.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

```
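
The region-level objective discussed in the thread (the sum of squared deviations between actual and predicted regional population) can be written directly in Mata with optimize(), so that no household-level residual has to be zeroed out by hand. What follows is only a minimal sketch under assumptions that are not in the thread: the household-level dataset holds a numeric `region` identifier, `income`, and `pop` (actual regional population, constant within region), there are no missing values, and the logit index is linear in income. The function name `regionobj` is hypothetical. This is an unweighted least-squares fit, not the weighted GMM estimator of Korinek, Mistiaen and Ravallion (2007).

```stata
* Sketch only: region, income and pop are assumed variable names.
* pop must be constant within region; data must have no missing values.
sort region

mata:
// Objective: sum over regions j of (pop_j - sum_i 1/P_i)^2, where
// P_i = invlogit(b0 + b1*income_i) for each responding household i.
void regionobj(real scalar todo, real rowvector b,
               real colvector income, real colvector pop,
               real matrix info, v, g, H)
{
    real colvector invp
    real scalar    j, pophat

    // 1/P_i = (1 + exp(xb))/exp(xb) = 1 + exp(-xb), one row per household
    invp = 1 :+ exp(-(b[1] :+ b[2] :* income))

    v = 0
    for (j = 1; j <= rows(info); j++) {
        // predicted population of region j: sum of 1/P_i over its households
        pophat = sum(panelsubmatrix(invp, j, info))
        // one squared residual per region, not per household
        v = v + (pop[info[j, 1]] - pophat)^2
    }
}

income = st_data(., "income")
pop    = st_data(., "pop")
info   = panelsetup(st_data(., "region"), 1)   // first/last row of each region

S = optimize_init()
optimize_init_evaluator(S, &regionobj())
optimize_init_evaluatortype(S, "d0")           // numerical derivatives
optimize_init_which(S, "min")
optimize_init_params(S, (0, 0))                // arbitrary starting values
optimize_init_argument(S, 1, income)
optimize_init_argument(S, 2, pop)
optimize_init_argument(S, 3, info)
bhat = optimize(S)
bhat
end
```

Because the evaluator receives the household-level vectors together with the panel index from panelsetup(), the probabilities are computed per household while the objective has exactly one term per region. The starting values (0, 0) are arbitrary, and a poorly scaled income variable may need rescaling before the optimizer converges.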