Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: GMM minimization of regional errors imputed from hhd level model

- From: Vladimír Hlásny
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: GMM minimization of regional errors imputed from hhd level model
- Date: Sun, 30 Jun 2013 11:10:39 +0900

```
Dear Austin:
The model is definitely identified. Matlab estimates the model well,
because there I can use household-level and region-level variables
simultaneously. My trick in Stata also works, except that it produces
imprecise results and occasionally fails to converge. (My current
trick is to make Stata think that the model is at the household level,
and to manually set all-but-one-per-region hhd-level residuals to
zero.)

Incomes of the responding households are my instrument.
Essentially, because each region has a different survey-response-rate
and different distribution of incomes of responding households, GMM
estimates the relationship between households' response-probability
and their income (subject to assumptions on representativeness of
responding households).

In sum:
I need Stata to use region-level and household-level variables (or
matrices) simultaneously. Specifically, Stata must minimize
region-level residuals computed from a household-level logistic
equation. E.g., if I feed household-level data into the GMM
function-evaluator program, can I instruct the GMM to use only one
residual per region?

On Sat, Jun 29, 2013 at 10:27 PM, Austin Nichols
<austinnichols@gmail.com> wrote:
> I have not read the ref.  But you do not really have instruments. That
> is, you are not setting E(Ze) to zero with e a residual from some
> equation and Z your instrument; you do not have moments of that type.
> You are just minimizing the sum of squared deviations from targets at the
> region level. Or am I still misunderstanding this exercise?
>
> On Fri, Jun 28, 2013 at 10:08 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>> Thanks for responding, Austin.
>>
>> The full reference is: Korinek, Mistiaen and Ravallion (2007), An
>> econometric method of correcting for unit nonresponse bias in surveys,
>> J. of Econometrics 136.
>>
>> My sample includes 12000 responding households. I know their income,
>> and which of 2500 regions they come from. In addition, for each
>> region, I know the number of non-responding households. I find the
>> coefficient on income by fitting estimated regional population to
>> actual population:
>>
>> P_i = invlogit(f(income_i, theta))
>> actual_j = responding_j + nonresponding_j
>> theta = argmin sum_j {sum_{i in j} (1/P_i) - actual_j}^2
>>
>> Response probability may not be monotonic in income. The logit may be
>> a non-monotonic function of income.
>>
>> Thanks for any thoughts on how to estimate this in Stata, or how to
>> make my 'trick' (setting 12000-2500 hhd-level residuals manually to
>> zero) work better.
>>
>>
>> On Sat, Jun 29, 2013 at 1:49 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>> As the FAQ hints, if you don't provide full references, don't expect
>>>
>>> I don't understand your description--how are you running a logit of
>>> response on income, when you only have income for responders?  Can you
>>> give a sense of what the data looks like?
>>>
>>> On another topic, why would anyone expect response probability to be
>>> monotonic in income?
>>>
>>> On Fri, Jun 28, 2013 at 10:05 AM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>> Hi,
>>>> I am using a method by Korinek, Mistiaen and Ravallion (2007) to
>>>> correct for unit-nonresponse bias. That involves estimating
>>>> response-probability for each household,  inferring regional
>>>> population from these probabilities, and fitting against actual
>>>> regional populations. I must use household-level data and region-level
>>>> data simultaneously, because coefficients in the household-level model
>>>> are adjusted based on fit of the regional-level populations.
>>>>
>>>> I used a trick (manually setting to zero the residuals of all but one
>>>> household per region), but this trick does not produce perfect
>>>> results. Please find the details, remaining problems, and the Stata
>>>> code described below. Any thoughts on this?
>>>>
>>>> Thank you for any suggestions!
>>>>
>>>> Ewha Womans University
>>>> Seoul, Korea
>>>>
>>>> Details:
>>>> I am estimating households' probability of responding to a survey as a
>>>> function of their income. For each of the 12000 responding households,
>>>> I have data on income. At the region level (3000 regions), I know the
>>>> number of responding and non-responding households.
>>>>
>>>> I specify a logit equation for response probability as a function of
>>>> income and evaluate it for every responding household.
>>>>
>>>> Identification comes from fitting the population of each region. For
>>>> each responding household, I estimate its true mass as the inverse of
>>>> its response probability. Then I sum these inverse probabilities over
>>>> all households in a region and fit the sum against the true population.
>>>>
>>>> Stata problem:
>>>> I am estimating GMM at the regional level. But to obtain the
>>>> population estimate in each region, I calculate response probabilities
>>>> at the household level and sum their inverses within the region. The
>>>> region-level fitting and the response-probability estimation occur
>>>> simultaneously (iteratively): as the logit coefficients are adjusted to
>>>> minimize the region-level residuals, the households' response
>>>> probabilities change.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

```
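The region-level objective discussed in this thread, $\sum_j \big(\sum_{i \in j} 1/P_i - N_j\big)^2$ with $N_j$ = responding_j + nonresponding_j, can be illustrated with a short sketch. It is written in Python with only the standard library rather than in Stata, and it assumes, hypothetically, that the logit index is linear in income, f(income_i, theta) = a + b*income_i (the thread allows a non-monotonic f). The point it shows is that household-level data enter only through within-region sums, so the optimizer sees exactly one residual per region and no household residuals have to be zeroed out. The minimizer is a plain damped Gauss-Newton on the identity-weighted sum of squares, not the efficient two-step GMM of Korinek, Mistiaen and Ravallion (2007), and the toy data and function names are invented for illustration.

```python
import math


def inverse_response_prob(income, theta):
    """Estimated mass of one responding household, 1 / P_i.

    Hypothetical model: P_i = invlogit(a + b * income), so that
    1 / P_i = 1 + exp(-(a + b * income)).
    """
    a, b = theta
    return 1.0 + math.exp(-(a + b * income))


def region_residuals(regions, actual, theta):
    """One residual per region: sum of 1/P_i over its responders minus N_j."""
    return [
        sum(inverse_response_prob(x, theta) for x in incomes) - actual[j]
        for j, incomes in regions.items()
    ]


def objective(regions, actual, theta):
    """Sum of squared region-level residuals (identity weighting)."""
    return sum(r * r for r in region_residuals(regions, actual, theta))


def gauss_newton(regions, actual, theta0=(0.0, 0.0), iters=100, h=1e-6):
    """Minimize the objective over theta = (a, b) by damped Gauss-Newton.

    The Jacobian of the region residuals is approximated by forward differences.
    """
    a, b = theta0
    for _ in range(iters):
        r = region_residuals(regions, actual, (a, b))
        ja = [(x - y) / h for x, y in zip(region_residuals(regions, actual, (a + h, b)), r)]
        jb = [(x - y) / h for x, y in zip(region_residuals(regions, actual, (a, b + h)), r)]
        # Solve the 2x2 normal equations (J'J) d = -J'r by Cramer's rule.
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ga = sum(u * e for u, e in zip(ja, r))
        gb = sum(v * e for v, e in zip(jb, r))
        det = saa * sbb - sab * sab
        if det == 0.0:
            break  # J'J singular: theta not locally identified from these regions
        da = (-ga * sbb + gb * sab) / det
        db = (-gb * saa + ga * sab) / det
        # Halve the step until the objective decreases.
        f0 = sum(e * e for e in r)
        step = 1.0
        while step > 1e-10 and objective(regions, actual, (a + step * da, b + step * db)) >= f0:
            step /= 2.0
        if step <= 1e-10:
            break  # no further improvement possible
        a, b = a + step * da, b + step * db
        if abs(step * da) + abs(step * db) < 1e-12:
            break
    return a, b


# Hypothetical toy data: standardized incomes of responders in each region.
regions = {
    "r1": [-1.0, -0.5, 0.0],
    "r2": [0.0, 0.5, 1.0, 1.5],
    "r3": [-2.0, 1.0],
    "r4": [0.5, 2.0, 2.5],
}
# Populations built to be exactly consistent with TRUE_THETA, so the estimator
# should recover it (real N_j would be integer counts from the survey frame).
TRUE_THETA = (1.0, 0.5)
actual = {j: sum(inverse_response_prob(x, TRUE_THETA) for x in xs) for j, xs in regions.items()}

estimate = gauss_newton(regions, actual)
```

Under this logit, 1/P_i = 1 + exp(-(a + b*income_i)), so each regional residual reduces to exp(-a) times a sum over that region's responders minus the region's non-responding count. That is why differences in the income distribution of responders across regions are what identify b.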