Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

st: GMM minimization of regional errors imputed from hhd level model

From: Vladimír Hlásny <[email protected]>
To: [email protected]
Subject: st: GMM minimization of regional errors imputed from hhd level model
Date: Fri, 28 Jun 2013 23:05:23 +0900

```Hi,
I am using a method by Korinek, Mistiaen and Ravallion (2007) to
correct for unit-nonresponse bias. That involves estimating
response-probability for each household,  inferring regional
population from these probabilities, and fitting against actual
regional populations. I must use household-level data and region-level
data simultaneously, because coefficients in the household-level model
are adjusted based on fit of the regional-level populations.

I used a trick - manually setting to zero the residuals of all but
one household per region - but this trick doesn't produce perfect
results. Please find the details, the remaining problems, and the
Stata code below. Any thoughts on this?

Thank you for any suggestions!

Ewha Womans University
Seoul, Korea

Details:
I am estimating households' probability to respond to a survey as a
function of their income. For each responding household (12000), I
have data on income. Also, at the level of region (3000), I know the
number of responding and non-responding households.

I declare a logit equation of response-probability as a function of
income, to estimate it for all responding households.

The identification is provided by fitting the population in each
region. For each responding household, I estimate its true mass as
the inverse of its response probability. Then I sum these inverse
response-probabilities over all responding households in a region, and
fit the sum against the true population.

Stata problem:
I am estimating GMM at the regional level. But, to obtain the
population estimate in each region, I calculate response-probabilities
at the household level and sum their inverses within the region. This
region-level fitting and response-probability estimation occur
simultaneously/iteratively -- as logit-coefficients are adjusted to
minimize region-level residuals, households' response-probabilities
change.

Possible solution 1: Could I define region-level variables as a
matrix, and run GMM on this matrix? I don't know how.

Possible solution 2: a trick that works but not perfectly (refer to
the code below):
Inside of a GMM function-evaluator program, I declare a residual
`varlist' = (`y' - `yhat'). I manually multiply this residual by 1 for
the first household in a region, and by zero for all other households.
Stata runs GMM on all 12000 household-observations, but only 3000
residuals are non-zero.

Problems with this trick:
- Stata finishes solving too early, and the solution is imprecise
(because 9000 residuals are zero and cannot be decreased any more).
- Message "flat or discontinuous region encountered" shows up many
times in Stata 12, sometimes in Stata 11, and the model doesn't
converge.
- I have to compute standard errors and other regression statistics
manually using N=3000.

--
* flag the first household in each region (1 = first, 0 = all others) *
sort region
by region: gen byte oneiffirst = (_n == 1)

* GMM function-evaluator program*
program gmm_nonresp
version 11
syntax varlist if, at(name) mylhs(varlist) myrhs(varlist) myidvar(varlist)
quietly {
tempvar explinspec pophat
gen double `explinspec' = `at'[1,1] `if'
local j=2
foreach var of varlist `myrhs' {
replace `explinspec' = `explinspec' + `var'*`at'[1,`j'] `if'
local j = `j' + 1
}
* exponential of the linear specification for households' response
* probability is: *
replace `explinspec' = exp(`explinspec')

* predict household count in region as sum of households' inverse
* response-probabilities *
egen double `pophat' = total((1+`explinspec')/`explinspec') `if', by(`myidvar')

* FIT TRUE HOUSEHOLD-COUNT TO PREDICTED HOUSEHOLD-COUNT, AND WEIGHT *
* Trick to go from household-level response-probability estimation to
* region-level fitting of populations: *
replace `varlist' = (`mylhs' - `pophat')*Winverse_sqrt*oneiffirst `if'
}
end

* RUN GMM MINIMIZATION WITH STARTING-VALUES *
gmm gmm_nonresp, mylhs(population) myrhs(x1) myidvar(region) ///
    nequations(1) parameters(theta1 theta2) instruments(x1) ///
    from(theta1 10 theta2 -1)
```
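For readers following the code, the quantities computed by the evaluator program in the post can be written out as below. This is only a sketch of one reading of the posted code, not the authors' published notation: the symbols P_i, S_j, m_j, w_j, d_i, psi_j and u_i are assumed here for illustration. S_j is the set of responding households in region j, m_j corresponds to the variable `population`, w_j^{-1/2} to `Winverse_sqrt`, and d_i to `oneiffirst`.

```latex
% Notation assumed for illustration; only theta1, theta2, x1, population,
% Winverse_sqrt and oneiffirst appear in the original post.
\begin{align*}
P_i(\theta) &= \frac{\exp(\theta_1 + \theta_2 x_{1i})}{1 + \exp(\theta_1 + \theta_2 x_{1i})}
  && \text{response probability of household } i \\
\hat{m}_j(\theta) &= \sum_{i \in S_j} \frac{1}{P_i(\theta)}
  = \sum_{i \in S_j} \frac{1 + \exp(\theta_1 + \theta_2 x_{1i})}{\exp(\theta_1 + \theta_2 x_{1i})}
  && \text{predicted household count in region } j \\
\psi_j(\theta) &= \bigl(m_j - \hat{m}_j(\theta)\bigr)\, w_j^{-1/2}
  && \text{weighted regional residual} \\
u_i(\theta) &= \psi_{j(i)}(\theta)\, d_i
  && \text{household-level residual passed to gmm}
\end{align*}
```

Under this reading, u_i is nonzero only for the first household in each region, so the sample moments that gmm builds from a constant and `instruments(x1)` contain 3000 nonzero terms. The value of x1 entering each of those terms is that of whichever household is sorted first within its region.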