

From: Vladimír Hlásny <vhlasny@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: GMM minimization of regional errors imputed from hhd level model
Date: Fri, 28 Jun 2013 23:05:23 +0900

Hi,

I am using a method by Korinek, Mistiaen and Ravallion (2007) to correct for unit-nonresponse bias. It involves estimating a response probability for each household, inferring each region's population from these probabilities, and fitting the inferred populations against the actual regional populations. I must use household-level and region-level data simultaneously, because the coefficients in the household-level model are adjusted based on the fit of the region-level populations. I used a trick, manually zeroing the residuals of all but one household per region, but this trick doesn't produce perfect results. The details, the remaining problems, and the Stata code are described below.

Any thoughts on this? Thank you for any suggestions!

Vladimir Hlasny
Ewha Womans University
Seoul, Korea

Details:

I am estimating households' probability of responding to a survey as a function of their income. For each responding household (12000), I have data on income. At the level of the region (3000), I also know the number of responding and non-responding households.

I declare a logit equation of response probability as a function of income, and estimate it for all responding households. Identification comes from fitting the population in each region. For each responding household, I estimate its true mass as the inverse of its response probability. Then I sum these masses (the inverse response probabilities) over all households in a region, and fit the sum against the true population.

Stata problem:

I am estimating GMM at the regional level. But to obtain the population estimate in each region, I calculate response probabilities at the household level and sum their inverses within the region. The region-level fitting and the response-probability estimation occur simultaneously: as the logit coefficients are adjusted to minimize the region-level residuals, the households' response probabilities change.

Possible solution 1:

Could I define the region-level variables as a matrix, and run GMM on this matrix? I don't know how.

Possible solution 2: a trick that works, but not perfectly (refer to the code below):

Inside a GMM function-evaluator program, I declare a residual `varlist' = (`y' - `yhat'). I manually multiply this residual by 1 for the first household in a region, and by zero for all other households. Stata runs GMM on all 12000 household observations, but only 3000 residuals are non-zero.

Problems with this trick:
- Stata finishes solving too early, and the solution is imprecise (because 9000 residuals are zero and cannot be decreased any further).
- The message "flat or discontinuous region encountered" shows up many times in Stata 12, sometimes in Stata 11, and the model doesn't converge.
- I have to compute standard errors and other regression statistics manually using N=3000.

    * Flag the first household in each region (1 for the first, 0 otherwise)
    sort region
    by region: gen oneiffirst = _n
    replace oneiffirst = 0 if oneiffirst > 1

    * GMM function-evaluator program
    program gmm_nonresp
        version 11
        syntax varlist if, at(name) mylhs(varlist) myrhs(varlist) myidvar(varlist)
        quietly {
            tempvar explinspec pophat
            * linear specification: constant plus each right-hand-side variable times its coefficient
            gen double `explinspec' = `at'[1,1] `if'
            local j = 2
            foreach var of varlist `myrhs' {
                replace `explinspec' = `explinspec' + `var'*`at'[1,`j'] `if'
                local j = `j' + 1
            }
            * exponential of the linear specification for households' response probability
            replace `explinspec' = exp(`explinspec')
            * predict household count in region as sum of households' inverse response probabilities
            egen double `pophat' = sum((1+`explinspec')/`explinspec') `if', by(`myidvar')
            * FIT TRUE HOUSEHOLD COUNT TO PREDICTED HOUSEHOLD COUNT, AND WEIGHT
            * Trick to go from household-level response-probability estimation
            * to region-level fitting of populations:
            replace `varlist' = (`mylhs' - `pophat')*Winverse_sqrt*oneiffirst `if'
        }
    end

    * RUN GMM MINIMIZATION WITH STARTING VALUES
    gmm gmm_nonresp, mylhs(population) myrhs(x1) myidvar(region) nequations(1) parameters(theta1 theta2) instruments(x1) from(theta1 10 theta2 -1)
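For reference, the quantities computed by the evaluator program in the post can be written out as below. This is only a reading of the posted code, not a statement from the original message: the symbols i (household), j (region), x_i (the single regressor x1), m_j (the variable population) and w_j (the variable Winverse_sqrt, whose construction the post does not show) are notation assumed here for illustration.

```latex
% Logit response probability of household i (theta_1 = constant, theta_2 = slope on x1)
P_i(\theta) = \frac{\exp(\theta_1 + \theta_2 x_i)}{1 + \exp(\theta_1 + \theta_2 x_i)}

% Predicted household count in region j: sum of inverse response probabilities
\hat{m}_j(\theta) = \sum_{i \in j} \frac{1}{P_i(\theta)}
                  = \sum_{i \in j} \frac{1 + \exp(\theta_1 + \theta_2 x_i)}{\exp(\theta_1 + \theta_2 x_i)}

% Weighted region-level residual
\psi_j(\theta) = \bigl(m_j - \hat{m}_j(\theta)\bigr)\, w_j

% Residual passed back to gmm for household i in region j under the trick
u_i(\theta) =
  \begin{cases}
    \psi_j(\theta) & \text{if } i \text{ is the first household in region } j \\
    0              & \text{otherwise}
  \end{cases}
```

Written this way, each region contributes exactly one non-zero residual, which matches the post's description of 3000 non-zero residuals among 12000 observations.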

**Follow-Ups**: **Re: st: GMM minimization of regional errors imputed from hhd level model** *From:* Austin Nichols <austinnichols@gmail.com>
