Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: GMM minimization of regional errors imputed from hhd level model
Date: Sun, 30 Jun 2013 12:17:14 -0400

Vladimír Hlásny <vhlasny@gmail.com>:
My question is: why try to trick -gmm- into doing an optimization it's
not designed for? You are trying to make the first residual within
group orthogonal to income; what if you got unlucky and the first case
in each group had zero income--hard to see how you could improve the
objective function, right? Instead start with Mata's optimize() which
can be used to roll your own GMM and much else besides: see e.g.
http://www.stata.com/meeting/snasug08/nichols_gmm.pdf

On Sat, Jun 29, 2013 at 10:10 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
> Dear Austin:
> The model is definitely identified. Matlab runs the model well,
> because I can use household-level and region-level variables
> simultaneously. My trick in Stata also works, except that it produces
> imprecise results and occasionally fails to converge. (My current
> trick is to make Stata think that the model is at the household level,
> and manually setting all-but-one-per-region hhd-level residuals to
> zero.)
>
> Incomes of the responding households are my instrument.
> Essentially, because each region has a different survey-response-rate
> and different distribution of incomes of responding households, GMM
> estimates the relationship between households' response-probability
> and their income (subject to assumptions on representativeness of
> responding households).
>
> In sum:
> I need Stata to use region-level and household-level variables (or
> matrices) simultaneously. Specifically, Stata must minimize
> region-level residuals computed from a household-level logistic
> equation. E.g., if I feed household-level data into the GMM
> function-evaluator program, can I instruct the GMM to use only one
> residual per region?
>
> Vladimir
>
> On Sat, Jun 29, 2013 at 10:27 PM, Austin Nichols
> <austinnichols@gmail.com> wrote:
>> Vladimír Hlásny <vhlasny@gmail.com>:
>> I have not read the ref. But you do not really have instruments. That
>> is, you are not setting E(Ze) to zero with e a residual from some
>> equation and Z your instrument; you do not have moments of that type.
>> Seems you should start with optimize() instead of -gmm-, as you are
>> just minimizing the sum of squared deviations from targets at the
>> region level. Or am I still misunderstanding this exercise?
>>
>> On Fri, Jun 28, 2013 at 10:08 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>> Thanks for responding, Austin.
>>>
>>> The full reference is: Korinek, Mistiaen and Ravallion (2007), An
>>> econometric method of correcting for unit nonresponse bias in surveys,
>>> J. of Econometrics 136.
>>>
>>> My sample includes 12000 responding households. I know their income,
>>> and which of 2500 regions they come from. In addition, for each
>>> region, I know the number of non-responding households. I find the
>>> coefficient on income by fitting estimated regional population to
>>> actual population:
>>>
>>> P_i = logit f(income_i,theta)
>>> actual_j = responding_j + nonresponding_j
>>> theta = argmin {sum(1/P_i) - actual_j}
>>>
>>> Response probability may not be monotonic in income. The logit may be
>>> a non-monotonic function of income.
>>>
>>> Thanks for any thoughts on how to estimate this in Stata, or how to
>>> make my 'trick' (setting 12000-2500 hhd-level residuals manually to
>>> zero) work better.
>>>
>>> Vladimir
>>>
>>> On Sat, Jun 29, 2013 at 1:49 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>> As the FAQ hints, if you don't provide full references, don't expect
>>>> good answers.
>>>>
>>>> I don't understand your description--how are you running a logit of
>>>> response on income, when you only have income for responders? Can you
>>>> give a sense of what the data looks like?
>>>>
>>>> On another topic, why would anyone expect response probability to be
>>>> monotonic in income?
>>>>
>>>> On Fri, Jun 28, 2013 at 10:05 AM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>> Hi,
>>>>> I am using a method by Korinek, Mistiaen and Ravallion (2007) to
>>>>> correct for unit-nonresponse bias. That involves estimating
>>>>> response-probability for each household, inferring regional
>>>>> population from these probabilities, and fitting against actual
>>>>> regional populations. I must use household-level data and region-level
>>>>> data simultaneously, because coefficients in the household-level model
>>>>> are adjusted based on fit of the regional-level populations.
>>>>>
>>>>> I used a trick - manually resetting residuals of all but
>>>>> one-per-region household - but this trick doesn't produce perfect
>>>>> results. Please find the details, remaining problems, as well as the
>>>>> Stata code described below. Any thoughts on this?
>>>>>
>>>>> Thank you for any suggestions!
>>>>>
>>>>> Vladimir Hlasny
>>>>> Ewha Womans University
>>>>> Seoul, Korea
>>>>>
>>>>> Details:
>>>>> I am estimating households' probability to respond to a survey as a
>>>>> function of their income. For each responding household (12000), I
>>>>> have data on income. Also, at the level of region (3000), I know the
>>>>> number of responding and non-responding households.
>>>>>
>>>>> I declare a logit equation of response-probability as a function of
>>>>> income, to estimate it for all responding households.
>>>>>
>>>>> The identification is provided by fitting of population in each
>>>>> region. For each responding household, I estimate their true mass as
>>>>> the inverse of their response probability. Then I sum the
>>>>> response-probabilities for all households in a region, and fit it
>>>>> against the true population.
>>>>>
>>>>> Stata problem:
>>>>> I am estimating GMM at the regional level. But, to obtain the
>>>>> population estimate in each region, I calculate response-probabilities
>>>>> at the household level and sum them up in a region. This region-level
>>>>> fitting and response-probability estimation occurs
>>>>> simultaneously/iteratively -- as logit-coefficients are adjusted to
>>>>> minimize region-level residuals, households response-probabilities
>>>>> change.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
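
Editorial illustration: the exercise discussed in this thread is a minimization of squared region-level deviations, where each region's deviation is built from household-level response probabilities. The sketch below shows only that objective function, in Python rather than Mata, so that the aggregation step is explicit. Everything in it is an assumption made for illustration: the linear index `a + b*income` inside the logit (the thread notes the true form may be non-monotonic), the unweighted sum of squares, the toy data, and all function names. It is not the posters' code and not the estimator from Korinek, Mistiaen and Ravallion (2007).

```python
import math


def inv_logit(x):
    """Logistic function: maps a real index to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def region_residuals(theta, income, region, actual):
    """Return one residual per region.

    Each responding household i gets mass 1/P_i, where
    P_i = inv_logit(a + b * income_i) (assumed functional form).
    The residual for region j is sum_i(1/P_i) - actual_j.
    """
    a, b = theta
    implied = {}
    for y, r in zip(income, region):
        p = inv_logit(a + b * y)
        implied[r] = implied.get(r, 0.0) + 1.0 / p
    return {r: implied[r] - actual[r] for r in implied}


def objective(theta, income, region, actual):
    """Unweighted sum of squared region-level residuals (an assumption)."""
    resid = region_residuals(theta, income, region, actual)
    return sum(e * e for e in resid.values())


# Toy data (hypothetical): 7 responding households in 3 regions.
income = [1.0, 2.0, 3.0, 1.5, 2.5, 0.5, 4.0]
region = ["A", "A", "A", "B", "B", "C", "C"]

# Build "actual" populations from a known theta so the fit is exact there.
true_theta = (1.0, -0.3)
actual = {}
for y, r in zip(income, region):
    p = inv_logit(true_theta[0] + true_theta[1] * y)
    actual[r] = actual.get(r, 0.0) + 1.0 / p

at_truth = objective(true_theta, income, region, actual)
off_truth = objective((0.5, -0.1), income, region, actual)
print(at_truth, off_truth)
```

The point of the sketch is that the household-level data enter only inside the function, while the optimizer sees one residual per region. No household-level residuals need to be set to zero by hand. A function of this shape is what one would pass to a general-purpose optimizer such as Mata's optimize().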

**Follow-Ups**:

- **Re: st: GMM minimization of regional errors imputed from hhd level model** *From:* Vladimír Hlásny <vhlasny@gmail.com>

**References**:

- **st: GMM minimization of regional errors imputed from hhd level model** *From:* Vladimír Hlásny <vhlasny@gmail.com>
- **Re: st: GMM minimization of regional errors imputed from hhd level model** *From:* Austin Nichols <austinnichols@gmail.com>
- **Re: st: GMM minimization of regional errors imputed from hhd level model** *From:* Vladimír Hlásny <vhlasny@gmail.com>
- **Re: st: GMM minimization of regional errors imputed from hhd level model** *From:* Austin Nichols <austinnichols@gmail.com>
- **Re: st: GMM minimization of regional errors imputed from hhd level model** *From:* Vladimír Hlásny <vhlasny@gmail.com>
