Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: GMM minimization of regional errors imputed from hhd level model
Date: Sun, 30 Jun 2013 23:37:08 -0400

Vladimír Hlásny <vhlasny@gmail.com>:
I can't see that in your code:
, myrhs(x1) instruments(x1)
and myrhs gets multiplied by theta2, so it must be at the individual level.

Perhaps you should follow the usual advice, and illustrate your
problem using a publicly available dataset.

On Sun, Jun 30, 2013 at 11:18 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
> Dear Austin:
> Thanks for the link to optimize(). I will check whether that could
> solve my 'region-level minimization' vs. 'household-level model'
> problem.
> Regarding your point:
> What you call 'x1' is a function of all incomes in a region, not
> income of a single household.
> Vladimir
>
>
> On Mon, Jul 1, 2013 at 11:10 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Vladimír Hlásny <vhlasny@gmail.com>,
>>
>> If you're not familiar with optimize(), start with the help file. Or
>> just follow the link I sent.
>>
>> You don't seem to take my point about your trick; if you put all the
>> weight of optimization on one residual per group, and -gmm- is trying
>> to make that one residual orthogonal to an instrument x1=income, but
>> you (unluckily) have x1=0 in each of those cases, then how could -gmm-
>> possibly improve on residual times zero, equals zero? An unlucky case,
>> but possible, given your syntax, I think.
>>
>> On Sun, Jun 30, 2013 at 10:02 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>> Dear Austin:
>>> I am computing the "one-per-region residuals" as the difference
>>> between regional actual population and predicted population (sum of
>>> household-inverse-probabilities). So my trick doesn't depend on luck -
>>> the residuals contain information on all households within a region.
>>>
>>> In the code that I pasted in my original email, notice the summation
>>> across households:
>>> egen double `pophat' = sum( (1+exp(b0+income*b1)) / exp(b0+income*b1))
>>> `if', by(`region')
>>> replace residual = (pop - `pophat') * oneiffirst
>>>
>>> The 'oneiffirst' is a binary indicator for one residual per region, my
>>> trick. By using that, I ensure that only one region-level residual is
>>> considered per region. Instead, I would have liked to use an 'if'
>>> statement (such as 'if oneiffirst'), so that Stata would know that
>>> there are only 2500 (region-level) observations. But Stata doesn't
>>> allow it. Is there another way to essentially restrict the sample
>>> inside of the function evaluator program - the sample in which the
>>> moments are evaluated - after GMM is called in a hhd-level dataset?
>>>
>>> I am not familiar with 'optimize()'. Will that let me declare samples
>>> so that I estimate a region-level regression in which moments are
>>> computed from a hhd-level equation?
>>> Thank you.
>>> Vladimir
>>>
>>> On Mon, Jul 1, 2013 at 1:17 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>> My question is: why try trick -gmm- into doing an optimization it's
>>>> not designed for? You are trying to make the first residual within
>>>> group orthogonal to income; what if you got unlucky and the first case
>>>> in each group had zero income--hard to see how you could improve the
>>>> objective function, right?
>>>>
>>>> Instead start with Mata's optimize() which can be used to roll your
>>>> own GMM and much else besides: see e.g.
>>>> http://www.stata.com/meeting/snasug08/nichols_gmm.pdf
>>>>
>>>> On Sat, Jun 29, 2013 at 10:10 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>> Dear Austin:
>>>>> The model is definitely identified. Matlab runs the model well,
>>>>> because I can use household-level and region-level variables
>>>>> simultaneously. My trick in Stata also works, except that it produces
>>>>> imprecise results and occasionally fails to converge. (My current
>>>>> trick is to make Stata think that the model is at the household level,
>>>>> and manually setting all-but-one-per-region hhd-level residuals to
>>>>> zero.)
>>>>>
>>>>> Incomes of the responding households are my instrument.
>>>>> Essentially, because each region has a different survey-response-rate
>>>>> and different distribution of incomes of responding households, GMM
>>>>> estimates the relationship between households' response-probability
>>>>> and their income (subject to assumptions on representativeness of
>>>>> responding households).
>>>>>
>>>>> In sum:
>>>>> I need Stata to use region-level and household-level variables (or
>>>>> matrices) simultaneously. Specifically, Stata must minimize
>>>>> region-level residuals computed from a household-level logistic
>>>>> equation. E.g., if I feed household-level data into the GMM
>>>>> function-evaluator program, can I instruct the GMM to use only one
>>>>> residual per region?
>>>>>
>>>>> Vladimir
>>>>>
>>>>> On Sat, Jun 29, 2013 at 10:27 PM, Austin Nichols
>>>>> <austinnichols@gmail.com> wrote:
>>>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>>>> I have not read the ref. But you do not really have instruments. That
>>>>>> is, you are not setting E(Ze) to zero with e a residual from some
>>>>>> equation and Z your instrument; you do not have moments of that type.
>>>>>> Seems you should start with optimize() instead of -gmm-, as you are
>>>>>> just minimizing the sum of squared deviations from targets at the
>>>>>> region level. Or am I still misunderstanding this exercise?
>>>>>>
>>>>>> On Fri, Jun 28, 2013 at 10:08 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>>> Thanks for responding, Austin.
>>>>>>>
>>>>>>> The full reference is: Korinek, Mistiaen and Ravallion (2007), An
>>>>>>> econometric method of correcting for unit nonresponse bias in surveys,
>>>>>>> J. of Econometrics 136.
>>>>>>>
>>>>>>> My sample includes 12000 responding households. I know their income,
>>>>>>> and which of 2500 regions they come from. In addition, for each
>>>>>>> region, I know the number of non-responding households. I find the
>>>>>>> coefficient on income by fitting estimated regional population to
>>>>>>> actual population:
>>>>>>>
>>>>>>> P_i = logit f(income_i,theta)
>>>>>>> actual_j = responding_j + nonresponding_j
>>>>>>> theta = argmin {sum(1/P_i) - actual_j}
>>>>>>>
>>>>>>> Response probability may not be monotonic in income. The logit may be
>>>>>>> a non-monotonic function of income.
>>>>>>>
>>>>>>> Thanks for any thoughts on how to estimate this in Stata, or how to
>>>>>>> make my 'trick' (setting 12000-2500 hhd-level residuals manually to
>>>>>>> zero) work better.
>>>>>>>
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On Sat, Jun 29, 2013 at 1:49 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>>>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>>>>>> As the FAQ hints, if you don't provide full references, don't expect
>>>>>>>> good answers.
>>>>>>>>
>>>>>>>> I don't understand your description--how are you running a logit of
>>>>>>>> response on income, when you only have income for responders? Can you
>>>>>>>> give a sense of what the data looks like?
>>>>>>>>
>>>>>>>> On another topic, why would anyone expect response probability to be
>>>>>>>> monotonic in income?
>>>>>>>>
>>>>>>>> On Fri, Jun 28, 2013 at 10:05 AM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>> I am using a method by Korinek, Mistiaen and Ravallion (2007) to
>>>>>>>>> correct for unit-nonresponse bias. That involves estimating
>>>>>>>>> response-probability for each household, inferring regional
>>>>>>>>> population from these probabilities, and fitting against actual
>>>>>>>>> regional populations. I must use household-level data and region-level
>>>>>>>>> data simultaneously, because coefficients in the household-level model
>>>>>>>>> are adjusted based on fit of the regional-level populations.
>>>>>>>>>
>>>>>>>>> I used a trick - manually resetting residuals of all but
>>>>>>>>> one-per-region household - but this trick doesn't produce perfect
>>>>>>>>> results. Please find the details, remaining problems, as well as the
>>>>>>>>> Stata code described below. Any thoughts on this?
>>>>>>>>>
>>>>>>>>> Thank you for any suggestions!
>>>>>>>>>
>>>>>>>>> Vladimir Hlasny
>>>>>>>>> Ewha Womans University
>>>>>>>>> Seoul, Korea
>>>>>>>>>
>>>>>>>>> Details:
>>>>>>>>> I am estimating households' probability to respond to a survey as a
>>>>>>>>> function of their income. For each responding household (12000), I
>>>>>>>>> have data on income. Also, at the level of region (3000), I know the
>>>>>>>>> number of responding and non-responding households.
>>>>>>>>>
>>>>>>>>> I declare a logit equation of response-probability as a function of
>>>>>>>>> income, to estimate it for all responding households.
>>>>>>>>>
>>>>>>>>> The identification is provided by fitting of population in each
>>>>>>>>> region. For each responding household, I estimate their true mass as
>>>>>>>>> the inverse of their response probability. Then I sum the
>>>>>>>>> response-probabilities for all households in a region, and fit it
>>>>>>>>> against the true population.
>>>>>>>>>
>>>>>>>>> Stata problem:
>>>>>>>>> I am estimating GMM at the regional level. But, to obtain the
>>>>>>>>> population estimate in each region, I calculate response-probabilities
>>>>>>>>> at the household level and sum them up in a region. This region-level
>>>>>>>>> fitting and response-probability estimation occurs
>>>>>>>>> simultaneously/iteratively -- as logit-coefficients are adjusted to
>>>>>>>>> minimize region-level residuals, households response-probabilities
>>>>>>>>> change.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
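
Editorial sketch (not part of the original messages): the objective discussed in the thread, "minimizing the sum of squared deviations from targets at the region level" with predictions summed from a household-level logit, can be written down in a few lines. The version below is a minimal illustration in plain Python rather than Mata, and it does not use or imitate the optimize() interface. The data, the region labels, the "true" coefficients, and the grid search (a crude stand-in for a real optimizer) are all made-up assumptions; only the inverse-probability formula, (1+exp(xb))/exp(xb) = 1+exp(-xb), is taken from the code quoted in the thread.

```python
import math


def inverse_response_prob(income, b0, b1):
    """Return 1/P for a logistic P = exp(xb) / (1 + exp(xb)), xb = b0 + b1*income."""
    return 1.0 + math.exp(-(b0 + b1 * income))


def predicted_population(incomes_by_region, b0, b1):
    """Sum household inverse probabilities within each region."""
    return {
        region: sum(inverse_response_prob(x, b0, b1) for x in incomes)
        for region, incomes in incomes_by_region.items()
    }


def objective(theta, incomes_by_region, actual_pop):
    """Sum over regions of squared (actual - predicted) population."""
    b0, b1 = theta
    pophat = predicted_population(incomes_by_region, b0, b1)
    return sum((actual_pop[r] - pophat[r]) ** 2 for r in actual_pop)


def grid_search(incomes_by_region, actual_pop, grid):
    """Crude stand-in for a real optimizer: try every (b0, b1) on the grid."""
    best_theta, best_value = None, math.inf
    for b0 in grid:
        for b1 in grid:
            value = objective((b0, b1), incomes_by_region, actual_pop)
            if value < best_value:
                best_theta, best_value = (b0, b1), value
    return best_theta, best_value


# Hypothetical data: incomes of responding households in three regions.
incomes_by_region = {"A": [1.0, 2.0], "B": [3.0, 4.0, 5.0], "C": [0.5, 6.0]}

# Hypothetical "true" coefficients, used only to build consistent regional totals.
true_theta = (1.0, -0.5)
actual_pop = predicted_population(incomes_by_region, *true_theta)

grid = [i * 0.25 for i in range(-8, 9)]  # -2.0, -1.75, ..., 2.0
theta_hat, value = grid_search(incomes_by_region, actual_pop, grid)
print(theta_hat, value)
```

The point of the layout is that the objective has one term per region while the logit is evaluated once per household, so no household-level residual has to be zeroed out. In practice the same objective function would presumably be handed to a proper minimizer instead of a grid.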

