

From: Vladimír Hlásny <vhlasny@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: GMM minimization of regional errors imputed from hhd level model
Date: Mon, 1 Jul 2013 11:02:24 +0900

Dear Austin:

I am computing the "one-per-region residuals" as the difference between regional actual population and predicted population (sum of household-inverse-probabilities). So my trick doesn't depend on luck - the residuals contain information on all households within a region. In the code that I pasted in my original email, notice the summation across households:

    egen double `pophat' = sum( (1+exp(b0+income*b1)) / exp(b0+income*b1)) `if', by(`region')
    replace residual = (pop - `pophat') * oneiffirst

The 'oneiffirst' is a binary indicator for one residual per region, my trick. By using that, I ensure that only one region-level residual is considered per region. Instead, I would have liked to use an 'if' statement (such as 'if oneiffirst'), so that Stata would know that there are only 2500 (region-level) observations. But Stata doesn't allow it.

Is there another way to essentially restrict the sample inside of the function evaluator program - the sample in which the moments are evaluated - after GMM is called in a hhd-level dataset?

I am not familiar with 'optimize()'. Will that let me declare samples so that I estimate a region-level regression in which moments are computed from a hhd-level equation?

Thank you.

Vladimir

On Mon, Jul 1, 2013 at 1:17 AM, Austin Nichols <austinnichols@gmail.com> wrote:
> Vladimír Hlásny <vhlasny@gmail.com>:
> My question is: why try to trick -gmm- into doing an optimization it's
> not designed for? You are trying to make the first residual within
> group orthogonal to income; what if you got unlucky and the first case
> in each group had zero income--hard to see how you could improve the
> objective function, right?
>
> Instead start with Mata's optimize() which can be used to roll your
> own GMM and much else besides: see e.g.
> http://www.stata.com/meeting/snasug08/nichols_gmm.pdf
>
> On Sat, Jun 29, 2013 at 10:10 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>> Dear Austin:
>> The model is definitely identified. Matlab runs the model well,
>> because I can use household-level and region-level variables
>> simultaneously. My trick in Stata also works, except that it produces
>> imprecise results and occasionally fails to converge. (My current
>> trick is to make Stata think that the model is at the household level,
>> and to manually set all-but-one-per-region hhd-level residuals to
>> zero.)
>>
>> Incomes of the responding households are my instrument.
>> Essentially, because each region has a different survey-response-rate
>> and different distribution of incomes of responding households, GMM
>> estimates the relationship between households' response-probability
>> and their income (subject to assumptions on representativeness of
>> responding households).
>>
>> In sum:
>> I need Stata to use region-level and household-level variables (or
>> matrices) simultaneously. Specifically, Stata must minimize
>> region-level residuals computed from a household-level logistic
>> equation. E.g., if I feed household-level data into the GMM
>> function-evaluator program, can I instruct the GMM to use only one
>> residual per region?
>>
>> Vladimir
>>
>> On Sat, Jun 29, 2013 at 10:27 PM, Austin Nichols
>> <austinnichols@gmail.com> wrote:
>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>> I have not read the ref. But you do not really have instruments. That
>>> is, you are not setting E(Ze) to zero with e a residual from some
>>> equation and Z your instrument; you do not have moments of that type.
>>> Seems you should start with optimize() instead of -gmm-, as you are
>>> just minimizing the sum of squared deviations from targets at the
>>> region level. Or am I still misunderstanding this exercise?
>>>
>>> On Fri, Jun 28, 2013 at 10:08 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>> Thanks for responding, Austin.
>>>>
>>>> The full reference is: Korinek, Mistiaen and Ravallion (2007), An
>>>> econometric method of correcting for unit nonresponse bias in surveys,
>>>> J. of Econometrics 136.
>>>>
>>>> My sample includes 12000 responding households. I know their income,
>>>> and which of 2500 regions they come from. In addition, for each
>>>> region, I know the number of non-responding households. I find the
>>>> coefficient on income by fitting estimated regional population to
>>>> actual population:
>>>>
>>>> P_i = logit f(income_i,theta)
>>>> actual_j = responding_j + nonresponding_j
>>>> theta = argmin {sum(1/P_i) - actual_j}
>>>>
>>>> Response probability may not be monotonic in income. The logit may be
>>>> a non-monotonic function of income.
>>>>
>>>> Thanks for any thoughts on how to estimate this in Stata, or how to
>>>> make my 'trick' (setting 12000-2500 hhd-level residuals manually to
>>>> zero) work better.
>>>>
>>>> Vladimir
>>>>
>>>> On Sat, Jun 29, 2013 at 1:49 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>>> As the FAQ hints, if you don't provide full references, don't expect
>>>>> good answers.
>>>>>
>>>>> I don't understand your description--how are you running a logit of
>>>>> response on income, when you only have income for responders? Can you
>>>>> give a sense of what the data looks like?
>>>>>
>>>>> On another topic, why would anyone expect response probability to be
>>>>> monotonic in income?
>>>>>
>>>>> On Fri, Jun 28, 2013 at 10:05 AM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>> Hi,
>>>>>> I am using a method by Korinek, Mistiaen and Ravallion (2007) to
>>>>>> correct for unit-nonresponse bias. That involves estimating
>>>>>> response-probability for each household, inferring regional
>>>>>> population from these probabilities, and fitting against actual
>>>>>> regional populations. I must use household-level data and region-level
>>>>>> data simultaneously, because coefficients in the household-level model
>>>>>> are adjusted based on fit of the regional-level populations.
>>>>>>
>>>>>> I used a trick - manually resetting residuals of all but
>>>>>> one-per-region household - but this trick doesn't produce perfect
>>>>>> results. Please find the details, remaining problems, as well as the
>>>>>> Stata code described below. Any thoughts on this?
>>>>>>
>>>>>> Thank you for any suggestions!
>>>>>>
>>>>>> Vladimir Hlasny
>>>>>> Ewha Womans University
>>>>>> Seoul, Korea
>>>>>>
>>>>>> Details:
>>>>>> I am estimating households' probability to respond to a survey as a
>>>>>> function of their income. For each responding household (12000), I
>>>>>> have data on income. Also, at the level of region (3000), I know the
>>>>>> number of responding and non-responding households.
>>>>>>
>>>>>> I declare a logit equation of response-probability as a function of
>>>>>> income, to estimate it for all responding households.
>>>>>>
>>>>>> The identification is provided by fitting of population in each
>>>>>> region. For each responding household, I estimate their true mass as
>>>>>> the inverse of their response probability. Then I sum the inverse
>>>>>> response-probabilities for all households in a region, and fit it
>>>>>> against the true population.
>>>>>>
>>>>>> Stata problem:
>>>>>> I am estimating GMM at the regional level. But, to obtain the
>>>>>> population estimate in each region, I calculate response-probabilities
>>>>>> at the household level and sum them up in a region. This region-level
>>>>>> fitting and response-probability estimation occurs
>>>>>> simultaneously/iteratively -- as logit-coefficients are adjusted to
>>>>>> minimize region-level residuals, households' response-probabilities
>>>>>> change.
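
The optimize() route suggested in the thread might look roughly like the sketch below. It is only an illustration under stated assumptions: the dataset in memory is at the household level, has no missing values, and contains numeric variables named region, income, and pop (actual regional population, constant within region). Those variable names and the function name regss() are hypothetical. The criterion is a plain unweighted sum of squared regional residuals, which is not necessarily the weighted GMM objective of Korinek, Mistiaen and Ravallion (2007), and the code has not been run against the poster's data.

```stata
* Sketch only: region, income, pop and regss() are assumed names.
sort region
capture mata: mata drop regss()

mata:
// d0 evaluator: v is the sum over regions of squared
// (actual - predicted) population for logit coefficients b = (b0, b1).
void regss(real scalar todo, real rowvector b,
           real colvector income, real colvector pop, real matrix info,
           v, g, H)
{
    real colvector w, e
    real scalar    j

    // inverse response probability: (1+exp(xb))/exp(xb) = 1 + exp(-xb)
    w = 1 :+ exp(-(b[1] :+ b[2] :* income))

    e = J(rows(info), 1, .)
    for (j = 1; j <= rows(info); j++) {
        // one residual per region, built from all of its households
        e[j] = pop[info[j, 1]] - sum(panelsubmatrix(w, j, info))
    }
    v = quadcross(e, e)
}

income = st_data(., "income")
pop    = st_data(., "pop")
region = st_data(., "region")
info   = panelsetup(region, 1)      // first and last row of each region

S = optimize_init()
optimize_init_evaluator(S, &regss())
optimize_init_evaluatortype(S, "d0")
optimize_init_which(S, "min")
optimize_init_argument(S, 1, income)
optimize_init_argument(S, 2, pop)
optimize_init_argument(S, 3, info)
optimize_init_params(S, (0, 0))     // starting values, chosen arbitrarily
bhat = optimize(S)
bhat
end
```

In this arrangement the evaluator sees every household when it builds each regional sum, but the criterion has exactly one term per region, so no household-level residuals need to be set to zero and no 'if' restriction is required.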
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

