



Re: st: GMM minimization of regional errors imputed from hhd level model


From   Vladimír Hlásny <vhlasny@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: GMM minimization of regional errors imputed from hhd level model
Date   Mon, 1 Jul 2013 12:18:05 +0900

Dear Austin:
Thanks for the link to optimize(). I will check whether that could
solve my 'region-level minimization' vs. 'household-level model'
problem.
Regarding your point:
What you call 'x1' is a function of all incomes in a region, not
income of a single household.
Vladimir
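
A minimal sketch of how the region-level fit discussed in this thread might be set up with Mata's optimize(). It assumes a household-level dataset in memory with variables named income, region (numeric) and pop (actual regional population, constant within region); these names follow the snippet quoted below but are assumptions, as is the function name region_ssr. The objective is a plain sum of squared regional deviations, as Austin characterizes the exercise, not the weighted GMM criterion of Korinek, Mistiaen and Ravallion, and convergence is not guaranteed.

```stata
* Assumes a household-level dataset in memory with (hypothetical) variables:
*   income  household income
*   region  numeric region identifier
*   pop     actual regional population, constant within region
sort region

mata:
// d0 evaluator: v = sum over regions of (actual - predicted population)^2
void region_ssr(real scalar todo, real rowvector b,
                real colvector inc, real matrix info, real colvector pop,
                v, g, H)
{
    real colvector mass, pophat
    real scalar    j

    // 1/P_i for a logistic P_i: (1+exp(xb))/exp(xb) = 1 + exp(-xb)
    mass   = 1 :+ exp(-(b[1] :+ b[2] :* inc))
    pophat = J(rows(info), 1, .)
    for (j = 1; j <= rows(info); j++) {
        // predicted population of region j: sum of household masses
        pophat[j] = sum(panelsubmatrix(mass, j, info))
    }
    v = sum((pop - pophat) :^ 2)
}

inc    = st_data(., "income")
info   = panelsetup(st_data(., "region"), 1)   // first/last row of each region
popall = st_data(., "pop")
pop    = popall[info[., 1]]                    // one population value per region

S = optimize_init()
optimize_init_evaluator(S, &region_ssr())
optimize_init_evaluatortype(S, "d0")
optimize_init_which(S, "min")
optimize_init_params(S, (0, 0))                // arbitrary starting values
optimize_init_argument(S, 1, inc)
optimize_init_argument(S, 2, info)
optimize_init_argument(S, 3, pop)
bhat = optimize(S)
bhat
end
```

Here the household-level data enter only through the within-region sums, so the objective has exactly one term per region and no residuals need to be zeroed out by hand.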


On Mon, Jul 1, 2013 at 11:10 AM, Austin Nichols <austinnichols@gmail.com> wrote:
> Vladimír Hlásny <vhlasny@gmail.com>,
>
> If you're not familiar with optimize(), start with the help file. Or
> just follow the link I sent.
>
> You don't seem to take my point about your trick; if you put all the
> weight of optimization on one residual per group, and -gmm- is trying
> to make that one residual orthogonal to an instrument x1=income, but
> you (unluckily) have x1=0 in each of those cases, then how could -gmm-
> possibly improve on residual times zero, equals zero? An unlucky case,
> but possible, given your syntax, I think.
>
> On Sun, Jun 30, 2013 at 10:02 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>> Dear Austin:
>> I am computing the "one-per-region residuals" as the difference
>> between regional actual population and predicted population (sum of
>> household-inverse-probabilities). So my trick doesn't depend on luck -
>> the residuals contain information on all households within a region.
>>
>> In the code that I pasted in my original email, notice the summation
>> across households:
>> egen double `pophat' = sum( (1+exp(b0+income*b1)) / exp(b0+income*b1)) ///
>>     `if', by(`region')
>> replace residual = (pop - `pophat') * oneiffirst
>>
>> The 'oneiffirst' is a binary indicator for one residual per region, my
>> trick. By using that, I ensure that only one region-level residual is
>> considered per region. Instead, I would have liked to use an 'if'
>> statement (such as 'if oneiffirst'), so that Stata would know that
>> there are only 2500 (region-level) observations. But Stata doesn't
>> allow it. Is there another way to essentially restrict the sample
>> inside of the function evaluator program - the sample in which the
>> moments are evaluated - after GMM is called in a hhd-level dataset?
>>
>> I am not familiar with 'optimize()'. Will that let me declare samples
>> so that I estimate a region-level regression in which moments are
>> computed from a hhd-level equation?
>> Thank you.
>> Vladimir
>>
>> On Mon, Jul 1, 2013 at 1:17 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>> My question is: why try to trick -gmm- into doing an optimization it's
>>> not designed for? You are trying to make the first residual within
>>> group orthogonal to income; what if you got unlucky and the first case
>>> in each group had zero income--hard to see how you could improve the
>>> objective function, right?
>>>
>>> Instead start with Mata's optimize() which can be used to roll your
>>> own GMM and much else besides: see e.g.
>>> http://www.stata.com/meeting/snasug08/nichols_gmm.pdf
>>>
>>> On Sat, Jun 29, 2013 at 10:10 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>> Dear Austin:
>>>> The model is definitely identified. Matlab runs the model well,
>>>> because I can use household-level and region-level variables
>>>> simultaneously. My trick in Stata also works, except that it produces
>>>> imprecise results and occasionally fails to converge. (My current
>>>> trick is to make Stata think that the model is at the household level,
>>>> and to manually set all-but-one-per-region hhd-level residuals to
>>>> zero.)
>>>>
>>>> Incomes of the responding households are my instrument.
>>>> Essentially, because each region has a different survey-response-rate
>>>> and different distribution of incomes of responding households, GMM
>>>> estimates the relationship between households' response-probability
>>>> and their income (subject to assumptions on representativeness of
>>>> responding households).
>>>>
>>>> In sum:
>>>> I need Stata to use region-level and household-level variables (or
>>>> matrices) simultaneously. Specifically, Stata must minimize
>>>> region-level residuals computed from a household-level logistic
>>>> equation. E.g., if I feed household-level data into the GMM
>>>> function-evaluator program, can I instruct the GMM to use only one
>>>> residual per region?
>>>>
>>>> Vladimir
>>>>
>>>> On Sat, Jun 29, 2013 at 10:27 PM, Austin Nichols
>>>> <austinnichols@gmail.com> wrote:
>>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>>> I have not read the ref.  But you do not really have instruments. That
>>>>> is, you are not setting E(Ze) to zero with e a residual from some
>>>>> equation and Z your instrument; you do not have moments of that type.
>>>>> Seems you should start with optimize() instead of -gmm-, as you are
>>>>> just minimizing the sum of squared deviations from targets at the
>>>>> region level. Or am I still misunderstanding this exercise?
>>>>>
>>>>> On Fri, Jun 28, 2013 at 10:08 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>> Thanks for responding, Austin.
>>>>>>
>>>>>> The full reference is: Korinek, Mistiaen and Ravallion (2007), An
>>>>>> econometric method of correcting for unit nonresponse bias in surveys,
>>>>>> J. of Econometrics 136.
>>>>>>
>>>>>> My sample includes 12000 responding households. I know their income,
>>>>>> and which of 2500 regions they come from. In addition, for each
>>>>>> region, I know the number of non-responding households. I find the
>>>>>> coefficient on income by fitting estimated regional population to
>>>>>> actual population:
>>>>>>
>>>>>> P_i = logit f(income_i,theta)
>>>>>> actual_j = responding_j + nonresponding_j
>>>>>> theta = argmin {sum(1/P_i) - actual_j}
>>>>>>
>>>>>> Response probability may not be monotonic in income. The logit may be
>>>>>> a non-monotonic function of income.
>>>>>>
>>>>>> Thanks for any thoughts on how to estimate this in Stata, or how to
>>>>>> make my 'trick' (setting 12000-2500 hhd-level residuals manually to
>>>>>> zero) work better.
>>>>>>
>>>>>> Vladimir
>>>>>>
>>>>>> On Sat, Jun 29, 2013 at 1:49 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>>>>> As the FAQ hints, if you don't provide full references, don't expect
>>>>>>> good answers.
>>>>>>>
>>>>>>> I don't understand your description--how are you running a logit of
>>>>>>> response on income, when you only have income for responders?  Can you
>>>>>>> give a sense of what the data looks like?
>>>>>>>
>>>>>>> On another topic, why would anyone expect response probability to be
>>>>>>> monotonic in income?
>>>>>>>
>>>>>>> On Fri, Jun 28, 2013 at 10:05 AM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>> I am using a method by Korinek, Mistiaen and Ravallion (2007) to
>>>>>>>> correct for unit-nonresponse bias. That involves estimating
>>>>>>>> response-probability for each household,  inferring regional
>>>>>>>> population from these probabilities, and fitting against actual
>>>>>>>> regional populations. I must use household-level data and region-level
>>>>>>>> data simultaneously, because coefficients in the household-level model
>>>>>>>> are adjusted based on fit of the regional-level populations.
>>>>>>>>
>>>>>>>> I used a trick - manually resetting to zero the residuals of all but
>>>>>>>> one household per region - but this trick doesn't produce perfect
>>>>>>>> results. Please find the details, remaining problems, as well as the
>>>>>>>> Stata code described below. Any thoughts on this?
>>>>>>>>
>>>>>>>> Thank you for any suggestions!
>>>>>>>>
>>>>>>>> Vladimir Hlasny
>>>>>>>> Ewha Womans University
>>>>>>>> Seoul, Korea
>>>>>>>>
>>>>>>>> Details:
>>>>>>>> I am estimating households' probability of responding to a survey as a
>>>>>>>> function of their income. For each responding household (12000), I
>>>>>>>> have data on income. Also, at the level of region (3000), I know the
>>>>>>>> number of responding and non-responding households.
>>>>>>>>
>>>>>>>> I declare a logit equation of response-probability as a function of
>>>>>>>> income, to estimate it for all responding households.
>>>>>>>>
>>>>>>>> The identification is provided by fitting of population in each
>>>>>>>> region. For each responding household, I estimate their true mass as
>>>>>>>> the inverse of their response probability. Then I sum these inverse
>>>>>>>> probabilities over all households in a region, and fit the sum
>>>>>>>> against the true population.
>>>>>>>>
>>>>>>>> Stata problem:
>>>>>>>> I am estimating GMM at the regional level. But, to obtain the
>>>>>>>> population estimate in each region, I calculate response-probabilities
>>>>>>>> at the household level and sum their inverses within each region. This
>>>>>>>> region-level fitting and response-probability estimation occur
>>>>>>>> simultaneously/iteratively -- as logit-coefficients are adjusted to
>>>>>>>> minimize region-level residuals, households' response-probabilities
>>>>>>>> change.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


