



Re: st: GMM minimization of regional errors imputed from hhd level model


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: GMM minimization of regional errors imputed from hhd level model
Date   Sun, 30 Jun 2013 12:17:14 -0400

Vladimír Hlásny <vhlasny@gmail.com>:
My question is: why try to trick -gmm- into doing an optimization it's
not designed for? You are trying to make the first residual within
group orthogonal to income; what if you got unlucky and the first case
in each group had zero income--hard to see how you could improve the
objective function, right?
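A toy sketch of that point, written in Python rather than Stata and using made-up data. It assumes the trick is laid out as described further down the thread (each region's residual sits on its first household row, every other row is zero) and that income is the only instrument; `trick_moment` is a hypothetical helper, not anything -gmm- provides.

```python
# Toy illustration (hypothetical data) of the "one residual per region" trick:
# each region's residual sits on its first household row; other rows carry 0.

def trick_moment(region_ids, incomes, region_resid):
    """Sample moment sum_i z_i * e_i, with z = income and e laid out as in the trick."""
    seen = set()
    total = 0.0
    for rid, z in zip(region_ids, incomes):
        if rid not in seen:  # first household in this region carries the residual
            seen.add(rid)
            total += z * region_resid[rid]
        # later households in the region have e_i = 0 and contribute nothing
    return total


region_ids = ["A", "A", "B", "B", "B"]
incomes = [0.0, 40.0, 0.0, 25.0, 60.0]  # first household in each region has zero income

# Whatever regional residuals theta implies, the moment stays at zero,
# so the objective gives the optimizer nothing to work with.
for resid in ({"A": 3.0, "B": -2.0}, {"A": -10.0, "B": 7.5}):
    print(trick_moment(region_ids, incomes, resid))
```

Only the first household's income in each region ever enters the moment, which is why the estimates depend on which household happens to be sorted first.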

Instead, start with Mata's optimize(), which can be used to roll your
own GMM and much else besides; see, e.g.,
http://www.stata.com/meeting/snasug08/nichols_gmm.pdf
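A minimal sketch of the roll-your-own route, in Python rather than Mata, under stated assumptions: the logit index is linear, a + b*income (the thread notes response may be non-monotonic, so a real fit would likely add terms such as income squared); the objective is the sum over regions of squared deviations between estimated and actual population, which is the reading given further down the thread; and a hand-rolled Gauss-Newton loop with step halving stands in for optimize(). The data are synthetic, with targets generated from known coefficients so the fit has something to recover.

```python
import math


def region_residuals(theta, regions, actual):
    """r_j = sum_{i in j} 1/P_i - actual_j, assuming logit(P_i) = a + b*income_i."""
    a, b = theta
    # For a logistic P_i, the estimated mass 1/P_i equals 1 + exp(-(a + b*x_i))
    return [sum(1.0 + math.exp(-(a + b * x)) for x in incomes) - actual[j]
            for j, incomes in enumerate(regions)]


def objective(theta, regions, actual):
    """Sum of squared region-level deviations; +inf if exp() overflows on a trial step."""
    try:
        return sum(r * r for r in region_residuals(theta, regions, actual))
    except OverflowError:
        return float("inf")


def fit(regions, actual, theta=(0.0, 0.0), max_iter=200, tol=1e-12):
    """Gauss-Newton with step halving on the two logit coefficients (a, b)."""
    a, b = theta
    for _ in range(max_iter):
        r = region_residuals((a, b), regions, actual)
        # Jacobian columns: dr_j/da = -sum exp(-f_i), dr_j/db = -sum x_i * exp(-f_i)
        ja = [-sum(math.exp(-(a + b * x)) for x in inc) for inc in regions]
        jb = [-sum(x * math.exp(-(a + b * x)) for x in inc) for inc in regions]
        # Normal equations (J'J) d = -J'r for the 2x2 case, solved by Cramer's rule
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ga = sum(u * e for u, e in zip(ja, r))
        gb = sum(v * e for v, e in zip(jb, r))
        det = saa * sbb - sab * sab
        if det == 0.0:
            raise ValueError("J'J is singular: coefficients are not identified")
        da = (-ga * sbb + gb * sab) / det
        db = (-gb * saa + ga * sab) / det
        # Halve the step until the objective no longer increases
        q0, t = objective((a, b), regions, actual), 1.0
        while objective((a + t * da, b + t * db), regions, actual) > q0 and t > 1e-10:
            t *= 0.5
        a, b = a + t * da, b + t * db
        if abs(t * da) + abs(t * db) < tol:
            break
    return a, b


# Synthetic example: incomes of responders by region, targets built from a = 1.0, b = -0.5
regions = [[0, 1], [2, 3, 4], [0, 4], [1, 2, 3]]
true_theta = (1.0, -0.5)
actual = [sum(1.0 + math.exp(-(true_theta[0] + true_theta[1] * x)) for x in inc)
          for inc in regions]
a_hat, b_hat = fit(regions, actual)
print(round(a_hat, 6), round(b_hat, 6))
```

Because the objective is written directly over regions, there is no need to pad the household-level data with zero residuals; identification here comes from regions having different income mixes, which keeps J'J nonsingular.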

On Sat, Jun 29, 2013 at 10:10 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
> Dear Austin:
> The model is definitely identified. Matlab runs the model well,
> because I can use household-level and region-level variables
> simultaneously. My trick in Stata also works, except that it produces
> imprecise results and occasionally fails to converge. (My current
> trick is to make Stata treat the model as a household-level one and to
> manually set all-but-one-per-region hhd-level residuals to zero.)
>
> Incomes of the responding households are my instrument.
> Essentially, because each region has a different survey response rate
> and a different distribution of incomes of responding households, GMM
> estimates the relationship between households' response-probability
> and their income (subject to assumptions on representativeness of
> responding households).
>
> In sum:
> I need Stata to use region-level and household-level variables (or
> matrices) simultaneously. Specifically, Stata must minimize
> region-level residuals computed from a household-level logistic
> equation. E.g., if I feed household-level data into the GMM
> function-evaluator program, can I instruct the GMM to use only one
> residual per region?
>
> Vladimir
>
> On Sat, Jun 29, 2013 at 10:27 PM, Austin Nichols
> <austinnichols@gmail.com> wrote:
>> Vladimír Hlásny <vhlasny@gmail.com>:
>> I have not read the ref.  But you do not really have instruments. That
>> is, you are not setting E(Ze) to zero with e a residual from some
>> equation and Z your instrument; you do not have moments of that type.
>> Seems you should start with optimize() instead of -gmm-, as you are
>> just minimizing the sum of squared deviations from targets at the
>> region level. Or am I still misunderstanding this exercise?
>>
>> On Fri, Jun 28, 2013 at 10:08 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>> Thanks for responding, Austin.
>>>
>>> The full reference is: Korinek, Mistiaen and Ravallion (2007), An
>>> econometric method of correcting for unit nonresponse bias in surveys,
>>> J. of Econometrics 136.
>>>
>>> My sample includes 12000 responding households. I know their income,
>>> and which of 2500 regions they come from. In addition, for each
>>> region, I know the number of non-responding households. I find the
>>> coefficient on income by fitting estimated regional population to
>>> actual population:
>>>
>>> logit(P_i) = f(income_i, theta), i.e. P_i = 1/(1 + exp(-f(income_i, theta)))
>>> actual_j = responding_j + nonresponding_j
>>> theta = argmin_theta sum_j [ sum_{i in j} 1/P_i - actual_j ]^2
>>>
>>> Response probability may not be monotonic in income. The logit may be
>>> a non-monotonic function of income.
>>>
>>> Thanks for any thoughts on how to estimate this in Stata, or how to
>>> make my 'trick' (manually setting 12000 - 2500 = 9500 hhd-level
>>> residuals to zero) work better.
>>>
>>> Vladimir
>>>
>>> On Sat, Jun 29, 2013 at 1:49 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>> As the FAQ hints, if you don't provide full references, don't expect
>>>> good answers.
>>>>
>>>> I don't understand your description--how are you running a logit of
>>>> response on income, when you only have income for responders?  Can you
>>>> give a sense of what the data looks like?
>>>>
>>>> On another topic, why would anyone expect response probability to be
>>>> monotonic in income?
>>>>
>>>> On Fri, Jun 28, 2013 at 10:05 AM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>> Hi,
>>>>> I am using a method by Korinek, Mistiaen and Ravallion (2007) to
>>>>> correct for unit-nonresponse bias. That involves estimating
>>>>> response-probability for each household, inferring regional
>>>>> population from these probabilities, and fitting against actual
>>>>> regional populations. I must use household-level data and region-level
>>>>> data simultaneously, because coefficients in the household-level model
>>>>> are adjusted based on fit of the regional-level populations.
>>>>>
>>>>> I used a trick, manually setting to zero the residuals of all but
>>>>> one household per region, but this trick doesn't produce perfect
>>>>> results. Please find the details, remaining problems, and the
>>>>> Stata code below. Any thoughts on this?
>>>>>
>>>>> Thank you for any suggestions!
>>>>>
>>>>> Vladimir Hlasny
>>>>> Ewha Womans University
>>>>> Seoul, Korea
>>>>>
>>>>> Details:
>>>>> I am estimating households' probability to respond to a survey as a
>>>>> function of their income. For each responding household (12000), I
>>>>> have data on income. Also, for each region (3000), I know the
>>>>> number of responding and non-responding households.
>>>>>
>>>>> I specify a logit equation for the response probability as a function
>>>>> of income and use it to estimate that probability for all responding
>>>>> households.
>>>>>
>>>>> Identification comes from fitting the population of each region.
>>>>> For each responding household, I estimate its true mass as the
>>>>> inverse of its response probability. Then I sum these masses over
>>>>> all responding households in a region and fit the sum against the
>>>>> true population.
>>>>>
>>>>> Stata problem:
>>>>> I am estimating GMM at the regional level. But, to obtain the
>>>>> population estimate in each region, I calculate response-probabilities
>>>>> at the household level and sum them up in a region. This region-level
>>>>> fitting and response-probability estimation occurs
>>>>> simultaneously/iteratively: as the logit coefficients are adjusted to
>>>>> minimize region-level residuals, households' response probabilities
>>>>> change.
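The coupling described in the last quoted paragraph can be made explicit with the chain rule. Since 1/P_i = 1 + exp(-f_i) for a logistic P_i, the derivative of region j's residual is dr_j/dtheta = -sum_{i in j} ((1 - P_i)/P_i) * df_i/dtheta, so every household's current response probability enters each update of the coefficients. The Python sketch below checks that formula against central finite differences; the quadratic index a + b*income + c*income^2 (one way to allow non-monotonic response) and all numbers are illustrative assumptions, not values from the thread.

```python
import math


def response_prob(x, theta):
    """Logistic response probability with an assumed quadratic index in income."""
    a, b, c = theta
    return 1.0 / (1.0 + math.exp(-(a + b * x + c * x * x)))


def region_residual(incomes, actual, theta):
    """r_j(theta) = sum_{i in j} 1/P_i(theta) - actual_j."""
    return sum(1.0 / response_prob(x, theta) for x in incomes) - actual


def region_gradient(incomes, theta):
    """Analytic dr_j/dtheta = -sum_{i in j} (1 - P_i)/P_i * df_i/dtheta, df_i/dtheta = (1, x, x^2)."""
    grad = [0.0, 0.0, 0.0]
    for x in incomes:
        p = response_prob(x, theta)
        weight = (1.0 - p) / p  # equals exp(-f_i)
        for k, df in enumerate((1.0, x, x * x)):
            grad[k] -= weight * df
    return grad


def numeric_gradient(incomes, actual, theta, h=1e-6):
    """Central finite differences, used only to check the analytic formula."""
    out = []
    for k in range(len(theta)):
        up = list(theta)
        up[k] += h
        down = list(theta)
        down[k] -= h
        out.append((region_residual(incomes, actual, up)
                    - region_residual(incomes, actual, down)) / (2 * h))
    return out


# Hypothetical region: three responding households and a target population of 7
incomes, actual, theta = [0.5, 1.5, 3.0], 7.0, (0.2, 0.8, -0.3)
print(region_gradient(incomes, theta))
print(numeric_gradient(incomes, actual, theta))
```

The weight (1 - P_i)/P_i is largest for households with low response probability, so those households move the regional fit the most when the coefficients change.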

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP