

From: Vladimír Hlásny <vhlasny@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: GMM minimization of regional errors imputed from hhd level model
Date: Mon, 1 Jul 2013 13:22:02 +0900

(Also accessible on my page http://home.ewha.ac.kr/~vhlasny/ . Unfortunately I don't know of a similar public dataset.)

use data, clear
sort region hhcode
// number households within region; the maximum is the count of surveyed households
by region: gen oneiffirst=_n
by region: egen surveyedhh_psu=max(oneiffirst)
gen sampleweight = response/surveyedhh_psu
// keep a 1 only on the first household of each region
replace oneiffirst=0 if oneiffirst~=1
gen double Winverse_sqrt = sqrt(weight)/sqrt(population)

program gmm_nonresp
    version 12
    syntax varlist if, at(name) mylhs(varlist) myrhs(varlist) myidvar(varlist)
    quietly {
        tempvar explinspec pophat
        // linear index: constant plus coefficients times regressors
        gen double `explinspec' = `at'[1,1] `if'
        local j=2
        foreach var of varlist `myrhs' {
            replace `explinspec' = `explinspec' + `var'*`at'[1,`j'] `if'
            local j = `j' + 1
        }
        replace `explinspec' = exp(`explinspec')
        // predicted regional population: sum of weighted inverse response probabilities
        egen double `pophat' = sum(sampleweight*(1+`explinspec')/`explinspec') `if', by(`myidvar')
        // region-level residual, kept for one household per region and zero elsewhere
        replace `varlist' = (`mylhs' - `pophat')*Winverse_sqrt*oneiffirst `if'
    }
end

gmm gmm_nonresp, mylhs(population) myrhs(logincome) myidvar(region) ///
    nequations(1) parameters(theta1 theta2) instruments(logincome) ///
    from(theta1 10 theta2 -1)

(It's possible that I should declare my instruments differently in the GMM command. But that itself will not solve my bigger problem.)

Vladimir

On Mon, Jul 1, 2013 at 12:37 PM, Austin Nichols <austinnichols@gmail.com> wrote:
> Vladimír Hlásny <vhlasny@gmail.com>:
> I can't see that in your code:
> , myrhs(x1) instruments(x1)
> and myrhs gets multiplied by theta2, so it must be at the individual level.
> Perhaps you should follow the usual advice, and illustrate your
> problem using a publicly available dataset.
>
> On Sun, Jun 30, 2013 at 11:18 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>> Dear Austin:
>> Thanks for the link to optimize(). I will check whether that could
>> solve my 'region-level minimization' vs. 'household-level model'
>> problem.
>> Regarding your point:
>> What you call 'x1' is a function of all incomes in a region, not
>> income of a single household.
>> Vladimir
>>
>> On Mon, Jul 1, 2013 at 11:10 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>> Vladimír Hlásny <vhlasny@gmail.com>,
>>>
>>> If you're not familiar with optimize(), start with the help file. Or
>>> just follow the link I sent.
>>>
>>> You don't seem to take my point about your trick; if you put all the
>>> weight of optimization on one residual per group, and -gmm- is trying
>>> to make that one residual orthogonal to an instrument x1=income, but
>>> you (unluckily) have x1=0 in each of those cases, then how could -gmm-
>>> possibly improve on residual times zero, equals zero? An unlucky case,
>>> but possible, given your syntax, I think.
>>>
>>> On Sun, Jun 30, 2013 at 10:02 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>> Dear Austin:
>>>> I am computing the "one-per-region residuals" as the difference
>>>> between regional actual population and predicted population (sum of
>>>> household-inverse-probabilities). So my trick doesn't depend on luck -
>>>> the residuals contain information on all households within a region.
>>>>
>>>> In the code that I pasted in my original email, notice the summation
>>>> across households:
>>>> egen double `pophat' = sum( (1+exp(b0+income*b1)) / exp(b0+income*b1)) `if', by(`region')
>>>> replace residual = (pop - `pophat') * oneiffirst
>>>>
>>>> The 'oneiffirst' is a binary indicator for one residual per region, my
>>>> trick. By using that, I ensure that only one region-level residual is
>>>> considered per region. Instead, I would have liked to use an 'if'
>>>> statement (such as 'if oneiffirst'), so that Stata would know that
>>>> there are only 2500 (region-level) observations. But Stata doesn't
>>>> allow it. Is there another way to essentially restrict the sample
>>>> inside of the function evaluator program - the sample in which the
>>>> moments are evaluated - after GMM is called in a hhd-level dataset?
>>>>
>>>> I am not familiar with 'optimize()'.
>>>> Will that let me declare samples
>>>> so that I estimate a region-level regression in which moments are
>>>> computed from a hhd-level equation?
>>>> Thank you.
>>>> Vladimir
>>>>
>>>> On Mon, Jul 1, 2013 at 1:17 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>>> My question is: why try to trick -gmm- into doing an optimization it's
>>>>> not designed for? You are trying to make the first residual within
>>>>> group orthogonal to income; what if you got unlucky and the first case
>>>>> in each group had zero income--hard to see how you could improve the
>>>>> objective function, right?
>>>>>
>>>>> Instead start with Mata's optimize() which can be used to roll your
>>>>> own GMM and much else besides: see e.g.
>>>>> http://www.stata.com/meeting/snasug08/nichols_gmm.pdf
>>>>>
>>>>> On Sat, Jun 29, 2013 at 10:10 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>> Dear Austin:
>>>>>> The model is definitely identified. Matlab runs the model well,
>>>>>> because I can use household-level and region-level variables
>>>>>> simultaneously. My trick in Stata also works, except that it produces
>>>>>> imprecise results and occasionally fails to converge. (My current
>>>>>> trick is to make Stata think that the model is at the household level,
>>>>>> and to manually set all-but-one-per-region hhd-level residuals to
>>>>>> zero.)
>>>>>>
>>>>>> Incomes of the responding households are my instrument.
>>>>>> Essentially, because each region has a different survey-response-rate
>>>>>> and different distribution of incomes of responding households, GMM
>>>>>> estimates the relationship between households' response-probability
>>>>>> and their income (subject to assumptions on representativeness of
>>>>>> responding households).
>>>>>>
>>>>>> In sum:
>>>>>> I need Stata to use region-level and household-level variables (or
>>>>>> matrices) simultaneously.
>>>>>> Specifically, Stata must minimize
>>>>>> region-level residuals computed from a household-level logistic
>>>>>> equation. E.g., if I feed household-level data into the GMM
>>>>>> function-evaluator program, can I instruct the GMM to use only one
>>>>>> residual per region?
>>>>>>
>>>>>> Vladimir
>>>>>>
>>>>>> On Sat, Jun 29, 2013 at 10:27 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>>>>> I have not read the ref. But you do not really have instruments. That
>>>>>>> is, you are not setting E(Ze) to zero with e a residual from some
>>>>>>> equation and Z your instrument; you do not have moments of that type.
>>>>>>> Seems you should start with optimize() instead of -gmm-, as you are
>>>>>>> just minimizing the sum of squared deviations from targets at the
>>>>>>> region level. Or am I still misunderstanding this exercise?
>>>>>>>
>>>>>>> On Fri, Jun 28, 2013 at 10:08 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>>>> Thanks for responding, Austin.
>>>>>>>>
>>>>>>>> The full reference is: Korinek, Mistiaen and Ravallion (2007), An
>>>>>>>> econometric method of correcting for unit nonresponse bias in surveys,
>>>>>>>> J. of Econometrics 136.
>>>>>>>>
>>>>>>>> My sample includes 12000 responding households. I know their income,
>>>>>>>> and which of 2500 regions they come from. In addition, for each
>>>>>>>> region, I know the number of non-responding households. I find the
>>>>>>>> coefficient on income by fitting estimated regional population to
>>>>>>>> actual population:
>>>>>>>>
>>>>>>>> P_i = logit f(income_i,theta)
>>>>>>>> actual_j = responding_j + nonresponding_j
>>>>>>>> theta = argmin {sum(1/P_i) - actual_j}
>>>>>>>>
>>>>>>>> Response probability may not be monotonic in income. The logit may be
>>>>>>>> a non-monotonic function of income.
>>>>>>>>
>>>>>>>> Thanks for any thoughts on how to estimate this in Stata, or how to
>>>>>>>> make my 'trick' (setting 12000-2500 hhd-level residuals manually to
>>>>>>>> zero) work better.
>>>>>>>>
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On Sat, Jun 29, 2013 at 1:49 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>>>>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>>>>>>> As the FAQ hints, if you don't provide full references, don't expect
>>>>>>>>> good answers.
>>>>>>>>>
>>>>>>>>> I don't understand your description--how are you running a logit of
>>>>>>>>> response on income, when you only have income for responders? Can you
>>>>>>>>> give a sense of what the data looks like?
>>>>>>>>>
>>>>>>>>> On another topic, why would anyone expect response probability to be
>>>>>>>>> monotonic in income?
>>>>>>>>>
>>>>>>>>> On Fri, Jun 28, 2013 at 10:05 AM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> I am using a method by Korinek, Mistiaen and Ravallion (2007) to
>>>>>>>>>> correct for unit-nonresponse bias. That involves estimating
>>>>>>>>>> response-probability for each household, inferring regional
>>>>>>>>>> population from these probabilities, and fitting against actual
>>>>>>>>>> regional populations. I must use household-level data and region-level
>>>>>>>>>> data simultaneously, because coefficients in the household-level model
>>>>>>>>>> are adjusted based on fit of the regional-level populations.
>>>>>>>>>>
>>>>>>>>>> I used a trick - manually resetting residuals of all but
>>>>>>>>>> one-per-region household - but this trick doesn't produce perfect
>>>>>>>>>> results. Please find the details, remaining problems, as well as the
>>>>>>>>>> Stata code described below. Any thoughts on this?
>>>>>>>>>>
>>>>>>>>>> Thank you for any suggestions!
>>>>>>>>>>
>>>>>>>>>> Vladimir Hlasny
>>>>>>>>>> Ewha Womans University
>>>>>>>>>> Seoul, Korea
>>>>>>>>>>
>>>>>>>>>> Details:
>>>>>>>>>> I am estimating households' probability to respond to a survey as a
>>>>>>>>>> function of their income. For each responding household (12000), I
>>>>>>>>>> have data on income. Also, at the level of region (3000), I know the
>>>>>>>>>> number of responding and non-responding households.
>>>>>>>>>>
>>>>>>>>>> I declare a logit equation of response-probability as a function of
>>>>>>>>>> income, to estimate it for all responding households.
>>>>>>>>>>
>>>>>>>>>> The identification is provided by fitting of population in each
>>>>>>>>>> region. For each responding household, I estimate their true mass as
>>>>>>>>>> the inverse of their response probability. Then I sum the
>>>>>>>>>> inverse response-probabilities for all households in a region, and
>>>>>>>>>> fit the sum against the true population.
>>>>>>>>>>
>>>>>>>>>> Stata problem:
>>>>>>>>>> I am estimating GMM at the regional level. But, to obtain the
>>>>>>>>>> population estimate in each region, I calculate response-probabilities
>>>>>>>>>> at the household level and sum them up in a region. This region-level
>>>>>>>>>> fitting and response-probability estimation occurs
>>>>>>>>>> simultaneously/iteratively -- as logit-coefficients are adjusted to
>>>>>>>>>> minimize region-level residuals, households' response-probabilities
>>>>>>>>>> change.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
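
The suggestion in the thread to start from Mata's optimize() rather than -gmm- can be illustrated with a minimal sketch. It minimizes the sum over regions of squared differences between actual population and the sum of household inverse response probabilities, so the objective has one residual per region while the logit index is evaluated per household. Everything below is an assumption made for illustration and is not the poster's code or data: the evaluator name nr_obj, the simulated data (50 regions of 20 households, "true" theta of 1 and -0.5), the starting values, and the equal weighting of regions (the Winverse_sqrt and sampleweight terms from the posted program are left out).

```stata
clear all
mata:
// Evaluator of type "d0": fills in the scalar objective v only.
// x, info and pop are extra arguments passed via optimize_init_argument().
void nr_obj(real scalar todo, real rowvector theta,
            real colvector x, real matrix info, real colvector pop,
            real scalar v, real rowvector g, real matrix H)
{
    real colvector mass, resid
    real scalar j

    // 1/P_i for a logistic P_i: (1+exp(xb))/exp(xb) = 1 + exp(-xb)
    mass  = 1 :+ exp(-(theta[1] :+ theta[2] :* x))
    resid = J(rows(info), 1, 0)
    for (j = 1; j <= rows(info); j++) {
        // one residual per region: actual minus predicted population
        resid[j] = pop[j] - sum(panelsubmatrix(mass, j, info))
    }
    v = sum(resid:^2)
}

// Simulated stand-in data (hypothetical): household data sorted by region
rseed(12345)
region = (1::50) # J(20, 1, 1)
x      = rnormal(1000, 1, 0, 1) :+ region :/ 25
info   = panelsetup(region, 1)

// Regional populations implied by the assumed true theta = (1, -0.5)
mass0 = 1 :+ exp(-(1 :- 0.5 :* x))
pop   = J(rows(info), 1, .)
for (j = 1; j <= rows(info); j++) {
    pop[j] = sum(panelsubmatrix(mass0, j, info))
}

S = optimize_init()
optimize_init_evaluator(S, &nr_obj())
optimize_init_evaluatortype(S, "d0")
optimize_init_which(S, "min")
optimize_init_params(S, (0, 0))
optimize_init_argument(S, 1, x)
optimize_init_argument(S, 2, info)
optimize_init_argument(S, 3, pop)
thetahat = optimize(S)
thetahat
end
```

With real data, x, region and pop would instead be read from the dataset (for example with st_data()), and a weighting of the regional residuals could be applied inside the evaluator before squaring.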

