Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Vladimír Hlásny <vhlasny@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: GMM minimization of regional errors imputed from hhd level model
Date: Mon, 1 Jul 2013 23:51:24 +0900

I am not sure whether my message was sent in full. Judging by the Statalist archive, the first line may have been cut.

I've put my dataset, data.dta, on my page:
home.ewha.ac.kr/club/vhlasny/public_html/clubboard/2899/data.dta

As I described before, it is household-level (hhd-level) data that also include the region-level number of respondents and total population.

On Mon, Jul 1, 2013 at 1:22 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
> (Also accessible on my page http://home.ewha.ac.kr/~vhlasny/ .
> Unfortunately I don't know of a similar public dataset.)
>
> use data, clear
> sort region hhcode
> by region: gen oneiffirst=_n
> by region: egen surveyedhh_psu=max(oneiffirst)
> gen sampleweight = response/surveyedhh_psu
> replace oneiffirst=0 if oneiffirst~=1
> gen double Winverse_sqrt = sqrt(weight)/sqrt(population)
>
> program gmm_nonresp
>     version 12
>     syntax varlist if, at(name) mylhs(varlist) myrhs(varlist) myidvar(varlist)
>     quietly {
>         tempvar explinspec pophat
>         gen double `explinspec' = `at'[1,1] `if'
>         local j=2
>         foreach var of varlist `myrhs' {
>             replace `explinspec' = `explinspec' + `var'*`at'[1,`j'] `if'
>             local j = `j' + 1
>         }
>         replace `explinspec' = exp(`explinspec')
>         egen double `pophat' = sum(sampleweight*(1+`explinspec')/`explinspec') `if', by(`myidvar')
>         replace `varlist' = (`mylhs' - `pophat')*Winverse_sqrt*oneiffirst `if'
>     }
> end
> gmm gmm_nonresp, mylhs(population) myrhs(logincome) myidvar(region) ///
>     nequations(1) parameters(theta1 theta2) instruments(logincome) ///
>     from(theta1 10 theta2 -1)
>
> --
> (It's possible that I should declare my instruments differently in the
> GMM command. But that itself will not solve my bigger problem.)
> Vladimir
>
> On Mon, Jul 1, 2013 at 12:37 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Vladimír Hlásny <vhlasny@gmail.com>:
>> I can't see that in your code:
>> , myrhs(x1) instruments(x1)
>> and myrhs gets multiplied by theta2, so it must be at the individual level.
>> Perhaps you should follow the usual advice, and illustrate your
>> problem using a publicly available dataset.
>>
>> On Sun, Jun 30, 2013 at 11:18 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>> Dear Austin:
>>> Thanks for the link to optimize(). I will check whether that could
>>> solve my 'region-level minimization' vs. 'household-level model'
>>> problem.
>>> Regarding your point:
>>> What you call 'x1' is a function of all incomes in a region, not
>>> income of a single household.
>>> Vladimir
>>>
>>>
>>> On Mon, Jul 1, 2013 at 11:10 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>> Vladimír Hlásny <vhlasny@gmail.com>,
>>>>
>>>> If you're not familiar with optimize(), start with the help file. Or
>>>> just follow the link I sent.
>>>>
>>>> You don't seem to take my point about your trick; if you put all the
>>>> weight of optimization on one residual per group, and -gmm- is trying
>>>> to make that one residual orthogonal to an instrument x1=income, but
>>>> you (unluckily) have x1=0 in each of those cases, then how could -gmm-
>>>> possibly improve on residual times zero, equals zero? An unlucky case,
>>>> but possible, given your syntax, I think.
>>>>
>>>> On Sun, Jun 30, 2013 at 10:02 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>> Dear Austin:
>>>>> I am computing the "one-per-region residuals" as the difference
>>>>> between regional actual population and predicted population (sum of
>>>>> household-inverse-probabilities). So my trick doesn't depend on luck -
>>>>> the residuals contain information on all households within a region.
>>>>>
>>>>> In the code that I pasted in my original email, notice the summation
>>>>> across households:
>>>>> egen double `pophat' = sum( (1+exp(b0+income*b1)) / exp(b0+income*b1))
>>>>> `if', by(`region')
>>>>> replace residual = (pop - `pophat') * oneiffirst
>>>>>
>>>>> The 'oneiffirst' is a binary indicator for one residual per region, my
>>>>> trick.
>>>>> By using that, I ensure that only one region-level residual is
>>>>> considered per region. Instead, I would have liked to use an 'if'
>>>>> statement (such as 'if oneiffirst'), so that Stata would know that
>>>>> there are only 2500 (region-level) observations. But Stata doesn't
>>>>> allow it. Is there another way to essentially restrict the sample
>>>>> inside of the function evaluator program - the sample in which the
>>>>> moments are evaluated - after GMM is called in a hhd-level dataset?
>>>>>
>>>>> I am not familiar with 'optimize()'. Will that let me declare samples
>>>>> so that I estimate a region-level regression in which moments are
>>>>> computed from a hhd-level equation?
>>>>> Thank you.
>>>>> Vladimir
>>>>>
>>>>> On Mon, Jul 1, 2013 at 1:17 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>>>> My question is: why try to trick -gmm- into doing an optimization it's
>>>>>> not designed for? You are trying to make the first residual within
>>>>>> group orthogonal to income; what if you got unlucky and the first case
>>>>>> in each group had zero income--hard to see how you could improve the
>>>>>> objective function, right?
>>>>>>
>>>>>> Instead start with Mata's optimize() which can be used to roll your
>>>>>> own GMM and much else besides: see e.g.
>>>>>> http://www.stata.com/meeting/snasug08/nichols_gmm.pdf
>>>>>>
>>>>>> On Sat, Jun 29, 2013 at 10:10 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>>> Dear Austin:
>>>>>>> The model is definitely identified. Matlab runs the model well,
>>>>>>> because I can use household-level and region-level variables
>>>>>>> simultaneously. My trick in Stata also works, except that it produces
>>>>>>> imprecise results and occasionally fails to converge. (My current
>>>>>>> trick is to make Stata think that the model is at the household level,
>>>>>>> and manually set all-but-one-per-region hhd-level residuals to
>>>>>>> zero.)
>>>>>>>
>>>>>>> Incomes of the responding households are my instrument.
>>>>>>> Essentially, because each region has a different survey-response-rate
>>>>>>> and different distribution of incomes of responding households, GMM
>>>>>>> estimates the relationship between households' response-probability
>>>>>>> and their income (subject to assumptions on representativeness of
>>>>>>> responding households).
>>>>>>>
>>>>>>> In sum:
>>>>>>> I need Stata to use region-level and household-level variables (or
>>>>>>> matrices) simultaneously. Specifically, Stata must minimize
>>>>>>> region-level residuals computed from a household-level logistic
>>>>>>> equation. E.g., if I feed household-level data into the GMM
>>>>>>> function-evaluator program, can I instruct the GMM to use only one
>>>>>>> residual per region?
>>>>>>>
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On Sat, Jun 29, 2013 at 10:27 PM, Austin Nichols
>>>>>>> <austinnichols@gmail.com> wrote:
>>>>>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>>>>>> I have not read the ref. But you do not really have instruments. That
>>>>>>>> is, you are not setting E(Ze) to zero with e a residual from some
>>>>>>>> equation and Z your instrument; you do not have moments of that type.
>>>>>>>> Seems you should start with optimize() instead of -gmm-, as you are
>>>>>>>> just minimizing the sum of squared deviations from targets at the
>>>>>>>> region level. Or am I still misunderstanding this exercise?
>>>>>>>>
>>>>>>>> On Fri, Jun 28, 2013 at 10:08 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>>>>> Thanks for responding, Austin.
>>>>>>>>>
>>>>>>>>> The full reference is: Korinek, Mistiaen and Ravallion (2007), An
>>>>>>>>> econometric method of correcting for unit nonresponse bias in surveys,
>>>>>>>>> J. of Econometrics 136.
>>>>>>>>>
>>>>>>>>> My sample includes 12000 responding households. I know their income,
>>>>>>>>> and which of 2500 regions they come from.
>>>>>>>>> In addition, for each
>>>>>>>>> region, I know the number of non-responding households. I find the
>>>>>>>>> coefficient on income by fitting estimated regional population to
>>>>>>>>> actual population:
>>>>>>>>>
>>>>>>>>> P_i = logit f(income_i,theta)
>>>>>>>>> actual_j = responding_j + nonresponding_j
>>>>>>>>> theta = argmin {sum(1/P_i) - actual_j}
>>>>>>>>>
>>>>>>>>> Response probability may not be monotonic in income. The logit may be
>>>>>>>>> a non-monotonic function of income.
>>>>>>>>>
>>>>>>>>> Thanks for any thoughts on how to estimate this in Stata, or how to
>>>>>>>>> make my 'trick' (setting 12000-2500 hhd-level residuals manually to
>>>>>>>>> zero) work better.
>>>>>>>>>
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On Sat, Jun 29, 2013 at 1:49 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>>>>>>>> Vladimír Hlásny <vhlasny@gmail.com>:
>>>>>>>>>> As the FAQ hints, if you don't provide full references, don't expect
>>>>>>>>>> good answers.
>>>>>>>>>>
>>>>>>>>>> I don't understand your description--how are you running a logit of
>>>>>>>>>> response on income, when you only have income for responders? Can you
>>>>>>>>>> give a sense of what the data looks like?
>>>>>>>>>>
>>>>>>>>>> On another topic, why would anyone expect response probability to be
>>>>>>>>>> monotonic in income?
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 28, 2013 at 10:05 AM, Vladimír Hlásny <vhlasny@gmail.com> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> I am using a method by Korinek, Mistiaen and Ravallion (2007) to
>>>>>>>>>>> correct for unit-nonresponse bias. That involves estimating
>>>>>>>>>>> response-probability for each household, inferring regional
>>>>>>>>>>> population from these probabilities, and fitting against actual
>>>>>>>>>>> regional populations. I must use household-level data and region-level
>>>>>>>>>>> data simultaneously, because coefficients in the household-level model
>>>>>>>>>>> are adjusted based on fit of the regional-level populations.
>>>>>>>>>>>
>>>>>>>>>>> I used a trick - manually resetting residuals of all but
>>>>>>>>>>> one-per-region household - but this trick doesn't produce perfect
>>>>>>>>>>> results. Please find the details, remaining problems, as well as the
>>>>>>>>>>> Stata code described below. Any thoughts on this?
>>>>>>>>>>>
>>>>>>>>>>> Thank you for any suggestions!
>>>>>>>>>>>
>>>>>>>>>>> Vladimir Hlasny
>>>>>>>>>>> Ewha Womans University
>>>>>>>>>>> Seoul, Korea
>>>>>>>>>>>
>>>>>>>>>>> Details:
>>>>>>>>>>> I am estimating households' probability to respond to a survey as a
>>>>>>>>>>> function of their income. For each responding household (12000), I
>>>>>>>>>>> have data on income. Also, at the level of region (3000), I know the
>>>>>>>>>>> number of responding and non-responding households.
>>>>>>>>>>>
>>>>>>>>>>> I declare a logit equation of response-probability as a function of
>>>>>>>>>>> income, to estimate it for all responding households.
>>>>>>>>>>>
>>>>>>>>>>> The identification is provided by fitting of population in each
>>>>>>>>>>> region. For each responding household, I estimate their true mass as
>>>>>>>>>>> the inverse of their response probability. Then I sum the
>>>>>>>>>>> inverse response-probabilities for all households in a region, and
>>>>>>>>>>> fit the sum against the true population.
>>>>>>>>>>>
>>>>>>>>>>> Stata problem:
>>>>>>>>>>> I am estimating GMM at the regional level. But, to obtain the
>>>>>>>>>>> population estimate in each region, I calculate response-probabilities
>>>>>>>>>>> at the household level and sum their inverses within a region. This
>>>>>>>>>>> region-level fitting and response-probability estimation occurs
>>>>>>>>>>> simultaneously/iteratively -- as logit-coefficients are adjusted to
>>>>>>>>>>> minimize region-level residuals, households' response-probabilities
>>>>>>>>>>> change.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
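
Austin's suggestion in the thread, minimizing the sum of squared region-level deviations with Mata's optimize() instead of -gmm-, could look roughly like the following. This is only a minimal sketch, not code from the thread and not the Korinek, Mistiaen and Ravallion estimator itself. It assumes that data.dta is in memory, sorted by region, that region is numeric, and that sampleweight, logincome, population and weight exist as in the code quoted above. The function name nonresp_obj is hypothetical, and the weighting weight/population simply mirrors the square of Winverse_sqrt.

```stata
mata:
// Objective: weighted sum of squared region-level residuals.
// D columns: 1 region, 2 logincome, 3 sampleweight, 4 population, 5 weight
// info: panel boundaries from panelsetup(), one row per region
// todo, theta, v, g, H are supplied by optimize() (type "d0": only v is filled in)
void nonresp_obj(todo, theta, real matrix D, real matrix info, v, g, H)
{
    real scalar j, pophat, e
    real matrix Dj

    v = 0
    for (j = 1; j <= rows(info); j++) {
        Dj = panelsubmatrix(D, j, info)
        // inverse logit probability: (1 + exp(xb))/exp(xb) = 1 + exp(-xb)
        pophat = sum(Dj[., 3] :* (1 :+ exp(-(theta[1] :+ theta[2] :* Dj[., 2]))))
        // one residual per region: actual minus predicted population
        e = Dj[1, 4] - pophat
        // weight/population is the square of Winverse_sqrt (first row of region)
        v = v + e^2 * Dj[1, 5] / Dj[1, 4]
    }
}

// Household-level data; must already be sorted by region (assumption)
D = st_data(., ("region", "logincome", "sampleweight", "population", "weight"))
info = panelsetup(D, 1)

S = optimize_init()
optimize_init_evaluator(S, &nonresp_obj())
optimize_init_evaluatortype(S, "d0")
optimize_init_which(S, "min")
optimize_init_params(S, (10, -1))     // starting values taken from the from() option above
optimize_init_argument(S, 1, D)
optimize_init_argument(S, 2, info)
theta = optimize(S)
theta
end
```

Because the evaluator loops over regions itself, there is exactly one residual per region and no need to zero out household-level residuals. Note that this minimizes a least-squares criterion, so it is not numerically identical to the moment conditions that -gmm- would impose with instruments(logincome).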
