Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Vladimír Hlásny <vhlasny@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
st: Re: GMM minimization of regional errors imputed from hhd level model |

Date |
Wed, 11 Dec 2013 12:42:14 +0900 |

Hi, I have now tried Austin's suggestion - optimize in Mata. The great news is that I can do the optimization at the regional level (using matrices), while using individual-level variables to estimate coefficients - exactly what I needed. The remaining problems are that 1) the model is sensitive and sometimes doesn't want to converge (even using a subset of data that have worked), 2) solving is sometimes slow for 1000+ observations, and 3) bootstrapping of errors also appears sensitive and extremely slow even with 200 observations. Is there a faster way to calculate my standard errors? Being new to Mata, I wonder if I could define variables or do some calculations more efficiently. In the code below, I create a fake dataset that looks like Current Population Survey data. Sample size can be varied from say 100 to 10000. Then essentially 1) the logistic function calculates the probability of survey-response by each individual as a function of coefficients, 2) these probabilities, inverted, are added up for each region to derive the predicted population in each region, and 3) these predicted regional populations are fitted against observed regional populations. Refer to equations 2,6,7 in Korinek, Mistiaen and Ravallion (2007), An econometric method of correcting for unit nonresponse bias in surveys, J. of Econometrics 136. clear all set obs 500 gen x = 5+uniform() // create individuals' incomes gen psucode=group(20) // create 10 regions with N/20 sampled individuals each bysort psucode: egen avgx=mean(x) gen weight=0.7*(avgx-4) // create region-level sampling weights gen population= round(3*_N/20+_N/20*(avgx-4)) // create region-level true population, 4+ times larger than region sample (correlated with income) gen sub = 0.25 // create overall sampling rate (e.g., sample has 25% of true population) drop avgx program drop _all prog nonresp, eclass version 10 syntax varlist(numeric) [if] [in] [, samplewt(varlist) subsamplewt(varlist) myidvar(varlist)] local lhs: word 1 of `varlist' replace `lhs' = `lhs'*`subsamplewt' // multiply regional population by the percentage that we have in the sample local rhs: list varlist - lhs tempvar region egen `region' = group(`myidvar') // create region indicators matrix b = J(1,`:word count `rhs' _cons',2) matrix b[1,1]=-0.2 matname b `rhs' _cons, c(.) mata: m_nonresp("`lhs'", "`rhs'", "`samplewt'", "`region'") ereturn post b, e(`touse') depname(`lhs') ereturn display end mata: void logistic(todo,b,crit,g,H) { external y,X,W,W2,addobs p = invlogit(X*b') // response probability = logistic of x [n*1] m = y - (addobs'*(1:/p)) // region error = region population - (sum of individual inverse probs in region) [regions*1] crit = m'*(W:*m) // weight errors by size of region, sum up squared errors, and minimize } void m_nonresp(string scalar lhs, string scalar rhs, string scalar samplewt, string scalar region) { external y,X,W,W2,addobs st_view(y1=., ., lhs) st_view(regn=., ., region) st_view(W1=., ., samplewt) st_view(X=., ., rhs) X = X,J(rows(X),1,1) addobs = J(rows(X),max(regn),0) // create a sparse matrix [n*regions] assigning individuals to regions for (i=1; i<=rows(X); i++) { addobs[i,regn[i]] = 1 } y2 = y1:*addobs y = colmax(y2)' // create population by region [regions*1] W2 = colmax(W1:*addobs)' W = W2:/y // region error weight = sampling weight / region population [regions*1] S = optimize_init() optimize_init_evaluator(S, &logistic()) optimize_init_which(S,"min") optimize_init_evaluatortype(S,"d0") optimize_init_params(S,st_matrix("b")) p = optimize(S) st_replacematrix("b",p) } end nonresp population x, samplewt(weight) subsamplewt(sub) myidvar(psucode) bootstrap, reps(3): nonresp population x, samplewt(weight) subsamplewt(sub) myidvar(psucode) * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**Re: st: _rmcoll issues** - Next by Date:
**Re: st: _rmcoll issues** - Previous by thread:
**st: _rmcoll issues** - Next by thread:
**st: Re: GMM minimization of regional errors imputed from hhd level model** - Index(es):