Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: Re: GMM minimization of regional errors imputed from hhd level model

From	Vladimír Hlásny <[email protected]>
To	[email protected]
Subject	st: Re: GMM minimization of regional errors imputed from hhd level model
Date	Wed, 11 Dec 2013 12:42:14 +0900

Hi, I have now tried Austin's suggestion - optimize in Mata. The great
news is that I can do the optimization at the regional level (using
matrices), while using individual-level variables to estimate
coefficients - exactly what I needed. The remaining problems are that
1) the model is sensitive and sometimes doesn't want to converge (even
using a subset of data that have worked), 2) solving is sometimes slow
for 1000+ observations, and 3) bootstrapping of errors also appears
sensitive and extremely slow even with 200 observations.

Is there a faster way to calculate my standard errors? Being new to
Mata, I wonder if I could define variables or do some calculations
more efficiently.

In the code below, I create a fake dataset that looks like Current
Population Survey data. Sample size can be varied from say 100 to
10000. Then essentially 1) the logistic function calculates the
probability of survey-response by each individual as a function of
coefficients, 2) these probabilities, inverted, are added up for each
region to derive the predicted population in each region, and 3) these
predicted regional populations are fitted against observed regional
populations. Refer to equations 2,6,7 in Korinek, Mistiaen and
Ravallion (2007), An econometric method of correcting for unit
nonresponse bias in surveys, J. of Econometrics 136.

clear all
set obs 500
gen x = 5+uniform() // create individuals' incomes
gen psucode=group(20) // create 10 regions with N/20 sampled individuals each
bysort psucode: egen avgx=mean(x)
gen weight=0.7*(avgx-4) // create region-level sampling weights
gen population= round(3*_N/20+_N/20*(avgx-4)) // create region-level
true population, 4+ times larger than region sample (correlated with
income)
gen sub = 0.25 // create overall sampling rate (e.g., sample has 25%
of true population)
drop avgx

program drop _all
prog nonresp, eclass
version 10
syntax varlist(numeric) [if] [in] [, samplewt(varlist)
subsamplewt(varlist) myidvar(varlist)]
local lhs: word 1 of `varlist'
replace `lhs' = `lhs'*`subsamplewt' // multiply regional population by
the percentage that we have in the sample
local rhs: list varlist - lhs
tempvar region
egen `region' = group(`myidvar') // create region indicators
matrix b = J(1,`:word count `rhs' _cons',2)
matrix b[1,1]=-0.2
matname b `rhs' _cons, c(.)
mata: m_nonresp("`lhs'", "`rhs'", "`samplewt'", "`region'")

ereturn post b, e(`touse') depname(`lhs')
ereturn display
end

mata:
void logistic(todo,b,crit,g,H)
{
external y,X,W,W2,addobs
p = invlogit(X*b') // response probability = logistic of x [n*1]
m = y - (addobs'*(1:/p)) // region error = region population - (sum of
individual inverse probs in region) [regions*1]
crit = m'*(W:*m) // weight errors by size of region, sum up squared
errors, and minimize
}
void m_nonresp(string scalar lhs, string scalar rhs, string scalar
samplewt, string scalar region)
{
external y,X,W,W2,addobs
st_view(y1=., ., lhs)
st_view(regn=., ., region)
st_view(W1=., ., samplewt)
st_view(X=., ., rhs)
X = X,J(rows(X),1,1)

addobs = J(rows(X),max(regn),0) // create a sparse matrix [n*regions]
assigning individuals to regions
for (i=1; i<=rows(X); i++) {
addobs[i,regn[i]] = 1
}
y2 = y1:*addobs
y = colmax(y2)' // create population by region [regions*1]
W2 = colmax(W1:*addobs)'
W = W2:/y // region error weight = sampling weight / region population
[regions*1]

S = optimize_init()
optimize_init_evaluator(S, &logistic())
optimize_init_which(S,"min")
optimize_init_evaluatortype(S,"d0")
optimize_init_params(S,st_matrix("b"))
p = optimize(S)
st_replacematrix("b",p)
}
end
nonresp population x, samplewt(weight) subsamplewt(sub) myidvar(psucode)
bootstrap, reps(3): nonresp population x, samplewt(weight)
subsamplewt(sub) myidvar(psucode)
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

Prev by Date: Re: st: _rmcoll issues
Next by Date: Re: st: _rmcoll issues
Previous by thread: st: _rmcoll issues
Next by thread: st: Re: GMM minimization of regional errors imputed from hhd level model
Index(es):
- Date
- Thread