
st: Re: GMM minimization of regional errors imputed from hhd level model

From	Vladimír Hlásny <[email protected]>
To	[email protected]
Subject	st: Re: GMM minimization of regional errors imputed from hhd level model
Date	Wed, 11 Dec 2013 13:04:38 +0900

I am reposting the code, this time without the errors caused by comments
spilling onto the following lines. Sorry about that. Vladimir

clear all
set obs 500
gen x = 5+uniform() /*create individuals' incomes*/
gen psucode=group(20) /*create 20 regions with N/20 sampled individuals each*/
bysort psucode: egen avgx=mean(x)
gen weight=0.7*(avgx-4) /*create region-level sampling weights*/
/*create region-level true population, 4+ times larger than the region
sample (correlated with income)*/
gen population= round(3*_N/20+_N/20*(avgx-4))
/*create overall sampling rate (e.g., sample has 25% of true population)*/
gen sub = 0.25
drop avgx

program drop _all
prog nonresp, eclass
version 10
syntax varlist(numeric) [if] [in] [, samplewt(varlist) subsamplewt(varlist) myidvar(varlist)]
marksample touse /*define the estimation-sample marker passed to ereturn post*/
local lhs: word 1 of `varlist'
local rhs: list varlist - lhs
tempvar region scaledpop
/*multiply regional population by the percentage that we have in the
sample; a temporary variable is used so that repeated calls (e.g., under
bootstrap) do not rescale the data again*/
gen double `scaledpop' = `lhs'*`subsamplewt'
egen `region' = group(`myidvar') /*create region indicators*/
matrix b = J(1,`:word count `rhs' _cons',2)
matrix b[1,1]=-0.2
matname b `rhs' _cons, c(.)
mata: m_nonresp("`scaledpop'", "`rhs'", "`samplewt'", "`region'")

ereturn post b, esample(`touse') depname(`lhs')
ereturn display
end

mata:
void logistic(todo,b,crit,g,H)
{
external y,X,W,W2,addobs
p = invlogit(X*b') /*response probability = logistic of x [n*1]*/
m = y - (addobs'*(1:/p)) /*region error = region population - (sum of
individual inverse probs in region) [regions*1]*/
crit = m'*(W:*m) /*weight errors by size of region, sum up squared
errors, and minimize*/
}
void m_nonresp(string scalar lhs, string scalar rhs, string scalar samplewt, string scalar region)
{
external y,X,W,W2,addobs
st_view(y1=., ., lhs)
st_view(regn=., ., region)
st_view(W1=., ., samplewt)
st_view(X=., ., rhs)
X = X,J(rows(X),1,1)

addobs = J(rows(X),max(regn),0) /*create a sparse matrix [n*regions]
assigning individuals to regions*/
for (i=1; i<=rows(X); i++) {
addobs[i,regn[i]] = 1
}
y2 = y1:*addobs
y = colmax(y2)' /*create population by region [regions*1]*/
W2 = colmax(W1:*addobs)'
W = W2:/y /*region error weight = sampling weight / region population
[regions*1]*/

S = optimize_init()
optimize_init_evaluator(S, &logistic())
optimize_init_which(S,"min")
optimize_init_evaluatortype(S,"d0")
optimize_init_params(S,st_matrix("b"))
p = optimize(S)
st_replacematrix("b",p)
}
end
nonresp population x, samplewt(weight) subsamplewt(sub) myidvar(psucode)
bootstrap, reps(3): nonresp population x, samplewt(weight) subsamplewt(sub) myidvar(psucode)

On Wed, Dec 11, 2013 at 12:42 PM, Vladimír Hlásny <[email protected]> wrote:
> Hi, I have now tried Austin's suggestion: optimize in Mata. The great
> news is that I can do the optimization at the regional level (using
> matrices), while using individual-level variables to estimate
> coefficients, which is exactly what I needed. The remaining problems are
> that 1) the model is sensitive and sometimes fails to converge (even
> on a subset of data that converged before), 2) solving is sometimes slow
> for 1000+ observations, and 3) bootstrapping the standard errors also
> appears sensitive and is extremely slow even with 200 observations.
>
> Is there a faster way to calculate my standard errors? Being new to
> Mata, I wonder if I could define variables or do some calculations
> more efficiently.
>
> In the code below, I create a fake dataset that looks like Current
> Population Survey data. Sample size can be varied from, say, 100 to
> 10000. Then, essentially, 1) the logistic function calculates the
> probability of survey response for each individual as a function of
> coefficients, 2) the inverses of these probabilities are added up within
> each region to derive the predicted population in each region, and 3) these
> predicted regional populations are fitted against observed regional
> populations. Refer to equations 2, 6, and 7 in Korinek, Mistiaen and
> Ravallion (2007), An econometric method of correcting for unit
> nonresponse bias in surveys, J. of Econometrics 136.
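
For reference, the three steps described in the quoted message can be written out as the criterion that the Mata evaluator above minimizes. This is a sketch read off the posted code, not a transcription of equations 2, 6, and 7 in the cited paper. The symbols are notation assumed here: i indexes sampled individuals, j indexes regions, x_i includes the constant, y_j is the regional population scaled by the sampling rate, and W_j is the regional sampling weight divided by y_j.

```latex
p_i(b) = \frac{1}{1 + \exp(-x_i' b)}, \qquad
m_j(b) = y_j - \sum_{i \in j} \frac{1}{p_i(b)}, \qquad
\hat{b} = \arg\min_{b} \sum_{j} W_j \, m_j(b)^2
```

Here p_i(b) corresponds to p = invlogit(X*b'), m_j(b) to the vector m, and the weighted sum of squares to crit = m'*(W:*m).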
