
# st: RE: convergence of sample mean using gsample with weights

- From: tshmak
- To: "statalist@hsphsun2.harvard.edu"
- Subject: st: RE: convergence of sample mean using gsample with weights
- Date: Wed, 15 May 2013 16:32:20 +0800

```
Hi,

I'm not exactly sure what you're trying to do here. Perhaps if you explain what `wt', `wtn', and `wtn2' are, that might help.

Intuitively though, it looks like you're adding `meannew' - `meanold' to `res' in every loop.

Suppose X_i is a random variable such that X_i = `meannew' - `meanold' for loop i,

Then res = sum_i X_i

If X_i are independent across loops, then:
E(res) = sum_i E(X_i)
Var(res) = sum_i Var(X_i)

Since you're sampling from your original data, let's say
E(X_i) = m, which is `meannew' - `meanold' from your original data (appropriately weighted)
Suppose:
Var(X_i) = Var(X) for all i
then
Var(res) = N0 Var(X)
Var(res/N0) = Var(X)/N0
E(res/N0) = m

Therefore, it appears that res/N0 should converge to m as N0 grows, and m need not be zero.

Is that what's happening?

Tim

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Olga Gorbachev
Sent: 14 May 2013 01:51
To: statalist@hsphsun2.harvard.edu
Subject: st: convergence of sample mean using gsample with weights

Dear List servers,

We are trying to match the means of a subsample that is randomly
generated using gsample with weights to those of the original sample,
but we are not successful: the differences in means persist even
after over 5000 iterations.

The program we are running to generate a random sample and the table
of differences in means are below:

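local res = 0
local N0 = 1000
di "i = " _c
forv i = 1/`N0' {
    di " `i'" _c
    cap: drop wtn2
    qui: gen wtn2 = .
    qui: levelsof year, local(years)
    foreach yr of local years {
        // pct = 1 minus the wt-weighted mean of work in this year
        su work [aw = wt] if year == `yr', meanonly
        local pct = 1 - r(mean)
        // sample size = pct times the number of workers with positive wt
        qui: count if year == `yr' & work & wt > 0
        local n = r(N) * `pct'
        // draw from this year's workers, passing wtn as weights; result goes to smpl
        gsample `n' if year == `yr' & work [aw = wtn], gen(smpl) replace
        // qui: gen smpl`yr' = smpl
        qui: replace wtn2 = wtn * smpl if year == `yr'
    }

    // 2009 mean of nokid weighted by wtn2, versus non-workers weighted by wt
    su nokid if year == 2009 [aw = wtn2], meanonly
    local meannew = r(mean)
    su nokid if year == 2009 & !work [aw = wt], meanonly
    local meanold = r(mean)
    // accumulate this iteration's difference
    local res = `res' + `meannew' - `meanold'
}

// average difference over the N0 iterations
di `res' / `N0'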

After 5751 iterations, the mean differences persist (white, nokid, and
wife are 0/1 dummies; ed is a 0/1/2/3 categorical variable):

year        white          ed       nokid        wife         age        RMSE
1968    .07075803   .02760528  -.07051057   .10025028  -1.9695697   .64914917
1969     .0685191    .0043999   .00714388   .07798387  -1.0421818   .44748337
1970    .06611464   .05476483  -.02358097     .077666  -1.8464169   .76425403
1971    .06971375   .02524083  -.04641226   .07669907  -1.9877308   .84812203
1972    .06842085   .00459005  -.01252929   .07209953  -1.4438688   .58143546
1973    .07875147   -.0065409  -.00719551    .0762982  -.84213075   .76031927
1974    .07394796   .01265153   .03503028   .06037437   .47233679   .45948809
1975     .0754228  -.02080965   .04125415   .06711441   1.3878045   .44676919
1976    .07922582   -.0270845    .0703499   .08149621   3.0252375   1.2009947
1977    .07757246  -.06248362   .13932287   .05320747    4.381814   1.9495654
1978     .0712201  -.10770348   .09020478   .07284452   3.3190634   1.0499406
1979     .0867201  -.11178738   .11253834   .07264209    2.972378   1.5287306
1980    .07419313  -.03035967   .13589319   .06733552   4.0215276   1.7365936
1981    .07878431  -.01949136   .17420796   .04241048   5.3660359   2.8373346
1982     .0829203  -.11727873   .17645938   .05927291   5.4543346   2.3774178
1983    .07845573  -.10130641   .09725345    .0687734   2.3112865   1.4648557
1984    .09015502  -.07159415   .09821572    .0418674   4.0170757   2.2326577
1985    .07475118  -.17578234   .15213582   .06892136   4.8550365   2.4831803
1986    .09253893  -.20126191   .16269138   .06200215   4.8770742   1.7727842
1987    .08100237  -.17625041   .14548996   .05864067   3.6014147   1.6212222
1988    .10134555  -.08595601   .20253243   .08289725   6.7326522   2.8191897
1989    .08155963  -.10436591   .15625212   .02222005   3.9071876   1.0379599
1990    .09724568   .03089819    .1811577   .08476095   4.8926164   2.3862564
1991    .08948172  -.03575608    .2627551   .08514362   8.0915346   3.8331833
1992    .08865055   -.1055572   .25235049   .10462895   7.4632178   3.3645744
1993     .0951815  -.07997661   .17046405   .06604482   4.2573245   2.0752016
1994    .04873715  -.16646878   .07550069   .03458139   .98317415   .39188131
1995    .06876277  -.13850863   .13320029   .02768267   2.4135662   .66990443
1996    .00856876  -.21758791   .08564262  -.00698818   1.4966446   .57737936
1997    .03627838  -.15611398   .15043455   .05398246   1.6478452    1.009571
1999    .11814375  -.00525869   .02250082   .08790646   .80302795   .41872088
2001    .08085215   .03209268   .00536218   .03539566   .28864816   .08032318
2003    .01760212   -.0463809   .07889079   .03968931   3.0012058   2.1627047
2005    .01684409  -.07215183   .09026966    .0235811   2.3966124   .67002878
2007    .03959067   .03748774   .09446534   .06242606   2.9837086   .87488072
2009    .02200718  -.04037616   .05716718   .05698124   2.5813597   1.7555616
Total   .08024411  -.08345948    .0901088   .06647665    2.531596    1.108369

Does it make sense that the means don't converge?  Is there a way to
force the random subsample to have the same means as the main data
set?

--
Olga Gorbachev
Assistant Professor of Economics
University of Delaware
Newark, DE 19716
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

```
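
The argument in the reply (that res/N0 settles at m rather than at zero) can be checked with a small simulation. The sketch below is only an illustration under made-up assumptions: it does not use the poster's data or gsample, `x` is a hypothetical stand-in for one loop's `meannew' - `meanold', and the values m = 0.1 and sd = 1 are invented. It uses only built-in Stata commands.

```stata
* Hypothetical simulation: x stands in for one loop's difference in means.
* The mean (0.1) and standard deviation (1) are invented for illustration.
clear
set seed 12345
set obs 5000                          // one observation per loop iteration
generate double x = rnormal(0.1, 1)   // independent draws with E(x) = m = 0.1
generate double res = sum(x)          // running total, like res in the loop
generate double avg = res / _n        // res/N0 after N0 = _n iterations
list avg if inlist(_n, 10, 100, 1000, 5000), noobs
```

If the independence and constant-variance assumptions in the reply hold, `avg` should drift toward 0.1 as the iteration count grows, so a persistent nonzero average after thousands of iterations is what one would expect whenever m itself is nonzero.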