Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: convergence of sample mean using gsample with weights
From: tshmak <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: RE: convergence of sample mean using gsample with weights
Date: Wed, 15 May 2013 16:32:20 +0800
Hi, 
I'm not exactly sure what you're trying to do here. Perhaps if you explain what `wt', `wtn', and `wtn2' are, that might help. 
Intuitively, though, it looks like you're adding `meannew' - `meanold' to `res' on every pass through the loop.
Let X_i be the value of `meannew' - `meanold' in iteration i.
Then res = sum_i X_i
If the X_i are independent across iterations, then:
E(res) = sum_i E(X_i)
Var(res) = sum_i Var(X_i)
Since you're sampling from your original data, let's say
E(X_i) = m, which is `meannew' - `meanold' from your original data (appropriately weighted)
Suppose: 
Var(X_i) = Var(X) for all i
then
Var(res) = N0 Var(X)
Var(res/N0) = Var(X)/N0
E(res/N0) = m
Therefore, res/N0 should converge to m as N0 grows, and m need not be zero.
Is that what's happening? 
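
To make the argument above concrete, here is a minimal sketch on simulated data. It is not the original poster's program: the data are made up, the variable names and parameter values are assumptions, and the built-in -bsample- (equal-probability resampling with replacement) stands in for the user-written -gsample- with weights. It only illustrates that the average of `meannew' - `meanold' over many iterations settles near m, the difference already present in the full data, rather than near zero.

```stata
* Toy illustration on simulated data. All variable names and parameter
* values below are assumptions for this sketch, not the poster's data.
clear
set seed 20130515
set obs 2000
gen byte work  = runiform() < 0.6
* nokid is made less common among workers, so the two groups differ by design
gen byte nokid = runiform() < cond(work, 0.3, 0.5)
gen freq = .                     // bsample overwrites this with draw counts

* m: the difference between the two groups in the full data
su nokid if work, meanonly
local m = r(mean)
su nokid if !work, meanonly
local meanold = r(mean)
local m = `m' - `meanold'

local res  = 0                   // running sum of X_i
local res2 = 0                   // running sum of X_i^2
local N0   = 500
forv i = 1/`N0' {
    * equal-probability resample, with replacement, of 200 workers
    bsample 200 if work, weight(freq)
    su nokid if work [fw = freq], meanonly
    local x = r(mean) - `meanold'
    local res  = `res'  + (`x')
    local res2 = `res2' + (`x')^2
}
local avg = `res' / `N0'
local se  = sqrt((`res2' / `N0' - (`avg')^2) / `N0')
di "res/N0 = " `avg' "   m = " `m' "   approx. s.e. of res/N0 = " `se'
```

If res/N0 sits many standard errors away from zero but close to m, then more iterations will not shrink the difference: it reflects how the two groups differ in the original data, not sampling noise.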
Tim
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Olga Gorbachev
Sent: 14 May 2013 01:51
To: [email protected]
Subject: st: convergence of sample mean using gsample with weights
Dear List servers,
We are trying to match the means of a subsample, randomly generated
using gsample with weights, to those of the original sample, but
without success: the differences in means persist even after more
than 5000 iterations.
The program we run to generate a random sample, and the table of
differences in means, are below:
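local res = 0                      // running sum of meannew - meanold
local N0 = 1000                    // number of iterations
di "i = " _c
forv i = 1/`N0' {
    di " `i'" _c
    cap: drop wtn2
    qui: gen wtn2 = .
    qui: levelsof year, local(years)
    foreach yr of local years {
        // weighted share of non-workers in this year
        su work [aw = wt] if year == `yr', meanonly
        local pct = 1 - r(mean)
        // target sample size: that share of the workers with positive weight
        qui: count if year == `yr' & work & wt > 0
        local n = r(N) * `pct'
        // draw from this year's workers; smpl holds the sampling result
        gsample `n' if year == `yr' & work [aw = wtn], gen(smpl) replace
        // qui: gen smpl`yr' = smpl
        qui: replace wtn2 = wtn * smpl if year == `yr'
    }
    // mean of nokid in the resampled workers, 2009
    su nokid if year == 2009 [aw = wtn2], meanonly
    local meannew = r(mean)
    // mean of nokid among non-workers in the original data, 2009
    su nokid if year == 2009 & !work [aw = wt], meanonly
    local meanold = r(mean)
    local res = `res' + `meannew' - `meanold'
}
// average difference over all iterations
di `res' / `N0'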
After 5751 iterations, the mean differences persist (white, nokid, and
wife are 0/1 dummies; ed is a 0/1/2/3 categorical variable):
            white          ed       nokid        wife        age        RMSE
 1968   .07075803   .02760528  -.07051057   .10025028  -1.9695697   .64914917
 1969    .0685191    .0043999   .00714388   .07798387  -1.0421818   .44748337
 1970   .06611464   .05476483  -.02358097     .077666  -1.8464169   .76425403
 1971   .06971375   .02524083  -.04641226   .07669907  -1.9877308   .84812203
 1972   .06842085   .00459005  -.01252929   .07209953  -1.4438688   .58143546
 1973   .07875147   -.0065409  -.00719551    .0762982  -.84213075   .76031927
 1974   .07394796   .01265153   .03503028   .06037437   .47233679   .45948809
 1975    .0754228  -.02080965   .04125415   .06711441   1.3878045   .44676919
 1976   .07922582   -.0270845    .0703499   .08149621   3.0252375   1.2009947
 1977   .07757246  -.06248362   .13932287   .05320747    4.381814   1.9495654
 1978    .0712201  -.10770348   .09020478   .07284452   3.3190634   1.0499406
 1979    .0867201  -.11178738   .11253834   .07264209    2.972378   1.5287306
 1980   .07419313  -.03035967   .13589319   .06733552   4.0215276   1.7365936
 1981   .07878431  -.01949136   .17420796   .04241048   5.3660359   2.8373346
 1982    .0829203  -.11727873   .17645938   .05927291   5.4543346   2.3774178
 1983   .07845573  -.10130641   .09725345    .0687734   2.3112865   1.4648557
 1984   .09015502  -.07159415   .09821572    .0418674   4.0170757   2.2326577
 1985   .07475118  -.17578234   .15213582   .06892136   4.8550365   2.4831803
 1986   .09253893  -.20126191   .16269138   .06200215   4.8770742   1.7727842
 1987   .08100237  -.17625041   .14548996   .05864067   3.6014147   1.6212222
 1988   .10134555  -.08595601   .20253243   .08289725   6.7326522   2.8191897
 1989   .08155963  -.10436591   .15625212   .02222005   3.9071876   1.0379599
 1990   .09724568   .03089819    .1811577   .08476095   4.8926164   2.3862564
 1991   .08948172  -.03575608    .2627551   .08514362   8.0915346   3.8331833
 1992   .08865055   -.1055572   .25235049   .10462895   7.4632178   3.3645744
 1993    .0951815  -.07997661   .17046405   .06604482   4.2573245   2.0752016
 1994   .04873715  -.16646878   .07550069   .03458139   .98317415   .39188131
 1995   .06876277  -.13850863   .13320029   .02768267   2.4135662   .66990443
 1996   .00856876  -.21758791   .08564262  -.00698818   1.4966446   .57737936
 1997   .03627838  -.15611398   .15043455   .05398246   1.6478452    1.009571
 1999   .11814375  -.00525869   .02250082   .08790646   .80302795   .41872088
 2001   .08085215   .03209268   .00536218   .03539566   .28864816   .08032318
 2003   .01760212   -.0463809   .07889079   .03968931   3.0012058   2.1627047
 2005   .01684409  -.07215183   .09026966    .0235811   2.3966124   .67002878
 2007   .03959067   .03748774   .09446534   .06242606   2.9837086   .87488072
 2009   .02200718  -.04037616   .05716718   .05698124   2.5813597   1.7555616
Total   .08024411  -.08345948    .0901088   .06647665    2.531596    1.108369
Does it make sense that the means don't converge? Is there a way to
force the random subsample to have the same means as the main data
set?
Thank you in advance,
--
Olga Gorbachev
Assistant Professor of Economics
University of Delaware
Newark, DE 19716
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/