st: RE: convergence of sample mean using gsample with weights


From   tshmak <tshmak@hku.hk>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: convergence of sample mean using gsample with weights
Date   Wed, 15 May 2013 16:32:20 +0800

Hi, 

I'm not exactly sure what you're trying to do here. It might help if you explained what `wt', `wtn', and `wtn2' are.

Intuitively, though, it looks like you're adding `meannew' - `meanold' to `res' in every loop.

Suppose X_i is a random variable such that X_i = `meannew' - `meanold' in loop i.

Then res = sum_i X_i, for i = 1, ..., N0.

If the X_i are independent across loops, then:
E(res) = sum_i E(X_i)
Var(res) = sum_i Var(X_i)

Since you're sampling from your original data, let's say
E(X_i) = m, which is `meannew' - `meanold' from your original data (appropriately weighted).
Suppose also that
Var(X_i) = Var(X) for all i.
Then
E(res) = N0 m
Var(res) = N0 Var(X)
E(res/N0) = m
Var(res/N0) = Var(X)/N0

Therefore, res/N0 should converge to m as N0 grows: its variance shrinks to zero, but its mean stays at m. So the average difference will not go to zero unless m itself is zero.
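The same argument in standard notation, as a sketch under the assumptions stated above (independent loops, common mean m, common variance Var(X)). The Chebyshev bound in the third line is an added step that makes "converge" precise, and the standard deviation of 0.1 in the last line is a purely hypothetical value chosen for illustration, not taken from the data:

```latex
\begin{aligned}
\text{res} &= \sum_{i=1}^{N_0} X_i, \qquad E(X_i) = m, \qquad \operatorname{Var}(X_i) = \operatorname{Var}(X) \\
E\!\left(\frac{\text{res}}{N_0}\right) &= m, \qquad
\operatorname{Var}\!\left(\frac{\text{res}}{N_0}\right) = \frac{\operatorname{Var}(X)}{N_0} \\
\Pr\!\left(\left|\frac{\text{res}}{N_0} - m\right| \ge \varepsilon\right) &\le \frac{\operatorname{Var}(X)}{N_0\,\varepsilon^{2}} \;\longrightarrow\; 0 \quad (N_0 \to \infty) \\
\text{e.g. } \sqrt{\operatorname{Var}(X)} = 0.1,\; N_0 = 1000 &\;\Rightarrow\; \operatorname{sd}\!\left(\frac{\text{res}}{N_0}\right) = \frac{0.1}{\sqrt{1000}} \approx 0.0032
\end{aligned}
```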

Is that what's happening? 

Tim




-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Olga Gorbachev
Sent: 14 May 2013 01:51
To: statalist@hsphsun2.harvard.edu
Subject: st: convergence of sample mean using gsample with weights

Dear List servers,

We are trying to match the means of a subsample, randomly generated
using gsample with weights, to those of the original sample, but we
have not been successful: the differences in means persist even
after more than 5000 iterations.

The program we are running to generate a random sample and the table
of differences in means are below:

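local res = 0
local N0 = 1000
qui: levelsof year, local(years)    // the set of years does not change across iterations
di "i = " _c
forv i = 1/`N0' {
    di " `i'" _c
    cap: drop wtn2
    qui: gen wtn2 = .
    foreach yr of local years {
        // weighted share of non-workers in this year
        su work [aw = wt] if year == `yr', meanonly
        local pct = 1 - r(mean)
        // subsample size: that share of the workers with positive weight
        qui: count if year == `yr' & work & wt > 0
        local n = r(N) * `pct'
        // weighted draw from the workers in this year
        gsample `n' if year == `yr' & work [aw = wtn], gen(smpl) replace
        // qui: gen smpl`yr' = smpl
        qui: replace wtn2 = wtn * smpl if year == `yr'
    }

    // 2009 mean of nokid in the resampled data vs. among non-workers
    su nokid if year == 2009 [aw = wtn2], meanonly
    local meannew = r(mean)
    su nokid if year == 2009 & !work [aw = wt], meanonly
    local meanold = r(mean)
    // accumulate the difference across iterations
    local res = `res' + `meannew' - `meanold'
}

di `res' / `N0'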


After 5751 iterations, the mean differences persist (white, nokid,
and wife are 0/1 dummies; ed is a 0/1/2/3 categorical variable):


 Year       white          ed       nokid        wife        age        RMSE
 1968   .07075803   .02760528  -.07051057   .10025028  -1.9695697   .64914917
 1969    .0685191    .0043999   .00714388   .07798387  -1.0421818   .44748337
 1970   .06611464   .05476483  -.02358097     .077666  -1.8464169   .76425403
 1971   .06971375   .02524083  -.04641226   .07669907  -1.9877308   .84812203
 1972   .06842085   .00459005  -.01252929   .07209953  -1.4438688   .58143546
 1973   .07875147   -.0065409  -.00719551    .0762982  -.84213075   .76031927
 1974   .07394796   .01265153   .03503028   .06037437   .47233679   .45948809
 1975    .0754228  -.02080965   .04125415   .06711441   1.3878045   .44676919
 1976   .07922582   -.0270845    .0703499   .08149621   3.0252375   1.2009947
 1977   .07757246  -.06248362   .13932287   .05320747    4.381814   1.9495654
 1978    .0712201  -.10770348   .09020478   .07284452   3.3190634   1.0499406
 1979    .0867201  -.11178738   .11253834   .07264209    2.972378   1.5287306
 1980   .07419313  -.03035967   .13589319   .06733552   4.0215276   1.7365936
 1981   .07878431  -.01949136   .17420796   .04241048   5.3660359   2.8373346
 1982    .0829203  -.11727873   .17645938   .05927291   5.4543346   2.3774178
 1983   .07845573  -.10130641   .09725345    .0687734   2.3112865   1.4648557
 1984   .09015502  -.07159415   .09821572    .0418674   4.0170757   2.2326577
 1985   .07475118  -.17578234   .15213582   .06892136   4.8550365   2.4831803
 1986   .09253893  -.20126191   .16269138   .06200215   4.8770742   1.7727842
 1987   .08100237  -.17625041   .14548996   .05864067   3.6014147   1.6212222
 1988   .10134555  -.08595601   .20253243   .08289725   6.7326522   2.8191897
 1989   .08155963  -.10436591   .15625212   .02222005   3.9071876   1.0379599
 1990   .09724568   .03089819    .1811577   .08476095   4.8926164   2.3862564
 1991   .08948172  -.03575608    .2627551   .08514362   8.0915346   3.8331833
 1992   .08865055   -.1055572   .25235049   .10462895   7.4632178   3.3645744
 1993    .0951815  -.07997661   .17046405   .06604482   4.2573245   2.0752016
 1994   .04873715  -.16646878   .07550069   .03458139   .98317415   .39188131
 1995   .06876277  -.13850863   .13320029   .02768267   2.4135662   .66990443
 1996   .00856876  -.21758791   .08564262  -.00698818   1.4966446   .57737936
 1997   .03627838  -.15611398   .15043455   .05398246   1.6478452    1.009571
 1999   .11814375  -.00525869   .02250082   .08790646   .80302795   .41872088
 2001   .08085215   .03209268   .00536218   .03539566   .28864816   .08032318
 2003   .01760212   -.0463809   .07889079   .03968931   3.0012058   2.1627047
 2005   .01684409  -.07215183   .09026966    .0235811   2.3966124   .67002878
 2007   .03959067   .03748774   .09446534   .06242606   2.9837086   .87488072
 2009   .02200718  -.04037616   .05716718   .05698124   2.5813597   1.7555616
Total   .08024411  -.08345948    .0901088   .06647665    2.531596    1.108369


Does it make sense that the means don't converge?  Is there a way to
force the random subsample to have the same means as the main data
set?

Thank you in advance,
--
Olga Gorbachev
Assistant Professor of Economics
University of Delaware
Newark, DE 19716
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
