Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

st: RE: convergence of sample mean using gsample with weights

From: tshmak <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: RE: convergence of sample mean using gsample with weights
Date: Wed, 15 May 2013 16:32:20 +0800

Hi, 

I'm not exactly sure what you're trying to do here. It might help if you explained what `wt', `wtn', and `wtn2' are.

Intuitively though, it looks like you're adding `meannew' - `meanold' to `res' on every iteration of the loop.

Let X_i be the random variable equal to `meannew' - `meanold' on iteration i.

Then res = sum_i X_i, summing over i = 1, ..., N0.

If X_i are independent across loops, then:
E(res) = sum_i E(X_i)
Var(res) = sum_i Var(X_i)

Since you're sampling from your original data, let's say
E(X_i) = m for all i, where m is `meannew' - `meanold' computed from your original data (appropriately weighted).
Suppose also that
Var(X_i) = Var(X) for all i.
Then
E(res) = N0 m
Var(res) = N0 Var(X)
E(res/N0) = m
Var(res/N0) = Var(X)/N0

Therefore res/N0 should converge to m as N0 grows, because its variance shrinks towards zero.
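
To illustrate the argument, here is a minimal simulation sketch. It does not use your data or gsample: the normal draw simply stands in for the per-iteration difference X_i, and the values of m and sd are made up for illustration. The point is only that the running average settles near m, whatever m is, so a nonzero m would show up as a persistent difference however many iterations you run.

```stata
* Minimal sketch: the average res/N0 of independent draws X_i settles near
* m = E(X_i). The values of m and sd are hypothetical, not taken from the data.
set seed 12345
local m   = 0.05      // assumed expected difference per iteration
local sd  = 0.5       // assumed standard deviation of X_i
local N0  = 5000
local res = 0
forvalues i = 1/`N0' {
    local x   = rnormal(`m', `sd')   // stands in for meannew - meanold
    local res = `res' + `x'
}
di "res/N0          = " %9.6f `res'/`N0'
di "m               = " %9.6f `m'
di "sd of res/N0    = " %9.6f `sd'/sqrt(`N0')
```

The last line displays sqrt(Var(X)/N0), which indicates how far res/N0 can plausibly be from m at a given N0.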

Is that what's happening? 

Tim




-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Olga Gorbachev
Sent: 14 May 2013 01:51
To: [email protected]
Subject: st: convergence of sample mean using gsample with weights

Dear List servers,

We are trying to match the means of a subsample that is randomly
generated using gsample with weights to those of the original sample,
but without success: the differences in means persist even after more
than 5000 iterations.

The program we are running to generate a random sample and the table
of differences in means are below:

local res = 0
local N0 = 1000
di "i = " _c
forv i = 1/`N0' {
    di " `i'" _c
    cap: drop wtn2
    qui: gen wtn2 = .
    qui: levelsof year, local(years)
    foreach yr of local years {
        su work [aw = wt] if year == `yr', meanonly
        local pct = 1 - r(mean)
        qui: count if year == `yr' & work & wt > 0
        local n = r(N) * `pct'
        gsample `n' if year == `yr' & work [aw = wtn], gen(smpl) replace
        // qui: gen smpl`yr' = smpl
        qui: replace wtn2 = wtn * smpl if year == `yr'
    }

    su nokid if year == 2009 [aw = wtn2], meanonly
    local meannew = r(mean)
    su nokid if year == 2009 & !work [aw = wt], meanonly
    local meanold = r(mean)
    local res = `res' + `meannew' - `meanold'
}

di `res' / `N0'


After 5751 iterations, the mean differences persist (white, nokid, and
wife are 0/1 dummies; ed is a categorical variable coded 0/1/2/3):


 year       white          ed       nokid        wife         age        RMSE
 1968   .07075803   .02760528  -.07051057   .10025028  -1.9695697   .64914917
 1969    .0685191    .0043999   .00714388   .07798387  -1.0421818   .44748337
 1970   .06611464   .05476483  -.02358097     .077666  -1.8464169   .76425403
 1971   .06971375   .02524083  -.04641226   .07669907  -1.9877308   .84812203
 1972   .06842085   .00459005  -.01252929   .07209953  -1.4438688   .58143546
 1973   .07875147   -.0065409  -.00719551    .0762982  -.84213075   .76031927
 1974   .07394796   .01265153   .03503028   .06037437   .47233679   .45948809
 1975    .0754228  -.02080965   .04125415   .06711441   1.3878045   .44676919
 1976   .07922582   -.0270845    .0703499   .08149621   3.0252375   1.2009947
 1977   .07757246  -.06248362   .13932287   .05320747    4.381814   1.9495654
 1978    .0712201  -.10770348   .09020478   .07284452   3.3190634   1.0499406
 1979    .0867201  -.11178738   .11253834   .07264209    2.972378   1.5287306
 1980   .07419313  -.03035967   .13589319   .06733552   4.0215276   1.7365936
 1981   .07878431  -.01949136   .17420796   .04241048   5.3660359   2.8373346
 1982    .0829203  -.11727873   .17645938   .05927291   5.4543346   2.3774178
 1983   .07845573  -.10130641   .09725345    .0687734   2.3112865   1.4648557
 1984   .09015502  -.07159415   .09821572    .0418674   4.0170757   2.2326577
 1985   .07475118  -.17578234   .15213582   .06892136   4.8550365   2.4831803
 1986   .09253893  -.20126191   .16269138   .06200215   4.8770742   1.7727842
 1987   .08100237  -.17625041   .14548996   .05864067   3.6014147   1.6212222
 1988   .10134555  -.08595601   .20253243   .08289725   6.7326522   2.8191897
 1989   .08155963  -.10436591   .15625212   .02222005   3.9071876   1.0379599
 1990   .09724568   .03089819    .1811577   .08476095   4.8926164   2.3862564
 1991   .08948172  -.03575608    .2627551   .08514362   8.0915346   3.8331833
 1992   .08865055   -.1055572   .25235049   .10462895   7.4632178   3.3645744
 1993    .0951815  -.07997661   .17046405   .06604482   4.2573245   2.0752016
 1994   .04873715  -.16646878   .07550069   .03458139   .98317415   .39188131
 1995   .06876277  -.13850863   .13320029   .02768267   2.4135662   .66990443
 1996   .00856876  -.21758791   .08564262  -.00698818   1.4966446   .57737936
 1997   .03627838  -.15611398   .15043455   .05398246   1.6478452    1.009571
 1999   .11814375  -.00525869   .02250082   .08790646   .80302795   .41872088
 2001   .08085215   .03209268   .00536218   .03539566   .28864816   .08032318
 2003   .01760212   -.0463809   .07889079   .03968931   3.0012058   2.1627047
 2005   .01684409  -.07215183   .09026966    .0235811   2.3966124   .67002878
 2007   .03959067   .03748774   .09446534   .06242606   2.9837086   .87488072
 2009   .02200718  -.04037616   .05716718   .05698124   2.5813597   1.7555616
Total   .08024411  -.08345948    .0901088   .06647665    2.531596    1.108369


Does it make sense that the means don't converge?  Is there a way to
force the random subsample to have the same means as the main data
set?

Thank you in advance,
--
Olga Gorbachev
Assistant Professor of Economics
University of Delaware
Newark, DE 19716
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

