

From: tshmak <tshmak@hku.hk>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: convergence of sample mean using gsample with weights
Date: Wed, 15 May 2013 16:32:20 +0800

Hi,

I'm not exactly sure what you're trying to do here. Perhaps if you explain what `wt', `wtn', and `wtn2' are, that might help.

Intuitively though, it looks like you're adding `meannew' - `meanold' to `res' in every loop.

Suppose X_i is a random variable such that X_i = `meannew' - `meanold' for loop i. Then

    res = sum_i X_i

If the X_i are independent across loops, then:

    E(res)   = sum_i E(X_i)
    Var(res) = sum_i Var(X_i)

Since you're sampling from your original data, let's say E(X_i) = m, which is `meannew' - `meanold' from your original data (appropriately weighted).

Suppose:

    Var(X_i) = Var(X) for all i

then

    Var(res)    = N0 Var(X)
    Var(res/N0) = Var(X)/N0
    E(res/N0)   = m

Therefore, it appears that res/N0 should converge to m. Is that what's happening?

Tim

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Olga Gorbachev
Sent: 14 May 2013 01:51
To: statalist@hsphsun2.harvard.edu
Subject: st: convergence of sample mean using gsample with weights

Dear List servers,

We are trying to match the means of a subsample that is randomly generated using gsample with weights to those of the original sample, but we are not successful: the differences in means are persistent, even after more than 5000 iterations.

The program we are running to generate a random sample and the table of differences in means are below:

    local res = 0
    local N0 = 1000
    di "i = " _c
    forv i = 1/`N0' {
        di " `i'" _c
        cap: drop wtn2
        qui: gen wtn2 = .
        qui: levelsof year, local(years)
        foreach yr of local years {
            su work [aw = wt] if year == `yr', meanonly
            local pct = 1 - r(mean)
            qui: count if year == `yr' & work & wt > 0
            local n = r(N) * `pct'
            gsample `n' if year == `yr' & work [aw = wtn], gen(smpl) replace
            // qui: gen smpl`yr' = smpl
            qui: replace wtn2 = wtn * smpl if year == `yr'
        }
        su nokid if year == 2009 [aw = wtn2], meanonly
        local meannew = r(mean)
        su nokid if year == 2009 & !work [aw = wt], meanonly
        local meanold = r(mean)
        local res = `res' + `meannew' - `meanold'
    }
    di `res' / `N0'

After 5751 iterations, the mean differences are persistent (white, nokid, wife are 0/1 dummies; ed is a 0/1/2/3 categorical variable):

    year        white          ed       nokid        wife         age        RMSE
    1968    .07075803   .02760528  -.07051057   .10025028  -1.9695697   .64914917
    1969     .0685191    .0043999   .00714388   .07798387  -1.0421818   .44748337
    1970    .06611464   .05476483  -.02358097     .077666  -1.8464169   .76425403
    1971    .06971375   .02524083  -.04641226   .07669907  -1.9877308   .84812203
    1972    .06842085   .00459005  -.01252929   .07209953  -1.4438688   .58143546
    1973    .07875147   -.0065409  -.00719551    .0762982  -.84213075   .76031927
    1974    .07394796   .01265153   .03503028   .06037437   .47233679   .45948809
    1975     .0754228  -.02080965   .04125415   .06711441   1.3878045   .44676919
    1976    .07922582   -.0270845    .0703499   .08149621   3.0252375   1.2009947
    1977    .07757246  -.06248362   .13932287   .05320747    4.381814   1.9495654
    1978     .0712201  -.10770348   .09020478   .07284452   3.3190634   1.0499406
    1979     .0867201  -.11178738   .11253834   .07264209    2.972378   1.5287306
    1980    .07419313  -.03035967   .13589319   .06733552   4.0215276   1.7365936
    1981    .07878431  -.01949136   .17420796   .04241048   5.3660359   2.8373346
    1982     .0829203  -.11727873   .17645938   .05927291   5.4543346   2.3774178
    1983    .07845573  -.10130641   .09725345    .0687734   2.3112865   1.4648557
    1984    .09015502  -.07159415   .09821572    .0418674   4.0170757   2.2326577
    1985    .07475118  -.17578234   .15213582   .06892136   4.8550365   2.4831803
    1986    .09253893  -.20126191   .16269138   .06200215   4.8770742   1.7727842
    1987    .08100237  -.17625041   .14548996   .05864067   3.6014147   1.6212222
    1988    .10134555  -.08595601   .20253243   .08289725   6.7326522   2.8191897
    1989    .08155963  -.10436591   .15625212   .02222005   3.9071876   1.0379599
    1990    .09724568   .03089819    .1811577   .08476095   4.8926164   2.3862564
    1991    .08948172  -.03575608    .2627551   .08514362   8.0915346   3.8331833
    1992    .08865055   -.1055572   .25235049   .10462895   7.4632178   3.3645744
    1993     .0951815  -.07997661   .17046405   .06604482   4.2573245   2.0752016
    1994    .04873715  -.16646878   .07550069   .03458139   .98317415   .39188131
    1995    .06876277  -.13850863   .13320029   .02768267   2.4135662   .66990443
    1996    .00856876  -.21758791   .08564262  -.00698818   1.4966446   .57737936
    1997    .03627838  -.15611398   .15043455   .05398246   1.6478452    1.009571
    1999    .11814375  -.00525869   .02250082   .08790646   .80302795   .41872088
    2001    .08085215   .03209268   .00536218   .03539566   .28864816   .08032318
    2003    .01760212   -.0463809   .07889079   .03968931   3.0012058   2.1627047
    2005    .01684409  -.07215183   .09026966    .0235811   2.3966124   .67002878
    2007    .03959067   .03748774   .09446534   .06242606   2.9837086   .87488072
    2009    .02200718  -.04037616   .05716718   .05698124   2.5813597   1.7555616
    Total   .08024411  -.08345948    .0901088   .06647665    2.531596    1.108369

Does it make sense that the means don't converge? Is there a way to force the random subsample to have the same means as the main data set?

Thank you in advance,

--
Olga Gorbachev
Assistant Professor of Economics
University of Delaware
Newark, DE 19716

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
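
Tim's argument in the reply can be checked numerically. The following is only a rough sketch in Python, not a translation of the Stata program in the thread: each loop's `meannew' - `meanold' is replaced by an independent normal draw X_i with mean m and standard deviation sd. The function names, the normal distribution, and the values m = 0.05 and sd = 0.2 are all assumptions made for illustration; none of them comes from the original data.

```python
import random
import statistics


def average_difference(n_loops, m=0.05, sd=0.2, seed=12345):
    """Return res / n_loops, where res accumulates n_loops draws of X_i.

    X_i ~ Normal(m, sd) is a hypothetical stand-in for
    `meannew' - `meanold' in one pass of the loop.
    """
    rng = random.Random(seed)
    res = 0.0
    for _ in range(n_loops):
        res += rng.gauss(m, sd)
    return res / n_loops


def spread_of_average(n_loops, n_reps=200, m=0.05, sd=0.2):
    """Empirical standard deviation of res / n_loops over n_reps independent runs."""
    runs = [average_difference(n_loops, m, sd, seed=k) for k in range(n_reps)]
    return statistics.stdev(runs)


# The running average settles near m (here 0.05), not near zero.
for n in (10, 1000, 100000):
    print(n, round(average_difference(n), 4))

# The spread across runs should be roughly sd / sqrt(n_loops).
print(round(spread_of_average(400), 4), round(0.2 / 400 ** 0.5, 4))
```

Under these assumptions, increasing the number of loops only shrinks the noise around m. If m itself is not zero, the average difference stays away from zero however many iterations are run, which is the behaviour the argument predicts.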

**References**: **st: convergence of sample mean using gsample with weights**, *From:* Olga Gorbachev <olga.gorbachev@gmail.com>
