Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: why results after expanding data for probability weight is only close to the svy estimation

From: Amanda Fu
To: statalist@hsphsun2.harvard.edu
Subject: st: why results after expanding data for probability weight is only close to the svy estimation
Date: Sat, 25 Dec 2010 13:57:53 -0500

```Hi all,

I am trying to work out how to expand a data set according to the
survey sampling weights so that ordinary estimation on the expanded
data reproduces the svy estimation results. Let's use OLS as an example.
In the following data set, I first expand the data according to the
sampling weight. The estimated coefficients are close but not equal to
the survey estimates. Then I expand the data set by sampling
weight*100, and the results are closer still. My question is: is the
difference between the svy estimation and the regular estimation on the
expanded data set caused by the change in sample size?

Sincerely,
Amanda

---------------------------------start here----------------
. clear all
. input id  prob sweight     y        x
id       prob    sweight          y          x
1. 1       0.2     5         79       1200
2. 2       0.2     5         10       2700
3. 3       0.3     3.33    15       2500
4. 4       0.1     10        21      2800
5. 5       0.2     5          16      2480
6. end

. lab var prob    "selection probability"
. lab var sweight "sampling weight,=1/prob"
. svyset [pw=sweight]
      pweight: sweight
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

***********************************************************************************
*                                  SVY ESTIMATION RESULT (1)
***********************************************************************************

. svy: reg y x
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs      =         5
Number of PSUs     =         5                  Population size    =     28.33
                                                Design df          =         4
                                                F(   1,      4)    =     64.60
                                                Prob > F           =    0.0013
                                                R-squared          =    0.8968

------------------------------------------------------------------------------
             |             Linearized
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.0397343   .0049435    -8.04   0.001    -.0534598   -.0260088
       _cons |   123.3965   9.286147    13.29   0.000     97.61402     149.179
------------------------------------------------------------------------------
**************************EXPAND DATA SET BY SWEIGHT*************
. expand ceil(sweight)
(24 observations created)
* I also tried -expand round(sweight)-; the difference from result (1)
* is larger.
. list
id	prob	sweight	y	x
1	.2	5	79	1200
1	.2	5	79	1200
1	.2	5	79	1200
1	.2	5	79	1200
1	.2	5	79	1200
2	.2	5	10	2700
2	.2	5	10	2700
2	.2	5	10	2700
2	.2	5	10	2700
2	.2	5	10	2700
3	.3	3.33	15	2500
3	.3	3.33	15	2500
3	.3	3.33	15	2500
3	.3	3.33	15	2500
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
5	.2	5	16	2480
5	.2	5	16	2480
5	.2	5	16	2480
5	.2	5	16	2480
5	.2	5	16	2480
*********************************************************************************************
*                               EXPAND DATA SET BY SWEIGHT RESULT (2)
*
*********************************************************************************************
. reg y x

      Source |       SS       df       MS              Number of obs =      29
-------------+------------------------------           F(  1,    27) =  228.33
       Model |   14756.097     1   14756.097           Prob > F      =  0.0000
    Residual |   1744.9375    27  64.6273149           R-squared     =  0.8943
-------------+------------------------------
       Total |  16501.0345    28   589.32266           Root MSE      =  8.0391

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.0397927   .0026335   -15.11   0.000    -.0451961   -.0343893
       _cons |   123.3279   6.520714    18.91   0.000     109.9485    136.7073
------------------------------------------------------------------------------
*********************************************************************************************
*                       EXPAND DATA SET BY SWEIGHT*100 RESULT (3)
*
*********************************************************************************************
. expand ceil(sweight*100)
(2828 observations created)

. reg y x

      Source |       SS       df       MS              Number of obs =    2833
-------------+------------------------------           F(  1,  2831) =24613.57
       Model |  1470410.91     1  1470410.91           Prob > F      =  0.0000
    Residual |  169123.508  2831  59.7398474           R-squared     =  0.8968
-------------+------------------------------
       Total |  1639534.42  2832  578.931644           Root MSE      =  7.7292

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.0397343   .0002533  -156.89   0.000    -.0402309   -.0392377
       _cons |   123.3965   .6269718   196.81   0.000     122.1671    124.6259
------------------------------------------------------------------------------
----------------------------------------end-------------------------------------------------------------------
```
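
One way to see where the gap in the coefficients comes from, offered as a minimal sketch and not a definitive answer: running `regress` on expanded data is the same as running it with frequency weights equal to the number of copies, so the coefficients depend only on the relative sizes of those counts. `ceil(sweight)` turns 3.33 into 4, which changes the ratios between observations, while `sweight*100` gives 500, 500, 333, 1000, 500, which is exactly proportional to the original weights. The variable names `fw1` and `fw100` and the scalar `b_pw` are illustrative and do not appear in the post. `round()` is used in place of `ceil()` for the *100 case on the assumption that `sweight` is stored as a float, so that a stored value slightly off 3.33 cannot push the count to 334.

```stata
clear all
input id prob sweight y x
1 0.2 5    79 1200
2 0.2 5    10 2700
3 0.3 3.33 15 2500
4 0.1 10   21 2800
5 0.2 5    16 2480
end

* Integer counts equivalent to the two -expand- calls (names are illustrative)
* fw1:   5 5 4 10 5            ratios differ from sweight
* fw100: 500 500 333 1000 500  same ratios as sweight
generate long fw1   = ceil(sweight)
generate long fw100 = round(sweight*100)

* Same coefficients as -svy: reg-, with robust standard errors
regress y x [pweight = sweight]
scalar b_pw = _b[x]

* Same fit as -regress- on the data expanded by ceil(sweight)
regress y x [fweight = fw1]
display "slope, pweight vs. fw1:   " b_pw "  " _b[x]

* Same fit as -regress- on the data expanded by sweight*100
regress y x [fweight = fw100]
display "slope, pweight vs. fw100: " b_pw "  " _b[x]
```

In both frequency-weighted fits the standard errors are computed as if there were 29 or 2833 independent observations, so they keep shrinking as the expansion factor grows. The svy standard errors are based on the 5 sampled units, so expansion can reproduce the svy coefficients but not the svy standard errors.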