st: why results after expanding data for probability weight is only close to the svy estimation


From   Amanda Fu <mandy.fu1@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: why results after expanding data for probability weight is only close to the svy estimation
Date   Sat, 25 Dec 2010 13:57:53 -0500

Hi all,

I am trying to work out how to expand a data set according to the
survey sampling weights, so that an ordinary (non-svy) estimation on
the expanded data gives the same result as the svy estimation. Let's
use OLS as an example. In the following data set, I first expand the
data simply according to the sampling weight. The estimated
coefficients are close, but not equal, to the svy estimates. Then I
expand the data set by sampling weight*100, and the results are even
closer. My question is: is the difference between the svy estimation
and the regular estimation on the expanded data set caused by the
change in sample size?

Thank you for your time!

Sincerely,
Amanda

---------------------------------start here----------------
. clear all
***********READ DATA*************************
. input id prob sweight y x

            id       prob    sweight          y          x
  1.         1        0.2          5         79       1200
  2.         2        0.2          5         10       2700
  3.         3        0.3       3.33         15       2500
  4.         4        0.1         10         21       2800
  5.         5        0.2          5         16       2480
  6. end

. lab var prob    "selection probability"
. lab var sweight "sampling weight,=1/prob"
. svyset [pw=sweight]

      pweight: sweight
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

***********************************************************************
*                      SVY ESTIMATION RESULT (1)                      *
***********************************************************************

. svy: reg y x
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs      =         5
Number of PSUs     =         5                  Population size    =     28.33
                                                Design df          =         4
                                                F(   1,      4)    =     64.60
                                                Prob > F           =    0.0013
                                                R-squared          =    0.8968

------------------------------------------------------------------------------
             |             Linearized
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.0397343   .0049435    -8.04   0.001    -.0534598   -.0260088
       _cons |   123.3965   9.286147    13.29   0.000     97.61402     149.179
------------------------------------------------------------------------------
**************************EXPAND DATA SET BY SWEIGHT*************
. expand ceil(sweight)
(24 observations created)
****** I also tried -expand round(sweight)-; with that, the difference
****** from result (1) is larger.
. list
id	prob	sweight	y	x
1	.2	5	79	1200
1	.2	5	79	1200
1	.2	5	79	1200
1	.2	5	79	1200
1	.2	5	79	1200
2	.2	5	10	2700
2	.2	5	10	2700
2	.2	5	10	2700
2	.2	5	10	2700
2	.2	5	10	2700
3	.3	3.33	15	2500
3	.3	3.33	15	2500
3	.3	3.33	15	2500
3	.3	3.33	15	2500
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
4	.1	10	21	2800
5	.2	5	16	2480
5	.2	5	16	2480
5	.2	5	16	2480
5	.2	5	16	2480
5	.2	5	16	2480
***********************************************************************
*                EXPAND DATA SET BY SWEIGHT RESULT (2)                *
***********************************************************************
. reg y x

      Source |       SS       df       MS              Number of obs =      29
-------------+------------------------------           F(  1,    27) =  228.33
       Model |   14756.097     1   14756.097           Prob > F      =  0.0000
    Residual |   1744.9375    27  64.6273149           R-squared     =  0.8943
-------------+------------------------------           Adj R-squared =  0.8903
       Total |  16501.0345    28   589.32266           Root MSE      =  8.0391

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.0397927   .0026335   -15.11   0.000    -.0451961   -.0343893
       _cons |   123.3279   6.520714    18.91   0.000     109.9485    136.7073
------------------------------------------------------------------------------
***********************************************************************
*              EXPAND DATA SET BY SWEIGHT*100 RESULT (3)              *
***********************************************************************
. expand ceil(sweight*100)
(2828 observations created)

. reg y x

      Source |       SS       df       MS              Number of obs =    2833
-------------+------------------------------           F(  1,  2831) =24613.57
       Model |  1470410.91     1  1470410.91           Prob > F      =  0.0000
    Residual |  169123.508  2831  59.7398474           R-squared     =  0.8968
-------------+------------------------------           Adj R-squared =  0.8968
       Total |  1639534.42  2832  578.931644           Root MSE      =  7.7292

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.0397343   .0002533  -156.89   0.000    -.0402309   -.0392377
       _cons |   123.3965   .6269718   196.81   0.000     122.1671    124.6259
------------------------------------------------------------------------------
---------------------------------end here------------------
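
One way to probe the question without physically expanding the data is sketched below. It is a minimal sketch, not a definitive answer, and it rests on two assumptions: that -regress- with frequency weights gives the same fit as -regress- on the correspondingly expanded data, and that the variables n1 and n100 (names introduced here for illustration only) hold the replication counts that -expand ceil(sweight)- and -expand ceil(sweight*100)- would use. Note that ceil(sweight) turns the weight 3.33 into 4, so the relative weights in (2) are 5:5:4:10:5 rather than 5:5:3.33:10:5, whereas ceil(sweight*100) gives 500:500:333:1000:500, which is exactly proportional to the original weights. That, rather than the sample size as such, would explain why the coefficients in (3) agree with (1) while those in (2) do not. The standard errors are a separate matter: the expanded-data regressions treat every copy as an independent observation, so their standard errors shrink as the expansion factor grows and should not be expected to match the linearized svy standard errors.

```stata
clear all
input id prob sweight y x
1 0.2 5    79 1200
2 0.2 5    10 2700
3 0.3 3.33 15 2500
4 0.1 10   21 2800
5 0.2 5    16 2480
end

* Point estimates with probability weights, no -svy- prefix and no -expand-.
* The coefficients should agree with result (1).
regress y x [pweight = sweight]

* Hypothetical helper variables: the integer replication counts that
* -expand ceil(sweight)- and -expand ceil(sweight*100)- would use.
generate long n1   = ceil(sweight)
generate long n100 = ceil(sweight*100)
list id sweight n1 n100

* Frequency-weighted fits, intended to mirror results (2) and (3).
regress y x [fweight = n1]
regress y x [fweight = n100]
```

If the listing shows n1 = 4 for id 3, the rounding of 3.33 up to 4 is the source of the small discrepancy between (1) and (2).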