
# Re: st: why results after expanding data for probability weight is only close to the svy estimation

From: Amanda Fu; To: statalist@hsphsun2.harvard.edu; Subject: Re: st: why results after expanding data for probability weight is only close to the svy estimation; Date: Sun, 26 Dec 2010 16:36:38 -0500

```Dear Mr. Waldo,

Thank you very much for helping me with my question!

I am sorry that my question was not very clear. To my surprise, you
gave me exactly the answer I was looking for. Yes, I meant to ask why
(2) is different from (1), and whether rescaling sweight by a constant
(10, 100, etc.) would change the estimation results (just the
coefficients).

Thank you for reminding me that the variance, unlike the coefficients,
is a different story before and after expanding.

Thank you for your help! I appreciate it.

Amanda

On Sat, Dec 25, 2010 at 1:57 PM, Amanda Fu <mandy.fu1@gmail.com> wrote:
> Hi all,
>
> I was trying to figure out how to expand a data set according to the
> survey sampling weight so that estimation without svy gives the same
> result as svy estimation. Let's use OLS as an example. In the
> following data set, I first expand the data simply according to the
> sampling weight. The results (estimated coefficients) are close but
> not equal to the survey estimates. Then I expand the data set by
> sampling weight*100, and the results are closer. My question is: is
> the difference between the svy estimation and the regular estimation
> on the expanded data set caused by the change in sample size?
>
> Thank you for your time!
>
> Sincerely,
> Amanda
>
> ---------------------------------start here----------------
> . clear all
> . input id  prob sweight     y        x
>     id       prob    sweight          y          x
>  1. 1       0.2     5         79       1200
>  2. 2       0.2     5         10       2700
>  3. 3       0.3     3.33    15       2500
>  4. 4       0.1     10        21      2800
>  5. 5       0.2     5          16      2480
>  6. end
>
> . lab var prob    "selection probability"
> . lab var sweight "sampling weight,=1/prob"
> . svyset [pw=sweight]
>
>       pweight: sweight
>           VCE: linearized
>   Single unit: missing
>      Strata 1: <one>
>          SU 1: <observations>
>         FPC 1: <zero>
>
> ******************************************************************
> *                  SVY ESTIMATION RESULT (1)                     *
> ******************************************************************
> . svy: reg y x
> (running regress on estimation sample)
>
> Survey: Linear regression
>
> Number of strata   =         1                  Number of obs      =         5
> Number of PSUs     =         5                  Population size    =     28.33
>                                                 Design df          =         4
>                                                 F(   1,      4)    =     64.60
>                                                 Prob > F           =    0.0013
>                                                 R-squared          =    0.8968
>
> ------------------------------------------------------------------------------
>              |             Linearized
>            y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>            x |  -.0397343   .0049435    -8.04   0.001    -.0534598   -.0260088
>        _cons |   123.3965   9.286147    13.29   0.000     97.61402     149.179
> ------------------------------------------------------------------------------
> **************************EXPAND DATA SET BY SWEIGHT*************
> . expand ceil(sweight)
> (24 observations created)
> * I also tried -expand round(sweight)-; the difference from result (1)
> * was larger.
> . list
> id      prob    sweight y       x
> 1       .2      5       79      1200
> 1       .2      5       79      1200
> 1       .2      5       79      1200
> 1       .2      5       79      1200
> 1       .2      5       79      1200
> 2       .2      5       10      2700
> 2       .2      5       10      2700
> 2       .2      5       10      2700
> 2       .2      5       10      2700
> 2       .2      5       10      2700
> 3       .3      3.33    15      2500
> 3       .3      3.33    15      2500
> 3       .3      3.33    15      2500
> 3       .3      3.33    15      2500
> 4       .1      10      21      2800
> 4       .1      10      21      2800
> 4       .1      10      21      2800
> 4       .1      10      21      2800
> 4       .1      10      21      2800
> 4       .1      10      21      2800
> 4       .1      10      21      2800
> 4       .1      10      21      2800
> 4       .1      10      21      2800
> 4       .1      10      21      2800
> 5       .2      5       16      2480
> 5       .2      5       16      2480
> 5       .2      5       16      2480
> 5       .2      5       16      2480
> 5       .2      5       16      2480
> ******************************************************************
> *            EXPAND DATA SET BY SWEIGHT RESULT (2)               *
> ******************************************************************
> . reg y x
>
>       Source |       SS       df       MS              Number of obs =      29
> -------------+------------------------------           F(  1,    27) =  228.33
>        Model |   14756.097     1   14756.097           Prob > F      =  0.0000
>     Residual |   1744.9375    27  64.6273149           R-squared     =  0.8943
> -------------+------------------------------
>        Total |  16501.0345    28   589.32266           Root MSE      =  8.0391
>
> ------------------------------------------------------------------------------
>            y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>            x |  -.0397927   .0026335   -15.11   0.000    -.0451961   -.0343893
>        _cons |   123.3279   6.520714    18.91   0.000     109.9485    136.7073
> ------------------------------------------------------------------------------
> ******************************************************************
> *          EXPAND DATA SET BY SWEIGHT*100 RESULT (3)             *
> ******************************************************************
> . expand ceil(sweight*100)
> (2828 observations created)
>
> . reg y x
>
>       Source |       SS       df       MS              Number of obs =    2833
> -------------+------------------------------           F(  1,  2831) =24613.57
>        Model |  1470410.91     1  1470410.91           Prob > F      =  0.0000
>     Residual |  169123.508  2831  59.7398474           R-squared     =  0.8968
> -------------+------------------------------
>        Total |  1639534.42  2832  578.931644           Root MSE      =  7.7292
>
> ------------------------------------------------------------------------------
>            y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>            x |  -.0397343   .0002533  -156.89   0.000    -.0402309   -.0392377
>        _cons |   123.3965   .6269718   196.81   0.000     122.1671    124.6259
> ------------------------------------------------------------------------------
> ----------------------------------------end----------------
>

```
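
The point confirmed in this reply (multiplying the weights by a constant leaves the coefficients unchanged, while rounding the weights up does not) can be checked numerically outside Stata. The sketch below is only an illustration, not the method used in the thread: it uses the five observations from the post and the closed-form weighted least-squares formulas for one regressor. The helper name `wls_fit` is hypothetical, and the code covers point estimates only. Standard errors are a separate matter, as the reply notes.

```python
import math

# Data copied from the post: sampling weight, outcome y, regressor x
sweight = [5, 5, 3.33, 10, 5]
y = [79, 10, 15, 21, 16]
x = [1200, 2700, 2500, 2800, 2480]


def wls_fit(w, x, y):
    """Weighted least squares of y on x (hypothetical helper).

    Returns (slope, intercept) from the closed-form solution.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


# Weights exactly as given (what the pweighted fit uses)
exact = wls_fit(sweight, x, y)
# Every weight multiplied by the same constant
scaled = wls_fit([100 * w for w in sweight], x, y)
# Integer counts as produced by expanding with ceil(sweight): 3.33 becomes 4
ceil_1 = wls_fit([math.ceil(w) for w in sweight], x, y)
# Integer counts after multiplying by 100: 3.33 becomes 333, ratios preserved
int_100 = wls_fit([round(100 * w) for w in sweight], x, y)

for name, (b, a) in [("exact", exact), ("scaled", scaled),
                     ("ceil_1", ceil_1), ("int_100", int_100)]:
    print(f"{name:>8}: slope = {b:.7f}, intercept = {a:.4f}")
```

Under these assumptions, `exact`, `scaled`, and `int_100` agree with the coefficients reported in results (1) and (3), and `ceil_1` agrees with result (2). The gap in (2) comes from `ceil(3.33) = 4`, which changes the relative weight of observation 3. It does not come from the larger sample size.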