st: Svy poststratification VS Pweighting


From   francesco manaresi <manaresi@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Svy poststratification VS Pweighting
Date   Mon, 21 Jun 2010 17:42:18 +0200

I've seen several questions about poststratification on Statalist,
but I would like to ask for some clarification on the estimation of
standard errors. Thank you for your kindness and availability.
I have a sample of firms that have (supposedly) been randomly
drawn from a reference population, and I would like to post-stratify
on two observable characteristics for which the full population
cross-tabulation is available.

A first strategy is simply to create poststratum and postweight
variables to use with svyset.
This works fine, but some (mainly user-written) commands do not
support the "svy:" prefix. (I am particularly interested in matching
estimators.)
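
For concreteness, the first strategy might be set up roughly as in the
sketch below. It is only an illustration under stated assumptions: the
population and the sample are taken to sit in one dataset, with
sample==1 marking the sampled firms, and the names x1, x2, ps and N_h
are hypothetical (they do not come from my actual data). The
poststrata() and postweight() options are the ones svyset documents.

```stata
* Hypothetical names: x1, x2 = the two observable characteristics,
* ps = poststratum identifier, N_h = population count per poststratum.
egen ps = group(x1 x2)              // one cell per combination of x1 and x2
bysort ps: gen N_h = _N             // population size of each poststratum

* Simple random sample: each observation is its own PSU.
svyset _n, poststrata(ps) postweight(N_h)

* Poststratified estimate, restricted to the sampled firms.
svy: mean nord if sample == 1
```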

In those cases I would like to use pweights. I've seen this answer
http://www.stata.com/statalist/archive/2008-11/msg00152.html (and
several others), which suggests using N_h/n_h as the pweight (where N_h
is the total number of population units in stratum h, and n_h is the
total number of sample units in stratum h). This is indeed correct,
because it corresponds to the inverse of the probability of being
selected given that a unit belongs to a specific stratum (this can be
shown by applying Bayes' rule).
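
The Bayes' rule argument can be written out as follows. This is a
sketch under assumptions not spelled out above: S denotes the event
"selected into the sample", n and N are the total sample and population
sizes, selection is simple random sampling so that P(S) = n/N, and the
sample share n_h/n is taken as P(h | S).

```latex
\begin{align*}
P(S \mid h) &= \frac{P(h \mid S)\,P(S)}{P(h)}
             = \frac{(n_h/n)\,(n/N)}{N_h/N}
             = \frac{n_h}{N_h}, \\
w_h &= \frac{1}{P(S \mid h)} = \frac{N_h}{n_h}.
\end{align*}
```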

However, the standard errors differ dramatically: they are much
larger with the latter method than with the former.

The question is: which one should I use? And if the answer is "Stata's
postweight option", how can I apply it in commands that do not
support the "svy:" prefix?

As an example, I tried a simple simulation with a fictitious sample
drawn from a fictitious population. I report results for the % of
firms located in the Northern part of Italy in four cases:
1- for the real population
2- for the unweighted sample
3- for the weighted sample, using "poststrata" and "postweight"
4- for the weighted sample, using the "N_h / n_h" formula
You can see that the point estimate is the same (obviously), but the
standard error is much larger in case 4 than in case 3:
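
For reference, the weight used in case 4 could be constructed along
these lines. Again this is only a sketch: ps is a hypothetical
poststratum identifier assumed to exist already, Npop_h and nsamp_h are
hypothetical helper names, and the population and sample are assumed to
share one dataset with sample==1 marking sampled firms. Only nord,
sample and pesobis appear in my actual output.

```stata
* Population and sample counts within each poststratum (ps is hypothetical).
bysort ps: gen  Npop_h  = _N                    // N_h
bysort ps: egen nsamp_h = total(sample == 1)    // n_h

* pweight = N_h / n_h, defined for sampled firms only.
gen pesobis = Npop_h / nsamp_h if sample == 1

* Sanity check: if every poststratum contains sampled firms, the weights
* should sum to the population size.
summarize pesobis if sample == 1
display r(sum)

* Case 4: weighted mean without the svy: prefix.
mean nord if sample == 1 [pweight = pesobis]
```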

1- All Population


. mean nord

Mean estimation                     Number of obs    =   17796

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        nord |   .5057316   .0037479      .4983853    .5130779
--------------------------------------------------------------

2- Unweighted Sample
. mean nord if sample ==1

Mean estimation                     Number of obs    =    2531

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        nord |   .8411695   .0072669      .8269198    .8554192
--------------------------------------------------------------


3- Weighted Sample with Stata svy poststratification
. svy: mean nord if sample ==1
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =    2531
Number of PSUs   =    2531          Population size  =   17796
N. of poststrata =       4          Design df        =    2530

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        nord |   .5057316   5.05e-17      .5057316    .5057316
--------------------------------------------------------------

4- Weighted Sample, with N_h/n_h pweights

. mean nord if sample ==1 [pweight=pesobis]

Mean estimation                     Number of obs    =    2531

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        nord |   .5057316   .0147696      .4767698    .5346934
--------------------------------------------------------------
