st: Svy poststratification VS Pweighting

From	francesco manaresi <[email protected]>
To	[email protected]
Subject	st: Svy poststratification VS Pweighting
Date	Mon, 21 Jun 2010 17:42:18 +0200

I've seen several questions about poststratification on Statalist, but
I would like to ask for some clarification about the estimation of
standard errors. Thank you in advance for your help.
I have a sample of firms that were (supposedly) randomly drawn from a
reference population, and I would like to post-stratify on two
observable characteristics for which the full population
cross-tabulation is available.

A first strategy is simply to create poststratum and postweight
variables and pass them to svyset.
This works fine, but some commands (mainly user-written ones) do not
support the "svy:" prefix. (I am particularly interested in matching
estimators.)
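
For concreteness, the first strategy might be set up roughly as follows. This is only a sketch under stated assumptions: the data are taken to contain just the sampled firms, `area` and `sizeclass` are hypothetical names for the two poststratification characteristics, and `N_h` is a hypothetical variable holding the population count of each firm's cell, taken from the cross-tables.

```stata
* Assumptions (not from the original post): only sampled firms are in memory;
* area and sizeclass are hypothetical poststratification variables;
* N_h is a hypothetical variable with the population count of each cell.
egen long ps = group(area sizeclass)        // one identifier per poststratum
svyset _n, poststrata(ps) postweight(N_h)   // observations are the sampling units
svy: mean nord                              // poststratified mean, linearized SE
```

Here `svyset _n` declares each observation as its own sampling unit, and `poststrata()` and `postweight()` tell Stata which cell each firm belongs to and how large that cell is in the population.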

In those cases I would like to use pweights. This answer
http://www.stata.com/statalist/archive/2008-11/msg00152.html (and
several others) suggests using N_h/n_h as the pweight, where N_h is
the total number of units in the population in stratum h and n_h is
the total number of units in the sample in stratum h. This is indeed
correct, because it corresponds to the inverse of the probability of
being selected given that a unit belongs to a specific stratum (this
can be shown by applying Bayes' rule).
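
To spell out the Bayes' rule step, treating sample shares as probabilities: P(selected | h) = P(h | selected) P(selected) / P(h) = (n_h/n)(n/N)/(N_h/N) = n_h/N_h, so the weight is its inverse, N_h/n_h. A minimal sketch of building such a weight, assuming the data contain only the sampled firms, `ps` is a hypothetical poststratum identifier, and `N_h` is a hypothetical variable with the population count of each firm's poststratum (`pesobis` is the weight name used in the output further down):

```stata
* Assumptions (not from the original post): only sampled firms are in memory;
* ps is a hypothetical poststratum identifier; N_h is a hypothetical variable
* with the population count of each firm's poststratum.
bysort ps: generate long n_h = _N       // sample count in the poststratum
generate double pesobis = N_h / n_h     // inverse of n_h/N_h
mean nord [pweight = pesobis]           // weighted mean, robust SE
```

With weights built this way, the sum of `pesobis` over the sample equals the population size, since each poststratum contributes n_h * (N_h/n_h) = N_h.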

However, the standard errors differ dramatically: they are much larger
with the latter method than with the former.

The question is: which one should I use? And if the answer is "Stata's
poststrata()/postweight() options", how can I apply them in commands
that do not support the "svy:" prefix?

As an example, I ran a simple simulation with a fictitious sample
drawn from a fictitious population. I report results for the share of
firms located in the Northern part of Italy in four cases:
1- for the real population
2- for the unweighted sample
3- for the weighted sample, using "poststrata" and "postweight"
4- for the weighted sample, using the "N_h / n_h" formula
You can see that the point estimate is the same (obviously), but the
standard error is much larger in case 4 than in case 3:

1- All Population


. mean nord

Mean estimation                     Number of obs    =   17796

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        nord |   .5057316   .0037479      .4983853    .5130779
--------------------------------------------------------------

2- Unweighted Sample
. mean nord if sample ==1

Mean estimation                     Number of obs    =    2531

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        nord |   .8411695   .0072669      .8269198    .8554192
--------------------------------------------------------------


3- Weighted Sample with Stata svy poststratification
. svy: mean nord if sample ==1
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =    2531
Number of PSUs   =    2531          Population size  =   17796
N. of poststrata =       4          Design df        =    2530

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        nord |   .5057316   5.05e-17      .5057316    .5057316
--------------------------------------------------------------

4- Weighted Sample, with N_h/n_h pweights

. mean nord if sample ==1 [pweight=pesobis]

Mean estimation                     Number of obs    =    2531

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        nord |   .5057316   .0147696      .4767698    .5346934
--------------------------------------------------------------

Follow-Ups:
- Re: st: Svy poststratification VS Pweighting
  - From: Stas Kolenikov <[email protected]>
