Re: st: Sampling weights (pweights) and regression analysis

From   Steve Samuels <>
Subject   Re: st: Sampling weights (pweights) and regression analysis
Date   Thu, 12 Jul 2012 20:16:25 -0400

On Jul 11, 2012, at 4:15 PM, Fatih Yilmaz wrote:

> I am having trouble with using sampling weights in my simple regression
> analysis.
> Here is the story:
> The survey data I have is not representative, where some groups were
> deliberately over or under-sampled.
>     The weights I was provided are computed as follows:
> For group one (strata), population weight is 60%
>                         sample weight is 40%
>                         Final Pweight = 60%/40% = 1.5
> My questions:
> 1- I needed to drop some of the observations from the survey data: outliers,
> missing obs and also unrelated data.
>     so, can I still use the old (initial) weights or do I have to re-weight the
> data with respect to the dropped observations?
>     Or how problematic could it be to use old weights?

You should reweight for non-response. Not doing so could be quite problematic.
How you do this depends on what you know about the population. See the sections
on nonresponse weighting in the books by Lohr or Groves et al. and in the PEAS page
referenced below. If you are dropping observations because of missing data for
some variables, you have a couple of choices. Probably simplest is to treat these as
"nonrespondents". Better would be to impute missing variables with Stata's
multiple imputation commands (see the help for -mi svyset-), but this would take
your analysis out of the realm of the "simple".
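
A minimal sketch of one common nonresponse adjustment (a weighting-class
adjustment within strata), assuming the dropped observations can be treated as
missing at random within each stratum. The data are simulated, and the variable
names (w, stratum, retained, w_adj) and the second stratum's 40%/60% shares are
hypothetical, not taken from the poster's data:

```stata
* Hypothetical sample: 200 units, 40% of them in stratum 1
clear
set seed 2012
set obs 200
generate byte stratum = cond(_n <= 80, 1, 2)
generate double w = cond(stratum == 1, 60/40, 40/60)   // pop% / sample%
generate byte retained = runiform() > 0.2              // 0 = dropped, treated as a nonrespondent

* Within each stratum, scale the retained weights so that they
* sum to the original weight total for that stratum
bysort stratum: egen double w_all  = total(w)
bysort stratum: egen double w_kept = total(w * retained)
generate double w_adj = w * w_all / w_kept if retained

* The two sets of stratum totals should agree
total w, over(stratum)
total w_adj, over(stratum)
```

If more is known about the population (for example, age or region totals), the
adjustment classes can be finer than the strata; the books cited below cover this.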

Note that if you want to analyze a subgroup, it is an error to discard
members of the sample who are not in the subgroup. Doing so risks standard
errors that are too small. See the section on "subpopulations" in
Stata's survey manual and in Lohr's book (referenced below).
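
As an illustration only, the contrast between the two approaches might look
like this on simulated data. The variables y and sub, and the two-stratum
design, are hypothetical:

```stata
* Hypothetical data: y is an outcome, sub flags the subgroup of interest
clear
set seed 2012
set obs 200
generate byte stratum = cond(_n <= 80, 1, 2)
generate double w = cond(stratum == 1, 60/40, 40/60)
generate byte sub = runiform() < 0.3
generate double y = rnormal() + sub

svyset [pweight = w], strata(stratum)

* Recommended: keep the full sample and flag the subgroup
svy, subpop(sub): mean y

* Not recommended: the rest of the sample is discarded before estimation
svy: mean y if sub
```

The point estimates from the two commands agree; the standard errors can
differ, because only the first treats the subgroup sample size within each
stratum as random.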

> 2- Since, my weights were computed as w=(pop%)/(sample%) (in general, some other
> researchers may compute them as w=(sample%)/(pop%) ),
> when I estimate weighted OLS should I use "reg y x [pw=1/w]" or "reg y x
> [pw=w]"?
Other researchers may, but they would be wrong. From your description, I think that
you have the right weights, so use [pw=w]. You can check by seeing if the stratum
weight totals add up to the known stratum population sizes (-total w, over(stratum)-).
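
A small worked check, assuming a hypothetical sample of n = 200 from a
population of N = 10,000, with the 60%/40% shares from the question and a
second stratum holding the remainder. With weights defined as pop%/sample%, the
stratum totals come out as n times the population share, so they need to be
rescaled by N/n before comparing them with population counts:

```stata
* Hypothetical design: N = 10,000 (60% in stratum 1), n = 200 (40% in stratum 1)
clear
set obs 200
generate byte stratum = cond(_n <= 80, 1, 2)
generate double w = cond(stratum == 1, 60/40, 40/60)   // pop% / sample%

* Ratio weights sum to n * population share: 120 and 80 here
total w, over(stratum)

* Rescaled by N/n, the totals match the stratum counts: 6,000 and 4,000
generate double w_count = w * 10000 / 200
total w_count, over(stratum)
```

Inverted weights (1/w) would give totals of about 53 and 180 instead, which
overstates the oversampled stratum even further.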

To do survey regression in Stata, you -svyset- the data and identify weights,
sampling strata, and clusters, if any. The regression estimation command is
-svy, subpop(): regress-.
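
A sketch of that workflow on simulated data, assuming a stratified design
without clusters. The variable names (y, x, w, stratum, sub) are hypothetical:

```stata
* Hypothetical data with a known slope of 0.5
clear
set seed 2012
set obs 200
generate byte stratum = cond(_n <= 80, 1, 2)
generate double w = cond(stratum == 1, 60/40, 40/60)
generate double x = rnormal()
generate double y = 1 + 0.5 * x + rnormal()
generate byte sub = runiform() < 0.5

* Declare the design once; for a clustered sample, name the PSU
* variable before the weight
svyset [pweight = w], strata(stratum)

* Full-sample regression
svy: regress y x

* Subgroup regression that keeps the whole sample for variance estimation
svy, subpop(sub): regress y x

* Same coefficients as -svy: regress-, but the standard errors ignore the strata
regress y x [pweight = w]
```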

> Could you pls also recommend some resources on sampling weights and regression
> analysis (preferably practical sources ),

Lohr, S. L. (1999, 1st ed.; 2009, 2nd ed.). Sampling: Design and Analysis.
Boston, MA: Cengage Brooks/Cole.

Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., &
Tourangeau, R. (2004, 1st ed.; 2009, 2nd ed.). Survey Methodology. Hoboken, NJ: Wiley.
The PEAS page, with sections on weighting and non-response,
and the exemplars page. See especially:
