
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: pweight question

From   Steve Samuels <>
Subject   Re: st: pweight question
Date   Thu, 29 Apr 2010 21:33:56 -0400

No--they should be scaled to add to the size of the sampled
population, in this case the "several economic zones". The sum of the
original probability weights in a multi-stage sample will not
necessarily equal the known population size, but scaling them up to
the known size is a good idea.
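
As a rough illustration of the rescaling described above, here is a minimal Python sketch. The weights, the population figure, and the function name rescale_weights are all hypothetical and are not taken from the poster's data; the sketch only shows the arithmetic of multiplying every weight by the same factor so that the weights sum to a known population size.

```python
def rescale_weights(weights, population_size):
    """Multiply every weight by one common factor so the weights sum to population_size."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    factor = population_size / total
    return [w * factor for w in weights]


# Hypothetical weights that a data provider scaled to sum to the sample size (n = 5).
scaled_to_n = [0.5, 0.5, 1.0, 1.5, 1.5]

# Hypothetical known size of the sampled population (the "several economic zones").
known_population = 20000

pweights = rescale_weights(scaled_to_n, known_population)

# The rescaled weights sum to the known population size (up to rounding error),
# and the ratio between any two weights is unchanged.
print(sum(pweights))
print(pweights[3] / pweights[0], scaled_to_n[3] / scaled_to_n[0])
```

Because only a common factor is applied, the relative weighting of the observations is preserved; this does not recover the original design weights if those summed to something other than the known population size.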

I'm not surprised the data came from an SPSS file. Scaling the weights
to sum to n was the only way to responsibly use the WEIGHT statement
in SPSS. But since SPSS outside the COMPLEX SAMPLES package couldn't
handle the stratum and cluster information from complex samples,
standard errors and hypothesis tests were always wrong.


On Thu, Apr 29, 2010 at 9:01 PM, Steven Archambault
<> wrote:
> "The scale of the weights (what they sum to) doesn't tell you whether
> or not they are pweights. Scaling the variables to sum to the size of
> the sample is something you do when you expect to use a package (or
> command) that, like SPSS until recently, only accepts fweights."
> Okay, now that I read this, the scaling down to the sample size does
> make sense. The data was originally in an SPSS format. So,
> essentially, scaling up would give the weights in terms of the
> population of the country.  Correct?
> Thanks,
> Steve
> On Thu, Apr 29, 2010 at 6:52 PM, Steven Archambault
> <> wrote:
>> Thanks for the responses thus far. I cannot say it is all clear to me
>> now, but I am getting there.
>> As for the strata and clustering, this is data that was to be taken as
>> a representation of the population in several different "economic
>> zones". The observations are taken from different villages in each
>> zone. Actually, observations from each village have the exact same
>> weight. I also know the population and area of the individual
>> villages. I am assuming the "probability that an observation is in the
>> sample" is based on the population density of that village or economic
>> region. But, that isn't clear. Perhaps I could come up with my own
>> weights retrospectively?
>> I am also analyzing this for multilevel effects, using gllamm. So, I
>> do expect the weights to matter.
>> Any further guidance would be very helpful!
>> Thanks,
>> Steve
>> On Thu, Apr 29, 2010 at 6:37 PM, Steve Samuels <> wrote:
>>> I have other problems with these scaled weights.
>>> First, if they are all you have, it is difficult to identify weights
>>> that are too small. (Ken Brewer, Combined Survey Sampling Inference,
>>> Wiley, p. 133).
>>> Second, with these scaled weights one cannot recover the original ones
>>> without information on the total, and the information is not always
>>> available. In fact, for some samples, the population total isn't known
>>> and the only estimate is based on the original probability weights.
>>> Third, I wonder about the accuracy of the scaled weights. If n is
>>> moderate and the sampling fraction is small, most of the significant
>>> figures could be far to the right of the decimal place.
>>> Finally, these weights just lead to confusion on the part of people
>>> who were not in on their construction. The original poster was
>>> confused on this occasion, and I was confused on another last year.
>>> Steve
>>> On Thu, Apr 29, 2010 at 5:47 PM, Stas Kolenikov <> wrote:
>>>> On Thu, Apr 29, 2010 at 3:03 PM, Michael I. Lichter
>>>> <> wrote:
>>>>> The scale of the weights (what they sum to) doesn't tell you whether or not
>>>>> they are pweights.
>>>> That's not quite right. Properly scaled probability weights should sum
>>>> up to the population size. This however is only relevant when you
>>>> estimate -total-s. If you run pretty much any other analysis (means,
>>>> ratios, proportions, any sort of regressions), then the scale of the
>>>> weights cancels out. I would grind my teeth at the pweights that are
>>>> scaled to the sample size, and maybe make some mental comments about
>>>> the data provider, but won't be bothered very much by this nuisance.
>>>> The scaling of the weights begins to matter again with multilevel
>>>> data, in which the scaling is known to affect the accuracy of the
>>>> variance component estimates.
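
The point in the quoted message about scale cancelling out for means but not for totals can be checked with a small Python sketch. All numbers here are invented for illustration, and the helper names weighted_total and weighted_mean are hypothetical. It covers point estimates only and says nothing about standard errors or about the multilevel case.

```python
def weighted_total(values, weights):
    """Weighted total: sum of weight * value."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Weighted mean: weighted total divided by the sum of the weights."""
    return weighted_total(values, weights) / sum(weights)


# Hypothetical outcome values and weights that sum to a population of 1000.
y = [10.0, 20.0, 30.0, 40.0]
w_pop = [100.0, 200.0, 300.0, 400.0]

# The same weights rescaled to sum to the sample size n = 4.
n = len(y)
w_n = [w * n / sum(w_pop) for w in w_pop]

mean_pop = weighted_mean(y, w_pop)
mean_n = weighted_mean(y, w_n)
total_pop = weighted_total(y, w_pop)
total_n = weighted_total(y, w_n)

# The two means agree up to rounding error; the two totals differ by the
# rescaling factor sum(w_pop) / n.
print(mean_pop, mean_n)
print(total_pop, total_n)
```

The common factor appears in both the numerator and the denominator of the mean, so it drops out; the total has no denominator, so it inherits the factor directly.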
>>>> --
>>> --
>>> Steven Samuels
>>> 18 Cantine's Island
>>> Saugerties NY 12477
>>> USA
>>> Voice: 845-246-0774
>>> Fax: 206-202-4783

Steven Samuels
18 Cantine's Island
Saugerties NY 12477
Voice: 845-246-0774
Fax:    206-202-4783
