

From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: pweight question
Date: Thu, 29 Apr 2010 21:33:56 -0400

No--they should be scaled to add to the size of the sampled
population, in this case the "several economic zones". The sum of the
original probability weights in a multi-stage sample will not
necessarily add to the known population size. But scaling them up to
the known size is a good idea.

I'm not surprised the data came from an SPSS file. Scaling the weights
to sum to n was the only way to responsibly use the WEIGHT statement
in SPSS. But since SPSS outside the COMPLEX SAMPLES package couldn't
handle the stratum and cluster information from complex samples,
standard errors and hypothesis tests were always wrong.

Steve

On Thu, Apr 29, 2010 at 9:01 PM, Steven Archambault
<archstevej@gmail.com> wrote:
> "The scale of the weights (what they sum to) doesn't tell you whether
> or not they are pweights. Scaling the variables to sum to the size of
> the sample is something you do when you expect to use a package (or
> command) that, like SPSS until recently, only accepts fweights."
>
> Okay, now that I read this, the scaling down to the sample size does
> make sense. The data was originally in an SPSS format. So,
> essentially, scaling up would give the weights in terms of the
> population of the country. Correct?
>
> Thanks,
> Steve
>
> On Thu, Apr 29, 2010 at 6:52 PM, Steven Archambault
> <archstevej@gmail.com> wrote:
>> Thanks for the responses thus far. I cannot say it is all clear to me
>> now, but I am getting there.
>>
>> As for the strata and clustering, this is data that was to be taken as
>> a representation of the population in several different "economic
>> zones". The observations are taken from different villages in each
>> zone. Actually, observations from each village have the exact same
>> weight. I also know the population and area of the individual
>> villages. I am assuming the "probability that an observation is in the
>> sample" is based on the population density of that village or economic
>> region. But, that isn't clear. Perhaps I could come up with my own
>> weights retrospectively?
>>
>> I am also analyzing this for multilevel effects, using gllamm. So, I
>> do expect the weights to matter.
>>
>> Any further guidance would be very helpful!
>>
>> Thanks,
>> Steve
>>
>> On Thu, Apr 29, 2010 at 6:37 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> I have other problems with these scaled weights.
>>>
>>> First, if they are all you have, it is difficult to identify weights
>>> that are too small. (Ken Brewer, Combined Survey Sampling Inference,
>>> Wiley, p. 133).
>>>
>>> Second, with these scaled weights one cannot recover the original ones
>>> without information on the total, and the information is not always
>>> available. In fact, for some samples, the population total isn't known
>>> and the only estimate is based on the original probability weights.
>>>
>>> Third, I wonder about the accuracy of the scaled weights. If n is
>>> moderate and the sampling fraction is small, most of the significant
>>> figures could be far to the right of the decimal place.
>>>
>>> Finally, these weights just lead to confusion on the part of people
>>> who were not in on their construction. The original poster was
>>> confused on this occasion, and I was confused on another last year.
>>>
>>> Steve
>>>
>>> On Thu, Apr 29, 2010 at 5:47 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
>>>> On Thu, Apr 29, 2010 at 3:03 PM, Michael I. Lichter
>>>> <mlichter@buffalo.edu> wrote:
>>>>> The scale of the weights (what they sum to) doesn't tell you whether or not
>>>>> they are pweights.
>>>>
>>>> That's not quite right. Properly scaled probability weights should sum
>>>> up to the population size. This however is only relevant when you
>>>> estimate -total-s. If you run pretty much any other analysis (means,
>>>> ratios, proportions, any sort of regressions), then the scale of the
>>>> weights cancels out. I would grind my teeth at the pweights that are
>>>> scaled to the sample size, and maybe make some mental comments about
>>>> the data provider, but won't be bothered very much by this nuisance.
>>>>
>>>> The scaling of the weights begins to matter again with multilevel
>>>> data, in which the scaling is known to affect the accuracy of the
>>>> variance component estimates.

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
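
The scaling points made in this thread can be illustrated with a small numeric sketch. It is written in Python rather than Stata, and everything in it is an assumption made up for illustration: the outcome values, the weights, the population size `N`, and the helper names `rescale`, `weighted_total`, and `weighted_mean`. It only covers point estimates. It says nothing about standard errors, which need the stratum and cluster information, or about multilevel models, where the scaling is said above to matter.

```python
# Hypothetical illustration: none of these numbers come from the thread.

def rescale(weights, target_total):
    """Multiply every weight by one constant so the weights sum to target_total."""
    factor = target_total / sum(weights)
    return [w * factor for w in weights]

def weighted_total(values, weights):
    """Weighted estimate of a total: sum of w_i * y_i."""
    return sum(v * w for v, w in zip(values, weights))

def weighted_mean(values, weights):
    """Weighted estimate of a mean: weighted total divided by the sum of weights."""
    return weighted_total(values, weights) / sum(weights)

# Made-up sample: 6 observations, 2 per village, same weight within a village.
y = [2.0, 4.0, 3.0, 5.0, 1.0, 6.0]
w_sample = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5]  # scaled to sum to n = 6
N = 12000                                   # assumed known population size

# Scale the weights up so that they sum to the population size.
w_pop = rescale(w_sample, N)

mean_sample = weighted_mean(y, w_sample)
mean_pop = weighted_mean(y, w_pop)
total_sample = weighted_total(y, w_sample)
total_pop = weighted_total(y, w_pop)

# The constant factor cancels in the mean but not in the total.
print("mean, weights sum to n: ", mean_sample)
print("mean, weights sum to N: ", mean_pop)
print("total, weights sum to n:", total_sample)
print("total, weights sum to N:", total_pop)
```

The two means agree up to floating-point rounding, while the two totals differ by the factor `N / sum(w_sample)`. That is the sense in which the scale of the weights matters for totals and cancels out for means.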

**References**:

- **st: pweight question** *From:* Steven Archambault <archstevej@gmail.com>
- **Re: st: pweight question** *From:* "Michael I. Lichter" <mlichter@buffalo.edu>
- **Re: st: pweight question** *From:* Stas Kolenikov <skolenik@gmail.com>
- **Re: st: pweight question** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: pweight question** *From:* Steven Archambault <archstevej@gmail.com>
- **Re: st: pweight question** *From:* Steven Archambault <archstevej@gmail.com>
