Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

Re: st: Is pweight the right weight for me and how to specify my weight vector

From   Steve Samuels
Subject   Re: st: Is pweight the right weight for me and how to specify my weight vector
Date   Sun, 29 Dec 2013 10:42:53 -0500

I suggest a multilevel mixed-effects model with Stata's -mixed- or
-meglm- commands: individual installations would be nested within zip
codes, and zip codes possibly nested within cities or other larger
units. This is the only analysis that will allow proper assessment of
both installation-level and zip-code-level predictors. If between-zip-code
variation is large relative to within-zip-code variation, the
multilevel models will automatically downweight zip codes with more
installations. I don't see a place for other weights in this analysis.
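
A minimal sketch of what such a model could look like, assuming installations nested in zip codes nested in cities. The data are simulated only so that the commands run as written; every variable name (cost, score, system_kw, city, zip) and every parameter value is a hypothetical stand-in, not taken from the poster's data.

```stata
* Simulated three-level data: installations within zip codes within cities.
* All names and numbers below are illustrative assumptions.
clear
set seed 12345
set obs 30
gen city   = _n
gen score  = rnormal()
gen u_city = rnormal(0, 0.5)

* Four zip codes per city, each with its own random effect
expand 4
bysort city: gen zip = (city - 1)*4 + _n
gen u_zip = rnormal(0, 0.5)

* Between 1 and 20 installations per zip code
gen n_inst = ceil(20*runiform())
expand n_inst

* Installation-level covariate and outcome
gen system_kw = 3 + 2*runiform()
gen cost = 10 + 0.5*score + 1.2*system_kw + u_city + u_zip + rnormal()

* Random intercepts for city and for zip code within city
mixed cost score system_kw || city: || zip:
```

In -mixed-, nested levels are listed from the highest level down, so `|| city: || zip:` requests a city intercept and a zip-code intercept within city. No weight option is used here, in line with the point above.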

Steve Samuels
18 Cantine's Island
Saugerties NY USA

> On Dec 27, 2013, at 7:41 PM, Jesse Burkhardt wrote:
> Thanks for the responses.  I'll try to be clearer. My data are as
> follows:
> The dependent variable is the cost of a solar panel installation in a given
> zip code. The sampling process is assumed to be 100% of the population of
> installed solar panels. We believe this to be a fairly reasonable
> assumption. The independent variables are characteristics of the installed
> solar panels at the zip code level, zip code level census data, and a city
> based department of energy "score". This "score" variable is our primary
> variable of interest.
> The census data are obviously mean values for each zip code but the
> characteristics and the cost data are not means so I wasn't sure I wanted
> to use aweights since aweights seem to be for mean level data only. In
> addition, the score data at the city level causes problems because all zip
> codes within a given city are assigned the same score value and there is
> probably selection into the department of energy scoring program at the
> city level. For now I am ignoring the selection problem.
> On the other hand, since we assume we do not have a sampling bias for the
> installations, in that we have 100% of the population, then I'm not sure
> weights are really necessary.
> Here is the troubling question: I have cities with only 1 or 2
> installations and cities with over 10,000 installations. My worry is that
> the cities with 10,000 installations will drive the regression results for
> the coefficient on "score."  I would like to add weight to cities with only
> a few installations and down weight cities with thousands of observations.
> Which weighting scheme would work best for this and is this appropriate to
> do given the structure of the data? Thanks again.
> Jesse
>
> On Thu, Dec 26, 2013 at 9:23 PM, Richard Williams wrote:
> I agree that the question is unclear. I wonder if zip code areas are the
> intended unit of analysis? If so aweights might be appropriate. See, for
> example,
> <>
>
> At 11:37 PM 12/26/2013, Steve Samuels wrote:
> Your description so far says nothing about a sampling process of any
> kind, so your designation of the weights as "sampling weights" or
> "probability" weights (pweights) is premature and probably incorrect.
> We would need more detail on the population, the sampling process if
> any, the sample, and the purpose of your analysis. Have you only zip
> code level data, data on individuals, or both?
> Steve
>
> Dear Members.
> I have data with multiple observations per zip code.  I count the
> number of observations per zip code and use that number as the
> sampling weight. So I have a vector called weights, which is equal to
> the number of observations per zip code. When I run a regression and
> use the [pweight=weights] option, does Stata invert each element of
> the vector or am I supposed to take the inverse manually?
> Secondly, can someone provide some intuition for when I use pweight as
> stated above?  Is the result a regression in which each zip code is
> weighted equally?  The worry is that without this weight command, a
> zip code with 10,000 observations will drive regression results more
> than a zip code with 1 observation.  I'm wondering if using pweight
> will down weight the zip code with 10,000 observations and upweight
> zip codes with fewer observations. Is there a better weighting scheme
> to use in this situation? Thanks for any advice.
> Jesse
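
On the mechanical part of the original question: Stata treats a pweight as the inverse of the probability of selection and uses the values as supplied; it does not invert them. So [pweight=weights] with weights equal to the zip-code count would give observations in large zip codes more weight, not less. The sketch below only illustrates the arithmetic of giving every zip code the same total weight. The data are simulated and all variable names are hypothetical, and the replies in this thread question whether sampling weights apply to these data at all.

```stata
* Simulated data: 50 hypothetical zip codes with 1 to 100 installations each.
clear
set seed 2013
set obs 50
gen zip    = _n
gen score  = rnormal()
gen n_inst = ceil(100*runiform())
expand n_inst
gen cost = 10 + 0.5*score + rnormal()

* Count of observations per zip code, and its inverse as the weight
bysort zip: gen n_zip = _N
gen double w = 1/n_zip

* Weights within each zip code now sum to 1
bysort zip: egen double wsum = total(w)
assert abs(wsum - 1) < 1e-8

* Stata uses w as given; each zip code carries equal total weight
regress cost score [pweight = w], vce(cluster zip)
```

Because the weights within each zip code sum to 1, a zip code with thousands of installations and a zip code with one installation contribute the same total weight to the point estimates.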