Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Is pweight the right weight for me and how to specify my weight vector
From: Jesse Burkhardt <[email protected]>
To: [email protected]
Subject: Re: st: Is pweight the right weight for me and how to specify my weight vector
Date: Fri, 27 Dec 2013 16:41:15 -0800
Thanks for the responses. I'll try to be clearer. My data are as
follows:
The dependent variable is the cost of a solar panel installation in a given
zip code. We assume the data cover 100% of the population of installed
solar panels, which we believe is a fairly reasonable assumption. The
independent variables are characteristics of the installed solar panels at
the zip code level, zip code level census data, and a city-level
Department of Energy "score". This "score" variable is our primary
variable of interest.
The census data are obviously mean values for each zip code, but the
installation characteristics and the cost data are not means, so I wasn't
sure I wanted to use aweights, since aweights seem to be intended for
mean-level data only. In addition, the city-level score data cause
problems: all zip codes within a given city are assigned the same score
value, and there is probably selection into the Department of Energy
scoring program at the city level. For now I am ignoring the selection
problem.
On the other hand, since we assume there is no sampling bias in the
installations (we have 100% of the population), I'm not sure weights are
really necessary.
Here is the troubling question: I have cities with only 1 or 2
installations and cities with over 10,000 installations. My worry is that
the cities with 10,000 installations will drive the regression results for
the coefficient on "score." I would like to upweight cities with only a
few installations and downweight cities with thousands of installations.
Which weighting scheme would work best for this, and is this appropriate to
do given the structure of the data? Thanks again.
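To make the question concrete, here is a minimal sketch of one scheme I have in mind: each installation gets weight 1/(number of installations in its city), so that every city's weights sum to 1. The data are simulated and entirely hypothetical, as are the variable names (city, score, n_inst, cost, w_city, sum_w). As far as I understand the documentation, Stata uses pweight values as supplied rather than inverting them, so the inverse is taken by hand here. This is only an illustration of the mechanics, not a claim that this weighting is appropriate for my data.

```stata
* Toy data, entirely hypothetical: 20 cities, 15 small and 5 very large
clear
set seed 12345
set obs 20
gen city = _n
* City-level score, generated before expanding so it is constant within city
gen score = 100 * runiform()
* Installations per city: 2 in the small cities, 1000 in the large ones
gen n_inst = cond(_n <= 15, 2, 1000)
* One row per installation
expand n_inst
* Made-up cost equation, for illustration only
gen cost = 5 + 0.3 * score + rnormal()

* Weight each installation by the inverse of its city's installation count
bysort city: gen double w_city = 1 / _N

* Check: the weights within every city should sum to 1
bysort city: egen double sum_w = total(w_city)
assert abs(sum_w - 1) < 1e-8

* Unweighted fit: the 5 large cities supply almost all of the rows
regress cost score

* Weighted fit: each city carries the same total weight
regress cost score [pweight = w_city]
```

Passing the raw counts instead, as in [pweight = n_inst], would do the opposite of what I want and give the large cities even more influence.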
Jesse
On Thu, Dec 26, 2013 at 9:23 PM, Richard Williams
<[email protected]> wrote:
> I agree that the question is unclear. I wonder if zip code areas are the
> intended unit of analysis? If so aweights might be appropriate. See, for
> example,
>
> http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/sample_surveys/weight_syntax
>
>
>
> At 11:37 PM 12/26/2013, Steve Samuels wrote:
>
>> Your description so far says nothing about a sampling process of any
>> kind, so your designation of the weights as "sampling weights" or
>> "probability" weights (pweights) is premature and probably incorrect.
>>
>> We would need more detail on the population, the sampling process if
>> any, the sample, and the purpose of your analysis. Have you only zip
>> code level data, data on individuals, or both?
>>
>> Steve
>>
>>
>> Dear Members.
>> I have data with multiple observations per zip code. I count the
>> number of observations per zip code and use that number as the
>> sampling weight. So I have a vector called weights, which is equal to
>> the number of observations per zip code. When I run a regression and
>> use the [pweight=weights] option, does Stata invert each element of
>> the vector or am I supposed to take the inverse manually?
>>
>> Secondly, can someone provide some intuition for when I use pweight as
>> stated above? Is the result a regression in which each zip code is
>> weighted equally? The worry is that without this weight command, a
>> zip code with 10,000 observations will drive regression results more
>> than a zip code with 1 observation. I'm wondering if using pweight
>> will down weight the zip code with 10,000 observations and upweight
>> zip codes with fewer observations. Is there a better weighting scheme
>> to use in this situation? Thanks for any advice.
>> Jesse
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
>>
>
>
> -------------------------------------------
> Richard Williams, Notre Dame Dept of Sociology
> OFFICE: (574)631-6668, (574)631-6463
> HOME: (574)289-5227
> EMAIL: [email protected]
> WWW: http://www.nd.edu/~rwilliam
>
>