Re: st: pweight or fweight?
Steven Samuels <firstname.lastname@example.org>
Mon, 8 Sep 2008 12:21:52 -0400
On Sep 8, 2008, at 5:16 AM, Andrea Bennett wrote:
I'm getting a little confused with the weight options. I've constructed the following weight: A / B=weight, with A==share in true population and B==share in sample. These weights refer to U.S. states while the observations are single individuals.
Which weight option should I use in the regress command? I think it should be the -pweight- but since I've never worked with weights before, I thought I ask!
I should give Andrea more detail.
According to Stata's help:
1. fweights, or frequency weights, are weights that indicate the number of duplicated observations.
2. pweights, or sampling weights, are weights that denote the inverse of the probability that the observation is included because of the sampling design.
Now, Andrea's weights are certainly not frequency weights. Are they pweights? They do not meet the technical definition, but they can function as pweights:
Take the following example:
Suppose Andrea has a sample of 100 individuals, 5 from Alabama and 8 from California. The sample "shares" for Alabama and California are then B = 5% and 8%, respectively.
The US Population is approximately 300,000,000 people; Alabama has about 4.6 x 10^6 and California has about 36.6 x 10^6, for percentages of about A= 1.53% and 12.2%.
Andrea's weights for Alabama and California are wt_1 = A/B = 0.306 (Alabama) and wt_1 = A/B = 1.525 (California).
Andrea says nothing about how the sample was drawn. But in an informal sense, the five sampled Alabamans represent the 4.6 x 10^6 actual Alabamans, so each represents wt_2 = 920,000 Alabamans. Similarly, each of the eight Californians represents wt_2 = 4.58 x 10^6 Californians. These look like probability weights but are not, because pweights refer to sampling probabilities and Andrea says nothing about sampling. However, the essence of a "weight" is the number of population members represented by each sampled unit; in this sense, these are post-stratification weights.
At first glance Andrea's weights look nothing like these. But consider the ratio of the two kinds of weights:
Alabama: wt_1/wt_2 = 1/(3,000,000)
California: wt_1/wt_2 = 1/(3,000,000)
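The arithmetic can be checked with a few lines of code. What follows is only a sketch in Python (not Stata), using the approximate figures from the example; the sample size of 100 is the value implied by the 5% and 8% shares, and the variable names are made up for illustration.

```python
# Approximate figures from the example; names here are illustrative only.
N = 300_000_000                      # US population (approximate)
n = 100                              # sample size implied by shares of 5% and 8%
population = {"Alabama": 4.6e6, "California": 36.6e6}
sampled = {"Alabama": 5, "California": 8}


def both_weights(state):
    """Return (wt_1, wt_2) for one state.

    wt_1 = A / B, the population share divided by the sample share.
    wt_2 = number of population members represented by each sampled person.
    """
    A = population[state] / N
    B = sampled[state] / n
    wt_1 = A / B
    wt_2 = population[state] / sampled[state]
    return wt_1, wt_2


for state in population:
    wt_1, wt_2 = both_weights(state)
    # The ratio wt_1 / wt_2 should be the same for every state, namely n / N.
    print(state, wt_1, wt_2, wt_1 / wt_2, n / N)
```

Algebraically, wt_1/wt_2 = (N_state/N)/(n_state/n) divided by N_state/n_state, which reduces to n/N whatever the state, so the equal ratios are not a coincidence of these two states.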
In Stata's survey commands, only estimation of population totals requires absolute weights, that is, the absolute number of people represented by each sample member. Estimation of means and regression coefficients requires only weights that are proportional to the absolute weights. Andrea's weights are proportional, as the example shows. In the general case, the constant of proportionality is n/N, where n is the sample size and N is the population size.
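To see why proportional weights are enough for means and regression coefficients but not for totals, here is a small Python sketch (again, not Stata, and not how Stata computes anything internally). The y and x values are invented purely for illustration; the weights are the wt_2 values from the example, rescaled by n/N = 100/300,000,000.

```python
def weighted_total(y, w):
    """Weighted total: sum of w_i * y_i."""
    return sum(wi * yi for yi, wi in zip(y, w))


def weighted_mean(y, w):
    """Weighted mean: weighted total divided by the sum of the weights."""
    return weighted_total(y, w) / sum(w)


def weighted_slope(x, y, w):
    """Slope of a weighted least-squares line of y on x (one regressor)."""
    xbar = weighted_mean(x, w)
    ybar = weighted_mean(y, w)
    num = sum(wi * (xi - xbar) * (yi - ybar) for xi, yi, wi in zip(x, y, w))
    den = sum(wi * (xi - xbar) ** 2 for xi, wi in zip(x, w))
    return num / den


# Hypothetical data: two Alabamans followed by two Californians.
x = [1.0, 2.0, 3.0, 5.0]
y = [2.0, 3.0, 5.0, 7.0]
absolute = [920_000, 920_000, 4_575_000, 4_575_000]   # wt_2 from the example
scale = 100 / 300_000_000                             # n / N
relative = [wi * scale for wi in absolute]            # proportional weights

# Means and slopes agree under either set of weights; totals do not.
print(weighted_mean(y, absolute), weighted_mean(y, relative))
print(weighted_slope(x, y, absolute), weighted_slope(x, y, relative))
print(weighted_total(y, absolute), weighted_total(y, relative))
```

The constant factor cancels between numerator and denominator in the mean and in the slope, but it has nothing to cancel against in the total.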
So, Andrea can use the pweight specification.
However, I have doubts about the utility of this whole effort. Perhaps Andrea will tell us more about the sample and the study, and about the reasons for choosing these particular weights. Why weight for state population, for example, but not for age, gender, or other characteristics?