Re: st: using post stratification weights
Stas Kolenikov <email@example.com>
Tue, 14 Feb 2012 12:38:01 -0500
On Tue, Feb 14, 2012 at 11:37 AM, Afif Naeem <firstname.lastname@example.org> wrote:
>> > The survey code-book does not provide much information with regard to design-stratification. But they do tell me that they used some combination of random digit dialing (RDD) sampling and address-based sampling (ABS) methodology. I have a feeling that they did not use design-stratification for sampling purposes.
>> If they obtained a part of the sample from RDD frame, and another part
>> from ABS frame, then these are two independent strata, and should be
>> accounted as such. You'd have to continue clarifying this.
> I guess the sampling is completely random with RDD and ABS. In that case, do I still need to define/take care of the two strata arising from RDD and ABS?
If you can identify which frame each respondent came from, declaring
the two frames as strata would be more appropriate. If the data
provider does not have this information, you will probably have
standard errors that are a bit off, although it is hard to say whether
they will go up or down.
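A minimal sketch of what declaring the two frames as strata could look
like. The variable names (frame, rakewt, y, x) and the simulated data are
placeholders I made up for illustration; your data set will have its own
names, and the frame indicator may not exist at all.

```stata
* Toy data for illustration only; every name and number here is made up.
clear
set seed 12345
set obs 200
* Hypothetical coding: 1 = RDD frame, 2 = ABS frame
generate byte frame = cond(_n <= 120, 1, 2)
* Stand-in for the raking weight supplied with the data
generate double rakewt = 0.13 + 5.47 * runiform()
generate double x = rnormal()
generate byte y = runiform() < invlogit(0.5 * x)

* Declare the weights and use the sampling frame as the stratum identifier
svyset [pweight = rakewt], strata(frame)
* Typing svyset alone redisplays the current design settings
svyset

* Design-based logit using the declared design
svy: logit y x
```

Without the frame indicator, the same -svyset- call minus the strata()
option is the fallback, with the caveat about the standard errors above.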
>> > My main concern is the low response/completion rate of the survey, i.e. only 62.5% of respondents actually complete the survey. Would using the post-stratification (i.e. Raking) weights without mentioning any post-strata correct for any bias that may arise due to the low response rate? And where/how would the variables used in the Raking process (mentioned below) come into play (assuming they do come into play)?
>> If the response is MAR with the variables determining non-response
>> used in the non-response model that led to the post-stratification
>> adjustments (i.e., age, gender, etc.), then you will be fine. But this
>> is a strong assumption to make.
> Can you please elaborate on this point further? Also, what does MAR stand for?
Missing at random. You need to take a look at the missing/incomplete data
literature, such as Little & Rubin (2002), Statistical Analysis with
Missing Data, or Schafer (1997), Analysis of Incomplete Multivariate
Data.
>> > Moreover, the post-stratification weight variable provided in the data set ranges from a value of 0.13 to 5.6, with a mean value of 1.000075. As far as I understand, pweight is the inverse of the sampling fraction and hence should be greater than (or equal to) 1. Do I need to worry about it, or will Stata adjust for it?
>> Stata will not make any guesses; if you specified these weights, Stata
>> will use them, and does not care whether they sum up to the total
>> population size (as they should) or to the sample size (which is a
>> shortcut for SAS or SPSS that can't do things otherwise). It is up to
>> the analyst to specify the weights appropriately and interpret the
>> results. If you don't need to estimate the population totals (total
>> income; total # of events; etc.), then you can get along with these
>> weights.
> So how should I specify the weights appropriately? Do I have to modify the weights given in the data set?
As I said, it depends on the type of the analysis you need to
undertake, and on the information available to you. If you know the
population size, you can scale the weights so that they add up to this
population size; that would be a safe bet.
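A minimal sketch of that rescaling, assuming a weight variable called
rakewt and a population size of 1,000,000. Both are made up for
illustration, as is the simulated weight itself; substitute your own
variable and the known population figure.

```stata
* Toy weights for illustration only; names and numbers are made up.
clear
set seed 12345
set obs 200
generate double rakewt = 0.13 + 5.47 * runiform()

* Hypothetical population size; replace with the known figure
scalar N_pop = 1000000

* Multiply every weight by the same constant so that the weights sum to N_pop
summarize rakewt, meanonly
generate double popwt = rakewt * scalar(N_pop) / r(sum)

* Check: the sum of the rescaled weights should equal N_pop
summarize popwt, meanonly
display "Sum of rescaled weights: " %12.0f r(sum)
```

Because every weight is multiplied by the same constant, means,
proportions and regression coefficients are not affected; only estimates
of totals change.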
>> > Lastly, how appropriate is it to use post-stratification weights in a Bivariate Logit Model? My results completely flip over and lose statistical significance when I use weights in the model using survey commands. The signs and statistical significance cannot be justified on the basis of any (economic) theory. However, when I don't use weights, the results come out as expected. I wonder if I am doing something wrong here. Can the use of weights change the parameter estimates and average marginal effects to such an extent?
>> -svy postestimation- commands provide design effects, i.e., the ratio
>> of variances of design-based estimates vs. SRS estimates. I wouldn't
>> be surprised if your poststratification has actually increased the
>> variances quite a bit. Unfortunately, that's the price you have to pay
>> to get design-consistent estimates.
> An increase in the variance of the estimates makes sense. But 6 out of 14 independent variables change sign when I use weights in my Logit model. Is that something unheard of, or am I doing something wrong here?
Your weights appear to have quite a notable spread in them, so I wouldn't
be surprised to see notable changes in the coefficients.
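A minimal sketch of how one might gauge the spread of the weights and
compare the two sets of estimates. The data are simulated and the names
(rakewt, y, x) are placeholders I made up. The 1 + CV^2 figure is Kish's
rough approximation to the design effect due to unequal weighting, not
something Stata reports under that name.

```stata
* Toy data for illustration only; every name and number here is made up.
clear
set seed 12345
set obs 200
generate double rakewt = 0.13 + 5.47 * runiform()
generate double x = rnormal()
generate byte y = runiform() < invlogit(0.5 * x)

* Spread of the weights: 1 + CV^2 as a rough design effect from weighting
summarize rakewt
scalar kish = 1 + (r(sd) / r(mean))^2
display "Approximate design effect from unequal weights: " %5.2f kish

* Unweighted fit, for comparison
logit y x

* Weighted, design-based fit, followed by design effects for each coefficient
svyset [pweight = rakewt]
svy: logit y x
estat effects
```

Large design effects together with large shifts in the coefficients
suggest that the heavily weighted observations differ systematically
from the rest of the sample.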
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.