Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# RE: st: using post stratification weights

From: Afif Naeem
Subject: RE: st: using post stratification weights
Date: Mon, 13 Feb 2012 11:36:35 -0500

```
I appreciate the response from you guys.

I looked into the difference between post-stratification (raking, in my case) and design stratification, and I now see how the two differ.

The survey codebook does not provide much information about design stratification. It does say that some combination of random digit dialing (RDD) sampling and address-based sampling (ABS) was used. I have a feeling that design stratification was not used for sampling purposes.

My main concern is the low response/completion rate of the survey, i.e. only 62.5% of respondents actually completed the survey. Would using the post-stratification (i.e. raking) weights without specifying any post-strata correct for bias that may arise from the low response rate? And where/how would the variables used in the raking process (listed below) come into play, assuming they do come into play?

Moreover, the post-stratification weight variable provided in the data set ranges from 0.13 to 5.6, with a mean of 1.000075. As far as I understand, a pweight is the inverse of the sampling fraction and hence should be greater than or equal to 1. Do I need to worry about this, or will Stata adjust for it?

Lastly, how reliable is it to use post-stratification weights in a Bivariate Logit Model? My results flip completely and lose statistical significance when I use the weights in the model with the survey commands. The resulting signs and statistical significance cannot be justified on the basis of any (economic) theory. However, when I don't use weights, the results come out as expected. I wonder if I am doing something wrong here. Can the use of weights change the parameter estimates and average marginal effects to such an extent?

Regards
Afif

----------------------------------------
> Subject: Re: st: using post stratification weights
> From: sjsamuels@gmail.com
> Date: Sun, 12 Feb 2012 18:00:19 -0500
> To: statalist@hsphsun2.harvard.edu
>
>
> You are mixing up post-stratification (actually, sample raking) with design-stratification; for the difference you should consult a good sampling text (e.g. Sharon Lohr, Sampling: Design and Analysis, 2009).
>
> You will need to determine from the research firm whether they used any design strata in the selection of the sample. If they cannot supply the information, then you can conservatively assume there were none (conservatively, meaning that standard errors might be larger than need be). Also, it appears, but you must confirm, that there was no clustering of individuals.
>
>
> In any case, the -svyset- statement will be:
>
> ********************************
> svyset _n [pw = supplied_weight]
> ********************************
>
> where you should substitute the name of the supplied weight variable after the equal sign.
>
> You can then use commands such as -svy: mean-, -svy: tab-, and -svy: logistic- for your analysis.
>
>
>
> On Feb 12, 2012, at 11:41 AM, Afif Naeem wrote:
>
> Thanks Steven for your reply. Actually, correctly specifying the strata (or any other first-stage sampling effect) is what I am most confused about. Let me first explain the survey design.
>
> The survey was conducted by an internet-based survey research firm that has developed a fully representative pool of individuals. For our specific study, a representative sample of 859 Ohio adults (18 and over) was selected. These individuals were invited to take part in the online survey. Of the 859 individuals, only 537 completed the survey, i.e. a completion rate of only 62.5%.
>
> The survey research firm also provides in the data set individual specific weights, generated through a post-stratification process (i.e. Raking) to adjust for any survey non-response and also any non-coverage due to the study-specific sample design. Demographic and geographic distributions for the Ohio population aged 18+ from the most recent Current Population Survey (CPS) are used as benchmarks in this adjustment.
>
> The following benchmark distributions are utilized for this post-stratification adjustment:
>
> Gender (Male, Female)
> Age (18-29, 30-44, 45-59, 60+)
> Race/Hispanic ethnicity
> Education (Less than High School, High School, Some college, Bachelor and beyond)
> Metropolitan Area (Yes, No)
> Internet Access (Yes, No)
>
> Comparable distributions are calculated using all completed cases from the field data. Since study sample sizes are typically too small to accommodate a complete cross-tabulation of all the survey variables with the benchmark variables, an iterative proportional fitting is used for the post-stratification weighting adjustment. This procedure adjusts the sample data back to the selected benchmark proportions. Through an iterative convergence process, the weighted sample data are optimally fitted to the marginal distributions.
> After this final post-stratification adjustment, the distribution of the calculated weights is examined to identify and trim outliers at the extreme upper and lower tails of the weight distribution. The post-stratified and trimmed weights are then scaled to sum to the total sample size of qualified respondents.
>
>
> Given this survey design and method of deriving weights, I am not sure how to define strata or anything else. I guess the main source of sample-selection bias in the data set is that only 62.5% of respondents actually completed the survey. The unweighted summary statistics show some deviation from those of the state of Ohio. I want to weight the sample properly to make it more comparable to the general population of the state of Ohio.
>
> My main aim is to use these weights in my Binary Logit model, so that the inferences I draw are applicable to the general population of Ohio. I really hope someone can provide me some guidance here.
>
>
> Best,
> Afif
>
>
>
>
>
> ----------------------------------------
> > Subject: Re: st: using post stratification weights
> > From: sjsamuels@gmail.com
> > Date: Sat, 11 Feb 2012 19:43:18 -0500
> > To: statalist@hsphsun2.harvard.edu
> >
> > I should have added: and use Stata's survey commands. Note that weighting alone is not sufficient for valid inference. You must properly specify the first stage of sampling design in the -svyset- statement. Otherwise standard errors will be wrong.
> >
> > SS
> >
> >
> > Specify the adjusted weights in the -svyset- statement by [pw = adjusted_wt]
> >
> > Steve
> > sjsamuels@gmail.com
> > On Feb 11, 2012, at 12:24 PM, Afif Naeem wrote:
> >
> > Thanks Stas for your response. I should have been clearer in my first email. What I meant is that I have weights in the data set generated through iterative proportional fitting, as described in the quote from the survey handbook below.
> >
> > What I cannot figure out is how to use these weights to compute summary statistics for the variables in the data set. Also, is there a way to run a simple logit model in which this weight variable is used to weight individuals differently?
> >
> > Afif
> >
> >
> > ----------------------------------------
> >> Date: Sat, 11 Feb 2012 10:24:22 -0500
> >> Subject: Re: st: using post stratification weights
> >> From: skolenik@gmail.com
> >> To: statalist@hsphsun2.harvard.edu
> >>
> >> Afif,
> >>
> >> I am not sure what your question is. "Help me out" is too broad, and I
> >> don't know how many people on this list are in the business of mind
> >>
> >> We are probably talking about different types of weight adjustments
> >> here. Afif quoted from the survey manual he's been using:
> >>
> >>> The post-stratification weights are generated through iterative proportional fitting. Quote from the survey hand-book itself is below:
> >>
> >> "The following benchmark distributions are utilized for this
> >> post-stratification adjustment:
> >> Gender (Male, Female)
> >> Age (18-29, 30-44, 45-59, 60+)
> >> Race/Hispanic ethnicity
> >> Education category
> >> Metropolitan Area (Yes, No)
> >> Internet Access (Yes, No)
> >>
> >> Comparable distributions are calculated using all completed cases from
> >> the field data. Since study sample sizes are typically too small to
> >> accommodate a complete cross-tabulation of all the survey variables
> >> with the benchmark variables, an iterative proportional fitting is
> >> used for the post-stratification weighting adjustment. This procedure
> >> adjusts the sample data back to the selected benchmark proportions.
> >> Through an iterative convergence process, the weighted sample data
> >> are optimally fitted to the marginal distributions."
> >>
> >> This is a different procedure than post-stratification in Stata terms.
> >> Stata relies on the post-strata being mutually exclusive, and
> >> obviously the above categories aren't. What your quote suggests is a
> >> raking procedure, where the weights are adjusted along each of the
> >> dimensions/categorical variables, so that the current variable is made
> >> to agree with the known distribution perfectly, moving then to the
> >> next margin, etc., until some sort of convergence is achieved.
> >> Official Stata does not do this, although you should be able to find
> >> third party programs written for this purpose. I use -maxentropy-
> >> (which is cumbersome to use, but does the job quickly).
> >>
> >> Post-stratification adjustments call for special variance estimation
> >> methods. That's why Stata has post-stratification as an additional
> >> option in -svyset-; without these adjustments, your standard errors
> >> may be some 20-30% too small on descriptive statistics correlated with
> >> the calibration variables. These adjustments are relatively easy to
> >> implement with post-stratification over mutually exclusive strata (and
> >> that's done in Stata), but are somewhat harder with multivariate
> >> marginal adjustments. You won't be able to do these adjustments unless
> >> you have both the original sampling weight and the post-stratified
> >> weights, as well as the variables used for calibration (or,
> >> equivalently, the population totals towards which the adjustment was
> >>
> >> A typo correction in Cam's literature suggestions:
> >>
> >> Holt, D., & Smith, T.M.F. (1979). Post stratification. Journal of the
> >> Royal Statistical Society, Series A, 142, 33–46.
> >>
> >>
> >> --
> >> Stas Kolenikov, also found at http://stas.kolenikov.name
> >> Small print: I use this email account for mailing lists only.
> >>
> >> *
> >> * For searches and help try:
> >> * http://www.stata.com/help.cgi?search
> >> * http://www.stata.com/support/statalist/faq
> >> * http://www.ats.ucla.edu/stat/stata/
```
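
The raking (iterative proportional fitting) procedure described in the thread can be illustrated with a minimal sketch. This is not the survey firm's actual procedure (the trimming step is omitted), and it is written in Python rather than Stata purely for illustration. The function names `rake` and `weighted_share`, the eight-person sample, and the target proportions are all hypothetical. The sketch adjusts the weights one margin at a time until the weighted sample matches every benchmark margin, then scales the weights to sum to the sample size, as the codebook describes.

```python
from collections import defaultdict


def rake(rows, targets, max_iter=1000, tol=1e-12):
    """Rake unit weights to known marginal proportions.

    rows:    list of dicts, variable name -> category (one dict per respondent)
    targets: dict, variable name -> {category: population proportion}
    Returns weights scaled to sum to the number of respondents.
    """
    n = len(rows)
    w = [1.0] * n  # start from equal weights (assumption: no base weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted total in each category of this variable
            totals = defaultdict(float)
            for row, wi in zip(rows, w):
                totals[row[var]] += wi
            total_w = sum(w)
            # Rescale each unit so this margin matches its benchmark exactly
            for i, row in enumerate(rows):
                factor = shares[row[var]] * total_w / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:  # no margin needed further adjustment
            break
    scale = n / sum(w)  # scale so the weights sum to the sample size
    return [wi * scale for wi in w]


def weighted_share(rows, weights, var, category):
    """Weighted proportion of respondents falling in `category` of `var`."""
    hit = sum(w for r, w in zip(rows, weights) if r[var] == category)
    return hit / sum(weights)


# Hypothetical sample: metro residents are under-represented (5 of 8)
sample = (
    [{"gender": "M", "metro": "yes"}] * 3
    + [{"gender": "M", "metro": "no"}] * 1
    + [{"gender": "F", "metro": "yes"}] * 2
    + [{"gender": "F", "metro": "no"}] * 2
)
# Hypothetical benchmark margins
targets = {
    "gender": {"M": 0.5, "F": 0.5},
    "metro": {"yes": 0.8, "no": 0.2},
}

weights = rake(sample, targets)
print("weights:", [round(w, 3) for w in weights])
print("mean weight:", sum(weights) / len(weights))
print("weighted metro share:", weighted_share(sample, weights, "metro", "yes"))
```

Because the weights are scaled to sum to the sample size, their mean is 1 and some fall below 1, which matches the range reported in the top message. Weighted proportions and means depend only on the relative sizes of the weights, so multiplying every weight by a constant leaves them unchanged; estimated population totals, by contrast, do depend on the scale.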