Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: using post stratification weights

From	Steve Samuels <[email protected]>
To	[email protected]
Subject	Re: st: using post stratification weights
Date	Sun, 12 Feb 2012 18:00:19 -0500

You are mixing up post-stratification (actually, sample raking) with design-stratification; for the difference you should consult a good sampling text (e.g. Sharon Lohr, Sampling: Design and Analysis, 2009).

You will need to determine from the research firm whether they used any design strata in the selection of the sample.  If they cannot supply the information, then you can conservatively assume there were none (conservatively, meaning that standard errors might be larger than need be). Also, it appears, but you must confirm, that there was no clustering of individuals.

In any case, the -svyset- statement will be:

********************************
svyset _n [pw = supplied_weight]
******************************** 

where you should substitute the name of the supplied weight variable after the equal sign.

You can then use commands such as -svy: mean-, -svy: tab-, and -svy: logistic- for your analysis.
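
As a minimal sketch of that workflow, the lines below run on Stata's shipped auto data set, not on the survey data. The weight values are made up purely so that the commands execute; supplied_weight stands in for the firm's weight variable, and the analysis variables (mpg, foreign, rep78, weight) are placeholders for your own.

```stata
* Sketch only: auto.dta and the made-up weight stand in for the real survey data
sysuse auto, clear

* Hypothetical weight variable (values 1, 2, 3); replace with the supplied weight
generate double supplied_weight = 1 + mod(_n, 3)

* No design strata, no clustering: each observation is its own sampling unit
svyset _n [pw = supplied_weight]

* Weighted mean with a design-based standard error
svy: mean mpg

* Weighted two-way table
svy: tabulate foreign rep78

* Weighted binary logit, reported as odds ratios
svy: logistic foreign mpg weight
```

If the firm later reports design strata, they would go in the strata() option of -svyset-; nothing else in the analysis commands would change.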

On Feb 12, 2012, at 11:41 AM, Afif Naeem wrote:

Thanks, Steven, for your reply. Actually, correctly specifying the strata (or any other first-stage sampling effect) is what I am most confused about. Let me first explain the survey design.

The survey was conducted by an internet-based survey research firm that has developed a fully representative pool of individuals. For our specific study, a representative sample of 859 Ohio adults (18 and over) was selected. These individuals were invited to take part in the online survey. Of the 859 individuals invited, only 537 completed the survey, a completion rate of 62.5%.

The survey research firm also provides individual-specific weights in the data set, generated through a post-stratification process (i.e., raking) to adjust for survey non-response and for any non-coverage due to the study-specific sample design. Demographic and geographic distributions for the Ohio population aged 18+ from the most recent Current Population Survey (CPS) are used as benchmarks in this adjustment.

The following benchmark distributions are utilized for this post-stratification adjustment:

  Gender (Male, Female)
  Age (18-29, 30-44, 45-59, 60+)
  Race/Hispanic ethnicity
  Education (Less than High School, High School, Some college, Bachelor and beyond)
  Metropolitan Area (Yes, No)
  Internet Access (Yes, No)

Comparable distributions are calculated using all completed cases from the field data. Since study sample sizes are typically too small to accommodate a complete cross-tabulation of all the survey variables with the benchmark variables, an iterative proportional fitting is used for the post-stratification weighting adjustment. This procedure adjusts the sample data back to the selected benchmark proportions. Through an iterative convergence process, the weighted sample data are optimally fitted to the marginal distributions.
After this final post-stratification adjustment, the distribution of the calculated weights is examined to identify and trim outliers at the extreme upper and lower tails of the weight distribution. The post-stratified and trimmed weights are then scaled so that they sum to the total sample size of qualified respondents.
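
To make the procedure described above concrete, here is a rough sketch of iterative proportional fitting on two margins only. It is not the firm's actual routine: the six toy observations, the equal starting weights, and the target proportions are all invented for illustration (they are not CPS figures), and the trimming step is left out.

```stata
* Toy data, invented for illustration: 6 respondents, two benchmark variables
clear
input byte female byte metro
1 1
1 1
1 0
0 1
0 0
0 0
end

generate double w = 1    // assumed equal starting weights

* Assumed target proportions for each category (illustrative only)
generate double t_female = cond(female == 1, 0.52, 0.48)
generate double t_metro  = cond(metro  == 1, 0.80, 0.20)

* Rake: match one margin at a time, then cycle until the weights settle
forvalues iter = 1/50 {
    foreach v in female metro {
        quietly summarize w, meanonly
        local total = r(sum)
        * current weighted total in each category of the margin
        bysort `v': egen double cellsum = total(w)
        * rescale so the category's weighted share equals its target
        quietly replace w = w * t_`v' / (cellsum / `total')
        drop cellsum
    }
}

* Scale the weights to sum to the sample size, as the handbook describes
quietly summarize w, meanonly
quietly replace w = w * _N / r(sum)

* Weighted margins should now be close to the targets
tabulate female [aw = w]
tabulate metro  [aw = w]
```

Each pass forces one margin to match exactly and slightly disturbs the other, which is why the loop cycles through the variables repeatedly instead of adjusting each one once.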

Given this survey design and method of deriving weights, I am not sure how to define strata or anything else. I guess the main source of sample-selection bias in the data set arises from the fact that only 62.5% of those invited actually completed the survey. The unweighted summary statistics show some deviation from those of the state of Ohio. I want to weight the sample properly to make it more comparable to the general population of the state of Ohio.

My main aim is to use these weights in my Binary Logit model, so that the inferences I draw are applicable to the general population of Ohio. I really hope someone can provide me some guidance here.

Best,
Afif

----------------------------------------
> Subject: Re: st: using post stratification weights
> From: [email protected]
> Date: Sat, 11 Feb 2012 19:43:18 -0500
> To: [email protected]
> 
> I should have added: and use Stata's survey commands. Note that weighting alone is not sufficient for valid inference. You must properly specify the first stage of sampling design in the -svyset- statement. Otherwise standard errors will be wrong.
> 
> SS
> 
> 
> Specify the adjusted weights in the -svyset- statement by [pw = adjusted_wt]
> 
> Steve
> [email protected]
> On Feb 11, 2012, at 12:24 PM, Afif Naeem wrote:
> 
> Thanks Stas for your response. I should have been clearer in my first email. What I meant is that I have weights in the data set generated through iterative proportional fitting, as described by the quote from the survey handbook itself below.
> 
> What I cannot figure out is how to use these weights to obtain summary statistics for the variables in the data set. Also, is there a way to run a simple logit model in which this weight variable is used to weight individuals differently?
> 
> Afif
> 
> 
> ----------------------------------------
>> Date: Sat, 11 Feb 2012 10:24:22 -0500
>> Subject: Re: st: using post stratification weights
>> From: [email protected]
>> To: [email protected]
>> 
>> Afif,
>> 
>> I am not sure what your question is. "Help me out" is too broad, and I
>> don't know how many people on this list are in the business of mind
>> reading.
>> 
>> We are probably talking about different types of weight adjustments
>> here. Afif quoted from the survey manual he's been using:
>> 
>>> The post-stratification weights are generated through iterative proportional fitting. Quote from the survey hand-book itself is below:
>> 
>> "The following benchmark distributions are utilized for this
>> post-stratification adjustment:
>> Gender (Male, Female)
>> Age (18-29, 30-44, 45-59, 60+)
>> Race/Hispanic ethnicity
>> Education category
>> Metropolitan Area (Yes, No)
>> Internet Access (Yes, No)
>> 
>> Comparable distributions are calculated using all completed cases from
>> the field data. Since study sample sizes are typically too small to
>> accommodate a complete cross-tabulation of all the survey variables
>> with the benchmark variables, an iterative proportional fitting is
>> used for the post-stratification weighting adjustment. This procedure
>> adjusts the sample data back to the selected benchmark proportions.
>> Through an iterative convergence process, the weighted sample data
>> are optimally fitted to the marginal distributions."
>> 
>> This is a different procedure than post-stratification in Stata terms.
>> Stata relies on the post-strata being mutually exclusive, and
>> obviously the above categories aren't. What your quote suggests is a
>> raking procedure, where the weights are adjusted along each of the
>> dimensions/categorical variables, so that the current variable is made
>> to agree with the known distribution perfectly, moving then to the
>> next margin, etc., until some sort of convergence is achieved.
>> Official Stata does not do this, although you should be able to find
>> third party programs written for this purpose. I use -maxentropy-
>> (which is cumbersome to use, but does the job quickly).
>> 
>> Post-stratification adjustments call for special variance estimation
>> methods. That's why Stata has post-stratification as an additional
>> option in -svyset-; without these adjustments, your standard errors
>> may be some 20-30% too small on descriptive statistics correlated with
>> the calibration variables. These adjustments are relatively easy to
>> implement with post-stratification over mutually exclusive strata (and
>> that's done in Stata), but are somewhat harder with multivariate
>> marginal adjustments. You won't be able to do these adjustments unless
>> you have both the original sampling weight and the post-stratified
>> weights, as well as the variables used for calibration (or,
>> equivalently, the population totals towards which the adjustment was
>> made).
>> 
>> A typo correction in Cam's literature suggestions:
>> 
>> Holt, D., & Smith, T.M.F. (1979). Post stratification. Journal of the
>> Royal Statistical Society, Series A, 142, 33–46.
>> 
>> 
>> --
>> Stas Kolenikov, also found at http://stas.kolenikov.name
>> Small print: I use this email account for mailing lists only.
>> 
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
