
From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: SV: SV: S: SV: st: Survey - raking - calibration - post stratification - calculating weights
Date: Mon, 8 Dec 2008 15:31:20 -0500

Kristian: I was vague and I apologize: I mixed up the initial weights.

Type: "svyset _n [pweight=weight1x], strata(age_grp)" as before.
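For readers following along, here is a minimal sketch of how that declaration fits with the estimation step. It assumes the telephone-sample data are in memory with the thread's variables weight1x, age_grp and ed, and that Stata 9 or later is in use, since the "svyset _n" form and the "svy:" prefix belong to the newer syntax. The "svymean ed" call in the quoted message is the older equivalent.

```stata
* Declare the design: every respondent is his own sampling unit (_n),
* the probability weights are in weight1x, and age_grp defines the strata.
svyset _n [pweight=weight1x], strata(age_grp)

* Show what Stata has recorded for the design (strata and units per stratum).
svydescribe

* Estimate the mean of ed with design-based standard errors.
svy: mean ed
```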

-Steve

On Dec 8, 2008, at 2:10 PM, Kristian Wraae wrote:

Ok. I'm a bit lost here. I really don't understand all the steps (especially step 2) but I'll try to do them anyway.

*1: Like before:

The 600:

    age_grp   n_age_grp_s   pct_age_grp_s
          1            38            6.33
          2            47            7.83
          3            41            6.83
          4            41            6.83
          5            44            7.33
          6            38            6.33
          7            44            7.33
          8            48            8.00
          9            43            7.17
         10            41            6.83
         11            42            7.00
         12            35            5.83
         13            39            6.50
         14            33            5.50
         15            26            4.33
      Total           600

And the 4975:

    age_grp   n_age_grp   pct_age_grp     Cum.
          1         450          9.05     9.05
          2         438          8.80    17.85
          3         395          7.94    25.79
          4         375          7.54    33.33
          5         376          7.56    40.88
          6         370          7.44    48.32
          7         344          6.91    55.24
          8         315          6.33    61.57
          9         306          6.15    67.72
         10         299          6.01    73.73
         11         275          5.53    79.26
         12         271          5.45    84.70
         13         263          5.29    89.99
         14         241          4.84    94.83
         15         257          5.17   100.00
      Total        4975

So weight1 is defined as:

    gen weight1=.
    replace weight1 = 450 / 38 if age_grp == 1
    replace weight1 = 438 / 47 if age_grp == 2
    replace weight1 = 395 / 41 if age_grp == 3
    replace weight1 = 375 / 41 if age_grp == 4
    replace weight1 = 376 / 44 if age_grp == 5
    replace weight1 = 370 / 38 if age_grp == 6
    replace weight1 = 344 / 44 if age_grp == 7
    replace weight1 = 315 / 48 if age_grp == 8
    replace weight1 = 306 / 43 if age_grp == 9
    replace weight1 = 299 / 41 if age_grp == 10
    replace weight1 = 275 / 42 if age_grp == 11
    replace weight1 = 271 / 35 if age_grp == 12
    replace weight1 = 263 / 39 if age_grp == 13
    replace weight1 = 241 / 33 if age_grp == 14
    replace weight1 = 257 / 26 if age_grp == 15

*2: ????? How do I estimate

*3:

*4: Now I generate a variable called sample which is 1 for each of the 600 and 0 for the rest of the 3743.

    . tab sample

         sample      Freq.    Percent       Cum.
              0      3,143      83.97      83.97
              1        600      16.03     100.00

I now generate the probability of inclusion using just age and geography to make things simple:

    xi: logistic sample i.age_grp i.geo_grp
    predict p_r

*5:

    gen weight2 = weight1 * (1/p_r)

*6: Now I generate the totals for age and geography:

    *age
    gen pct_agex = .
    replace pct_agex = 450 / 4975 if age_grp == 1
    replace pct_agex = 438 / 4975 if age_grp == 2
    replace pct_agex = 395 / 4975 if age_grp == 3
    replace pct_agex = 375 / 4975 if age_grp == 4
    replace pct_agex = 376 / 4975 if age_grp == 5
    replace pct_agex = 370 / 4975 if age_grp == 6
    replace pct_agex = 344 / 4975 if age_grp == 7
    replace pct_agex = 315 / 4975 if age_grp == 8
    replace pct_agex = 306 / 4975 if age_grp == 9
    replace pct_agex = 299 / 4975 if age_grp == 10
    replace pct_agex = 275 / 4975 if age_grp == 11
    replace pct_agex = 271 / 4975 if age_grp == 12
    replace pct_agex = 263 / 4975 if age_grp == 13
    replace pct_agex = 241 / 4975 if age_grp == 14
    replace pct_agex = 257 / 4975 if age_grp == 15
    gen tot_agex = round(pct_agex * 10000)
    replace tot_agex = tot_agex - 1 if agex ==1

    *Geography
    gen pct_geo =.
    replace pct_geo = 2726 / 4975 if geo_gr==1
    replace pct_geo = 2249 / 4975 if geo_gr==2
    gen tot_geo = round(pct_geo * 10000)

    * Now I rake weight2 back to the age categories & geographics
    keep if sample==1
    survwgt rake weight2, ///
        by(age_grp geo_grp) ///
        totvars(tot_agex tot_geo) ///
        gen(weight3)

*7: Here I make new variables for tot_agex and tot_grp from data from the Danish Census (_DC) like this:

    *age
    gen pct_agex_DC = .
    replace pct_agex_DC = (DC population total in age_grp==1) / (DC population total) if age_grp == 1
    .
    .
    .
    .
    replace pct_agex_DC = (DC population total in age_grp==15) / (DC population total) if age_grp == 15
    gen tot_agex_DC = round(pct_agex_DC * 10000)

And the same for tot_geo_DC.

Then I use the rake again:

    survwgt rake weight3, ///
        by(age_grp geo_grp) ///
        totvars(tot_agex_DC tot_geo_DC) ///
        gen(weight4)

    svyset [pweight=weight4], strata(agex)

So to estimate ed in the general population I would do:

    svymean ed

Is it correct?

Steven, if you give me your personal details I'll include you in the acknowledgements of the paper if you'd like.
Best regards
Kristian

-----Original message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steven Samuels
Sent: Monday, December 08, 2008 6:13 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: SV: S: SV: st: Survey - raking - calibration - post stratification - calculating weights

On Dec 8, 2008, at 2:55 AM, Kristian Wraae wrote:

Ok, thanks. Now I understand how to do the raking procedure.

I have one question though. Since I have a two step inclusion procedure, wouldn't it be more accurate to rake in two steps?

Example: I know the distribution of medication amongst the 3745 men. But the 3745 men differ from the 4975 men by being slightly younger, and we know that the older you get, the more medicine you get. That also goes for physical activity and smoking.

So if I calculate the expected prevalences amongst the 4975 (in order to rake the 600) from the 3750, I risk making a mistake (underestimating the prevalences in the background population). I guess I should be calculating all the prevalences from the 4975, but I don't have those data.

So wouldn't it be more correct to:

1. Rake the 3750 so they match the 4975 on age and geography.
2. Calculate all the expected prevalences on age, medication, smoking, physical activity etc. from the now raked 3750 (as we would expect them to be had we had a 100% response rate).
3. Use these prevalences to rake the 600 as you showed me?

Your concern is a good one, Kristian. However, the solution you propose is ad hoc, with no real theoretical justification. I've tried some complicated raking in the past, but I have never seen a reference to the method you propose. You have questionnaire information on too many informative variables; raking can use only a small part of it.

There is a standard approach to this problem: model the probability of participating in the phone interview. I suggest you consult the text "Statistical Analysis with Missing Data" by Little & Rubin, especially Chapters 3 & 13. In the parlance of that book, you must assume that data are "Missing at Random". This means that the probability of having a phone interview depends completely on characteristics known from the mail questionnaire or the census.

Here are the steps:

1. Estimate weight1 = N_i/n_i as before for the 15 age groups.

2. You can use this weight on the second phase sample of 3,750 to estimate various properties of the population, such as proportions in categories of medication, physical activity, and smoking. These may be of interest in themselves.

3. Instead of raking, use -logistic- or -logit- (not the survey versions) on the 3,750 men to predict who participated in the telephone interview. Consider as covariates: age, geography, medication, physical activity, smoking and any others that might be of use.

4. Generate the predicted probability of participating in the telephone interview. Call this p_r. Your goal is to get a good prediction, so compute ROC curves, if possible. (I don't recall if Stata 8 has the -lroc- command.)

5. For the 600 men in the telephone survey, compute: weight2 = (weight1) x (1/p_r).

6. Rake weight2 back to the age categories & geographic categories of the 5,000 men. Call the result "weight3".

7. Finally rake weight3 to the Danish Census age/geographical breakdowns. Call it "weight4".

8. Use this as your final analysis weight for -svymean-.

You are a long way from the simplicity of Stas's earlier suggestion to use "weight1" on your data. Standard errors that you compute will be under-estimated, because they do not account for the uncertainty in estimating "weight3", and you must state this in your report. If you wish to compute the proper standard errors, you must, I think, bootstrap the process starting no later than Step 3. This is the price for using the complex sampling design.
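Steps 3 to 5 can be sketched in Stata roughly as follows. This is only an illustration using the thread's own names. It assumes the mail-respondent data are in memory, with sample equal to 1 for the 600 telephone respondents and 0 otherwise, weight1 already computed, and only age_grp and geo_grp as covariates (more would normally be added, as step 3 suggests). The "if sample == 1" qualifier and the final -summarize- check are additions for illustration, not part of the advice above.

```stata
* Step 3: ordinary (non-survey) logistic model for participating in
* the telephone interview; xi: expands the i. terms into dummies.
xi: logistic sample i.age_grp i.geo_grp

* Step 4: predicted probability of participating, then the ROC curve
* to judge how well the model discriminates.
predict p_r, pr
lroc

* Step 5: nonresponse-adjusted weight, defined only for the telephone
* respondents (assumption: non-participants are left missing).
gen weight2 = weight1 * (1/p_r) if sample == 1

* Illustrative check: very small p_r values produce very large weights,
* so look at the tails before raking.
summarize p_r weight2 if sample == 1, detail
```

The resulting weight2 is the input to the raking in step 6.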
-Steve

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2016 StataCorp LP