
From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: SV: SV: st: Survey - raking - calibration - post stratification - calculating weights
Date: Sun, 7 Dec 2008 11:01:52 -0500

agex    pct_agex   tot_agex (= 100 x pct_agex, rounded to nearest 1)
1          8.23        823
2         10.41       1041
etc.
Total    100.00     10,000

medicin   pct_medicin   tot_medicin
1            30.23         3023
2            45.86         4586
3            23.93         2393
Total       100.02        10002

med_smok   pct_med_smok   tot_med_smok
1
2
3
..
9
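As a minimal sketch of how such a tot_ column could be built from percentages, the lines below use the medicin figures from the table above. The tiny dataset is typed in only for illustration; in practice the totals would still need to be attached to each respondent's record (for example by merging on medicin).

```stata
* Illustration only: rebuild tot_medicin from the percentages in the table above
clear
input byte medicin double pct_medicin
1 30.23
2 45.86
3 23.93
end

* 100 x percentage, rounded to the nearest whole number
generate long tot_medicin = round(100 * pct_medicin)
list

* The totals sum to 10002 rather than 10000 because the percentages are rounded
summarize tot_medicin
display r(sum)
```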

**************************CODE BEGINS**************************
survwgt rake weight1, ///
    by(age medicin smoke) ///
    totvars(tot_agex tot_medicin tot_smoke) ///
    gen(weight2)
***************************CODE ENDS***************************

Or, with a combined med_smok margin:

**************************CODE BEGINS**************************
survwgt rake weight1, ///
    by(age med_smok) ///
    totvars(tot_agex tot_med_smok) ///
    gen(weight2)
***************************CODE ENDS***************************
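For intuition about what a raking step does, here is a minimal hand-rolled sketch of iterative proportional fitting on two margins. It is not necessarily how -survwgt rake- is implemented; the variables a, b, w, w2 and the margin totals are hypothetical toy values, and a real routine would test for convergence instead of running a fixed number of passes.

```stata
* Toy data: four cells, equal starting weights (hypothetical values)
clear
input byte(a b) double w
1 1 10
1 2 10
2 1 10
2 2 10
end

* Hypothetical margin totals, constant within each category
generate double tot_a = cond(a == 1, 60, 40)
generate double tot_b = cond(b == 1, 30, 70)

generate double w2 = w
forvalues iter = 1/20 {
    foreach v in a b {
        * current weighted total in each category of this margin
        bysort `v': egen double cur = total(w2)
        * scale the weights so the category totals match the targets
        replace w2 = w2 * tot_`v' / cur
        drop cur
    }
}
list a b w w2
```

After the loop, the weights w2 sum to the target total within every category of a and of b.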

4. Finally: -svyset- your data and run Stata's survey programs:

svyset _n [pweight=weight2], strata(age_gp)
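A small sketch of what follows the -svyset- step, assuming a toy dataset: the variable smoker and all values below are hypothetical and only show that estimation commands are then prefixed with svy:.

```stata
* Toy data (hypothetical): two age strata, final weights in weight2
clear
input byte age_gp byte smoker double weight2
1 0 12.5
1 1 12.5
1 0 12.5
2 1 20
2 0 20
2 0 20
end

* Each observation is its own sampling unit (_n), stratified by age group
svyset _n [pweight=weight2], strata(age_gp)

* Weighted proportion of smokers with design-based standard error
svy: mean smoker
```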

-Steven

On Dec 7, 2008, at 4:52 AM, Kristian Wraae wrote:

Thanks Stas & Steven

What I would like to do is to calibrate on some of the measures from the first questionnaire. I have data on 3750 men from that first questionnaire and I would like to transform my 600 man population into my 5000 man population so that the distribution of chronic diseases and medication is the same as we would expect it to be in the 5000 man population.

I know how the 5000 men differ from the 3750 men regarding age and geography. There was a slight effect of age, but geography was not important for non-responders. So adjusting for age is really the only thing needed at this step.

Then I know how the 600 differ from the 3750 men. The 600 are better educated, smoke less and do more exercise, and they are slightly less prone to have chronic diseases and slightly younger.

So I'd like to weight each of the 600 men so that I can compensate for education, smoking, physical activity, chronic diseases (and medication, but they are closely related so I think I'll just adjust for medication as it is the most precise measure) and age.

So if I want to adjust for those, how do I go about that?

I can see that the code below will adjust on age and geography since those data are present through the two steps, but the more detailed information on smoking, health and lifestyle is only present in step two. I don't know the tot_medgb (medicin) or tot_smokegp (smoking) amongst the 5000 but only amongst the 3750.

That is, how do I incorporate the two steps into the raking? Or should I use the post stratification command instead since I know these data on the individual level?

As I see it, running two rakings after each other, one for step 1 and one for step 2, would risk changing what has been done in the first raking.

I might be stupid but I don't really see how I can do this using the code below.

Also, how many variables is it advisable to rake on?

Thank you for your help

Kristian

-----Original message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steven Samuels
Sent: Sunday, December 07, 2008 6:43 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: SV: st: Survey - raking - calibration - post stratification - calculating weights

--
Stas, I am envious of statisticians who draw samples from those lists. This is a double sample and I agree with your advice: give everyone the weight for their age stratum:

weight1 = N_i/n_i

where "N" denotes population and "n" denotes sample size.

Kristian apparently thinks of the 5,000 person sample as his "population"; the figure that he linked to does not show the initial sampling step at all. He may not have access to the one-year census counts. If he does not, I suggest that he use the N's from the 5,000. I suggest below that he also form geographic categories and rake those, with population counts if possible, otherwise with counts from the 5,000. I roughly calculate that with 5,000 in the first phase sample, bias in estimates and in standard errors will be small.

Kristian, here is how to simultaneously match the age distribution and the geographic distribution of the final sample to your population. (This is called "sample balancing" or "raking".) Form age groups (agegp) and geographical groupings (geogp) and get the population counts (or percentages, see below) in each cell.

**************************CODE BEGINS**************************
* tot_agegp = total for population in participant age group (agegp)
* tot_geogp = total for population in participant geographical group (geogp)
**************************************************************
survwgt rake weight1, ///
    by(agegp geogp) ///
    totvars(tot_agegp tot_geogp) ///
    gen(weight2)
***************************CODE ENDS***************************

Raking can present problems, so I suggest that you read http://www.abtassociates.com/presentations/raking_survey_data_2_JOS.pdf.

If you cannot get population counts, perhaps you can get population percentages, multiply by 10 or 100 and round to the nearest whole number (e.g. 5.12% = 51 or 512), so that the population "size" is 1,000 or 10,000. For estimating means and proportions, these will yield nearly the same results as actual population counts.

The Denmark census counts or percentages might be available only in larger age categories than the ones you used to draw the sample: say (60-64, 65-69, 70-74). If so, use those for the raking calculations.

If you have, say, four geographical categories, you may be tempted to use 4 x 15 = 60 stratification combinations. However, with only 600 people in the final sample, the numbers in individual cells will be too small for reliable estimation.

Theory for double sampling can be found in WG Cochran, 1973, Sampling Techniques, pp 117-119, 327-334, or in most other texts.

Unfortunately, raking will not completely solve the problem of non-response.

-Steven

On Dec 6, 2008, at 11:19 PM, Stas Kolenikov wrote:

Steven, you might be shocked, but people in Nordic countries do have their population completely enumerated. Putting NJC's hat on :)), let me remind you that this is an international list, and different countries have different standards of how they collect and store their official data.
Denmark has a register with an equivalent of SSN that makes it possible to combine the data three ways from economic, medical and social perspectives. That's a survey statistician's and a microeconometrician's dream... and they actually do have the capacity of drawing SRS. That is, the first 5000 were SRS of the population, and then Kristian continued with a stratified second phase sampling. I would probably just give everybody the weight = # in age group across Denmark (in some meaningfully defined period of the study) / # in age group in the sample. If you treat sample groups as non-response adjustment cells, that's what this will probably boil down to after multiplication of three or so fractions.
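The age-stratum weight discussed in this thread (population count in the age group divided by the sample count in that group, weight1 = N_i/n_i) could be computed along these lines. This is a sketch under assumptions: the toy data and the population counts in N_i are hypothetical, and N_i is taken to be already attached to each sampled person.

```stata
* Toy sample (hypothetical): one row per sampled person,
* N_i = population count for that person's age group
clear
input byte agegp long N_i
1 5000
1 5000
1 5000
1 5000
2 3000
2 3000
end

* n_i = number of sampled persons in the age group
bysort agegp: generate long n_i = _N

* base weight: population count over sample count
generate double weight1 = N_i / n_i
list
```

Within each age group the weights then sum to that group's population count.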
