-Steve

On Dec 9, 2008, at 10:23 AM, Kristian Wraae wrote:

I think the reason why Stata complains about totals not being equal is that I have one geography category missing amongst the 600. We refrained from asking people who lived on distant islands, and thus had difficulty showing up, to participate in the final sample, to avoid having too many dropouts. So I suppose we should drop all individuals living on islands amongst the 4975 (it is only 164) and later amongst the 3743 (120) in order to do the final raking with geography.
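One way to check this explanation, as a rough sketch only: sum each set of control totals over the categories that are actually present among the 600 men. The sketch assumes the variables from the do-file further down (sample, age_grp, geo_grp, tot_age_grp, tot_geo_grp) and that each total is constant within its category; first_age and first_geo are helper names made up for the illustration.

```stata
* Sketch only: compare the control totals over the categories present in the final sample
preserve
keep if sample == 1

* flag one observation per category so that each control total is counted once
bysort age_grp: gen byte first_age = (_n == 1)
bysort geo_grp: gen byte first_geo = (_n == 1)

quietly summarize tot_age_grp if first_age
display "sum of age totals:       " r(sum)
quietly summarize tot_geo_grp if first_geo
display "sum of geography totals: " r(sum)
restore

* geography categories with a zero count in the sample==1 column
* have nobody in the final sample
tabulate geo_grp sample
```

If the island category is the only one absent and the totals are still based on the 4975, the two sums would be expected to come out as 4975 for age and 4975 - 164 = 4811 for geography, which is the kind of mismatch the error message describes.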

Alternatively, the final raking should be done without geography, since there is really no reason to believe that geography should be a factor determining health.

Another approach is to include the islands in the most distant zip-code category, but that will interfere with the assumption that all had the same probability of being included in the final sample.

My best suggestion would be not to rake on geography in the last two steps (or maybe at all). Age is definitely the most important variable to rake on.
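With age as the only dimension, raking reduces to a single poststratification adjustment, which can be sketched by hand in base Stata. This is an illustration under assumptions, not a replacement for survwgt: it assumes weight3x and tot_age_grp from the do-file further down, that the data have already been reduced to the final sample, and that tot_age_grp holds the population total for each age group. The names sumw_age, weight4x_age and chk_age are invented for the example.

```stata
* Sketch only: scale the weights within each age group to the age-group total
* (in Stata 8 the egen function is sum() rather than total())
egen double sumw_age = total(weight3x), by(age_grp)
gen double weight4x_age = weight3x * tot_age_grp / sumw_age

* check: the adjusted weights should now add up to tot_age_grp in every age group
egen double chk_age = total(weight4x_age), by(age_grp)
assert reldif(chk_age, tot_age_grp) < 1e-6
```

Observations with a missing weight3x (for example because p_r is missing) are ignored in the group sums and end up with a missing weight4x_age, so the remaining observations in that age group absorb their share of the total.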

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Kristian Wraae
Sent: Tuesday, December 09, 2008 1:23 PM
To: statalist@hsphsun2.harvard.edu
Subject: SV: SV: st: Survey - raking - calibration - poststratification - calculating weights

Now I have continued to step 2 with this do file:

*Step 2
xi: logistic sample i.age_grp i.geo_grp i.health_medication i.health_diseases
predict p_r
gen weight3x = weight2x * (1/p_r)
keep if sample == 1
*(reducing dataset to 600 men)
survwgt rake weight3x, ///
    by(age_grp geo_grp) ///
    totvars(tot_age_grp tot_geo_grp) ///
    gen(weight4x)

The problem now is that Stata says that "totals across dimensions 1 and 2 are not equal".

Why is that? Should I generate new totals for tot_age_grp and tot_geo_grp? Should they be based on the 3743? Why?

How do I deal with missing values in p_r (depending on which predictors I include in the logistic regression I might get missing values for p_r)?

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Kristian Wraae
Sent: Tuesday, December 09, 2008 12:35 PM
To: statalist@hsphsun2.harvard.edu
Subject: SV: SV: st: Survey - raking - calibration - poststratification - calculating weights

I have now tried to do the first step of the raking. I have 15 age groups and 67 geographic groups (simply based on the zip codes). I tried to do the raking first with a smaller number of geographic groups (10), but the results were more accurate with all groups.

The variables I have are:

age = continuous variable containing the age of the subject at the time of sampling
dist_study = continuous variable containing the distance from the individual to me
age_grp = categorical variable - 15 age strata
geo_grp = zip code
quest = 1 if individual returned a filled out questionnaire
pop = 1 if individual was amongst the 4975 in the original sample (all had of course pop=1)
sample = 1 for each finally included subject
The do file looks like this:

*************
*To get data from the original population
tabstat age
tabstat dist_study

*Raking starts by generating totals in each age group and geographical group
egen tot_age_grp = count(pop), by(age_grp)
egen tot_age_grp_q = count(pop) if quest==1, by(age_grp)
egen tot_geo_grp = count(pop), by(geo_grp)
egen tot_geo_grp_q = count(pop) if quest==1, by(geo_grp)

*Initial weight is generated
gen weight1x = (tot_age_grp / tot_age_grp_q)

keep if quest==1
*(reducing the dataset to 3743 men)

survwgt rake weight1x, ///
    by(age_grp geo_grp) ///
    totvars(tot_age_grp tot_geo_grp) ///
    gen(weight2x)

svyset [pweight=weight2x], strata(age_grp)

*Description
svydes

*Now we estimate the average age in the 4975 men from the 3743 men
svymean age

*Now we estimate the average distance to travel to get to me for the 4975 men based on the 3743 men
svymean dist_study

*These are the actual numbers for the 3743 men.
tabstat age
tabstat dist_study
******************

The output from Stata 8 is:

. *************
. tabstat age

    variable |      mean
-------------+----------
         age |   66.6695
------------------------

. tabstat dist_study

    variable |      mean
-------------+----------
  dist_study |  25.90153
------------------------

.
. egen tot_age_grp = count(pop),by(age_grp)
. egen tot_age_grp_q = count(pop) if quest==1, by(age_grp)
(1232 missing values generated)
.
. egen tot_geo_grp = count(pop),by(geo_grp)
. egen tot_geo_grp_q = count(pop) if quest==1, by(geo_grp)
(1232 missing values generated)
.
. gen weight1x = (tot_age_grp / tot_age_grp_q)
(1232 missing values generated)
.
. keep if quest==1
(1232 observations deleted)
. *(reducing the dataset to 3743 men)
. survwgt rake weight1x, ///
    by(age_grp geo_grp) ///
    totvars(tot_age_grp tot_geo_grp) ///
    gen(weight2x)
.
. svyset [pweight=weight2x], strata(age_grp)
pweight is weight2x
strata is age_grp
.
. svydes

pweight:  weight2x
Strata:   age_grp
PSU:      <observations>

                                  #Obs per PSU
 Strata                     ----------------------------
 age_grp    #PSUs     #Obs       min      mean       max
--------  --------  --------  --------  --------  --------
       1       346       346         1       1.0         1
       2       333       333         1       1.0         1
       3       304       304         1       1.0         1
       4       297       297         1       1.0         1
       5       284       284         1       1.0         1
       6       275       275         1       1.0         1
       7       249       249         1       1.0         1
       8       246       246         1       1.0         1
       9       231       231         1       1.0         1
      10       209       209         1       1.0         1
      11       212       212         1       1.0         1
      12       210       210         1       1.0         1
      13       184       184         1       1.0         1
      14       174       174         1       1.0         1
      15       189       189         1       1.0         1
--------  --------  --------  --------  --------  --------
      15      3743      3743         1       1.0         1

.
. svymean age

Survey mean estimation

pweight:  weight2x                        Number of obs    =      3743
Strata:   age_grp                         Number of strata =        15
PSU:      <observations>                  Number of PSUs   =      3743
                                          Population size  =      4975

------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
     age |   66.66605    .0067455    66.65283    66.67928    .0092211
------------------------------------------------------------------------------

. svymean dist_study

Survey mean estimation

pweight:  weight2x                        Number of obs    =      3742
Strata:   age_grp                         Number of strata =        15
PSU:      <observations>                  Number of PSUs   =      3742
                                          Population size  = 4973.7235

------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
dist_s~y |   25.90772    .3139459     25.2922    26.52325     1.01731
------------------------------------------------------------------------------

.
. tabstat age

    variable |      mean
-------------+----------
         age |   66.5895
------------------------

. tabstat dist_study

    variable |      mean
-------------+----------
  dist_study |  25.93867
------------------------

.
end of do-file

As one can see, the average age amongst the 4975 men is 66.6695. Using raking and svymean, Stata estimates the average age amongst the 4975 men, based on the information from the 3743 men, to be 66.66605.

As one can see, those are quite similar. Now let us look at the distance to travel.