Statalist



Re: SV: SV: SV: SV: SV: st: Survey - raking - calibration - post stratification - calculating weights


From   Steven Samuels <sjhsamuels@earthlink.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: SV: SV: SV: SV: SV: st: Survey - raking - calibration - post stratification - calculating weights
Date   Wed, 10 Dec 2008 21:35:51 -0500

Kristian, the number of predictors in a logistic model should be no more than 10-15% of the number of events (here 600). It looks to me like you are over-fitting. See: http://www.psychosomaticmedicine.org/cgi/content-nw/full/66/3/411/ . You are obligated to find a good (not perfect) model, as judged by goodness of fit, -linktest-, ROC curves, and consideration of interactions, for example. You will need a full, considered analysis before you decide on the response equation.
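
One way to run the checks mentioned above is sketched here. It assumes the response model from the do-file quoted below has just been fitted with -logistic-, so that e(df_m) holds the model degrees of freedom and e(sample) marks the estimation sample. The 10-15% band is the rule of thumb stated above, not something Stata computes.

```stata
* Sketch only: run immediately after the -logistic- command.
* Count the events (sample == 1) in the estimation sample.
quietly count if sample == 1 & e(sample)
local events = r(N)

* Compare the number of fitted parameters with the 10-15% rule of thumb.
display "Events: `events'    model df: " e(df_m)
display "Rule-of-thumb ceiling: " 0.10*`events' " to " 0.15*`events' " predictors"

* Diagnostics named in the message. -linktest- goes last because it
* refits a model and replaces the stored estimation results.
estat gof, group(10)    // Hosmer-Lemeshow goodness-of-fit test
lroc                    // ROC curve and area under it
linktest                // specification (link) test
```

If the model df is well above the ceiling, dropping or combining sparse diagnosis and medication indicators would be one place to start.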

-Steve
On Dec 10, 2008, at 12:43 PM, Kristian Wraae wrote:

I have now tried to perform the complete calibration:

The do file is here:


egen tot_age_grp =  count(pop),by(age_grp)
egen tot_age_grp_q = count(pop) if quest==1, by(age_grp)

egen tot_geo_grp =  count(pop),by(geo_grp)
egen tot_geo_grp_q = count(pop) if quest==1, by(geo_grp)

gen weight1x = (tot_age_grp / tot_age_grp_q)

keep if quest==1
			*(reducing the dataset to 3743 men)
survwgt rake  weight1x,   ///
        by(age_grp  geo_grp) ///
        totvars(tot_age_grp tot_geo_grp) ///
        gen(weight2x)

svyset  [pweight=weight2x], strata(age_grp)




*Step 2

xi: logistic sample  i.age_grp i.child_cat i.marrital_cat i.job_kat ///
    i.bicycle_cat i.jogging_cat i.organised i.smoke_grp i.education_grp ///
    sp1_diag_db91 sp1_diag_dc61 sp1_diag_dd51 sp1_diag_de05 sp1_diag_de10 sp1_diag_de11 sp1_diag_de14 sp1_diag_de78 ///
    sp1_diag_dg40 sp1_diag_dg43 sp1_diag_dg44 sp1_diag_dg47 sp1_diag_dh40 sp1_diag_di10 sp1_diag_di20 sp1_diag_di25 ///
    sp1_diag_di49 sp1_diag_di51 sp1_diag_di63 sp1_diag_dj30 sp1_diag_dj33 sp1_diag_dj42 sp1_diag_dj44 sp1_diag_dj45 ///
    sp1_diag_dj96 sp1_diag_dk21 sp1_diag_dk22 sp1_diag_dk50 sp1_diag_dk51 sp1_diag_dl30 sp1_diag_dl40 sp1_diag_dm10 ///
    sp1_diag_dm13 sp1_diag_dm19 sp1_diag_dm53 sp1_diag_dm79 sp1_diag_dm81 sp1_diag_dn30 sp1_diag_dn40 sp1_diag_dt78 ///
    sp1_a02ba sp1_a02bc sp1_a03aa sp1_a07ec sp1_a10a sp1_a10ba sp1_a10bb ///
    sp1_a11ba sp1_a11ea sp1_a12ba sp1_b01aa sp1_b01ac sp1_b03b sp1_b03xa ///
    sp1_c01aa sp1_c02ac sp1_c02ca sp1_c03aa sp1_c03ba sp1_c03ca sp1_c03ea ///
    sp1_c07aa sp1_c07ab sp1_c07ag sp1_c07bb sp1_c07cb sp1_c08ca sp1_c09aa ///
    sp1_c09ba sp1_c09ca sp1_c10aa sp1_c01bd sp1_c01da sp1_c03da sp1_c08da ///
    sp1_c08db sp1_c09da sp1_g04bd sp1_g04ca sp1_g04cb sp1_h02ab sp1_h03aa ///
    sp1_h03bb sp1_l01ba sp1_m01ab sp1_m01ae sp1_m01ah sp1_m04aa sp1_n02aa ///
    sp1_n02ax sp1_n02ba sp1_n02be sp1_n03af sp1_n03ax sp1_n05ab sp1_n05ba ///
    sp1_n05cd sp1_n06aa sp1_n06ab sp1_n06ax sp1_p01bc sp1_r03ac sp1_r03ak ///
    sp1_r03ba sp1_r03bb sp1_r03dc sp1_r06ae sp1_r06ax

predict p_r
lroc

gen weight3x = weight2x * (1/p_r)
keep if sample == 1
				*(reducing dataset to 600 men)
survwgt rake  weight3x,   ///
        by(age_grp geo_grp) ///
        totvars(tot_age_grp tot_geo_grp ) ///
        gen(weight4x)

svyset  [pweight=weight4x], strata(age_grp)
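
Because weight3x divides by the fitted probability p_r, a few very small fitted probabilities can produce very large weights. A rough check is sketched below, assuming it is run after the do-file above on the 600 remaining men. The effective sample size uses Kish's approximation, which ignores the stratification, and the temporary variable name is arbitrary.

```stata
* Sketch only: run after the do-file above, on the 600 sampled men.
* Small fitted probabilities p_r translate into large weights.
summarize p_r weight3x weight4x, detail

* Kish's approximate effective sample size: (sum of w)^2 / (sum of w^2).
tempvar wsq                       // temporary variable, name is arbitrary
generate double `wsq' = weight4x^2
quietly summarize weight4x
local sumw = r(sum)
quietly summarize `wsq'
display "Effective sample size: " (`sumw')^2 / r(sum)
```

An effective sample size far below 600 would suggest that a handful of men carry most of the weight.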


As you can see, the logistic regression is quite large.

All the variables with _cat are categorical; the rest are binary variables.

The variables with diag in their names are diagnoses (ICD-10 codes); the rest are medications (ATC codes).

I have performed the lroc command and you can see the ROC curve here:
http://www.euphonium.dk/ROC1.pdf

After running the calibration I ran svyprop on some of the variables used in the logistic regression.

The tables are to be read in pairs: first the 3743 calibrated to 4975, and then the 600 calibrated to the 4975.
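
Each pair of tables could be produced along the following lines. This is a sketch that assumes a Stata release where -svy: proportion- is available in place of -svyprop-, and that it is run once after each -svyset- in the do-file above (first on the 3743 men, then on the 600).

```stata
* Sketch only: run once after each -svyset- in the do-file.
* Uses whichever weight (weight2x or weight4x) is currently svyset.
foreach v of varlist age_grp geo_grp child_cat marrital_cat job_kat ///
        bicycle_cat jogging_cat organised smoke_grp education_grp ///
        sp1_diag_di10 {
    svy: proportion `v'
}
```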

------------------------------------------------------------------------------
pweight:  weight2x                              Number of obs      =      3743
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =      3743
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +----------------------------------------+
  | age_grp   Obs   Est. Prop.   Std. Err. |
  |----------------------------------------|
  |       1   346     0.090452    0.000136 |
  |       2   333     0.088040    0.000140 |
  |       3   304     0.079397    0.000128 |
  |       4   297     0.075377    0.000124 |
  |       5   284     0.075578    0.000127 |
  |----------------------------------------|
  |       6   275     0.074372    0.000126 |
  |       7   249     0.069146    0.000124 |
  |       8   246     0.063317    0.000114 |
  |       9   231     0.061508    0.000116 |
  |      10   209     0.060101    0.000116 |
  |----------------------------------------|
  |      11   212     0.055276    0.000105 |
  |      12   210     0.054472    0.000110 |
  |      13   184     0.052864    0.000116 |
  |      14   174     0.048442    0.000104 |
  |      15   189     0.051658    0.000113 |
  +----------------------------------------+


------------------------------------------------------------------------------
pweight:  weight4x                              Number of obs      =       600
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =       600
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +----------------------------------------+
  | age_grp   Obs   Est. Prop.   Std. Err. |
  |----------------------------------------|
  |       1    38     0.090452    0.008267 |
  |       2    47     0.088040    0.006523 |
  |       3    41     0.079397    0.009195 |
  |       4    41     0.075377    0.006819 |
  |       5    44     0.075578    0.008794 |
  |----------------------------------------|
  |       6    38     0.074372    0.007082 |
  |       7    44     0.069146    0.007934 |
  |       8    48     0.063317    0.004680 |
  |       9    43     0.061508    0.004916 |
  |      10    41     0.060101    0.004005 |
  |----------------------------------------|
  |      11    42     0.055276    0.003447 |
  |      12    35     0.054472    0.004787 |
  |      13    39     0.052864    0.006314 |
  |      14    33     0.048442    0.005914 |
  |      15    26     0.051658    0.006062 |
  +----------------------------------------+



------------------------------------------------------------------------------
pweight:  weight2x                              Number of obs      =      3743
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =      3743
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +----------------------------------------+
  | geo_grp   Obs   Est. Prop.   Std. Err. |
  |----------------------------------------|
  |       1   207     0.056884    0.003840 |
  |       2   528     0.146533    0.005868 |
  |       3   681     0.176080    0.006141 |
  |       4    82     0.022111    0.002415 |
  |       5   490     0.133467    0.005609 |
  |----------------------------------------|
  |       6   485     0.128844    0.005465 |
  |       7   460     0.122010    0.005345 |
  |       8   429     0.108744    0.004978 |
  |       9   381     0.105327    0.005098 |
  +----------------------------------------+


------------------------------------------------------------------------------
pweight:  weight4x                              Number of obs      =       600
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =       600
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +----------------------------------------+
  | geo_grp   Obs   Est. Prop.   Std. Err. |
  |----------------------------------------|
  |       1    31     0.056884    0.011311 |
  |       2   106     0.146533    0.015523 |
  |       3   123     0.176080    0.017349 |
  |       4    10     0.022111    0.007494 |
  |       5    68     0.133467    0.016347 |
  |----------------------------------------|
  |       6    80     0.128844    0.015683 |
  |       7    76     0.122010    0.015553 |
  |       8    83     0.108744    0.013280 |
  |       9    23     0.105327    0.021331 |
  +----------------------------------------+

------------------------------------------------------------------------------
pweight:  weight2x                              Number of obs      =      3743
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =      3743
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +-------------------------------------------+
  | child_cat    Obs   Est. Prop.   Std. Err. |
  |-------------------------------------------|
  |         1    317     0.084959    0.004569 |
  |         2    525     0.140334    0.005677 |
  |         3   1575     0.419478    0.008049 |
  |         4    892     0.238786    0.006974 |
  |         5    434     0.116444    0.005238 |
  +-------------------------------------------+


------------------------------------------------------------------------------
pweight:  weight4x                              Number of obs      =       600
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =       600
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +------------------------------------------+
  | child_cat   Obs   Est. Prop.   Std. Err. |
  |------------------------------------------|
  |         1    28     0.081317    0.016047 |
  |         2    95     0.141463    0.015804 |
  |         3   282     0.436360    0.024237 |
  |         4   132     0.214139    0.018806 |
  |         5    63     0.126722    0.017682 |
  +------------------------------------------+

------------------------------------------------------------------------------
pweight:  weight2x                              Number of obs      =      3743
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =      3743
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +----------------------------------------------+
  | marrital_cat    Obs   Est. Prop.   Std. Err. |
  |----------------------------------------------|
  |            1   2983     0.796177    0.006586 |
  |            2    186     0.049980    0.003575 |
  |            3    289     0.077124    0.004348 |
  |            4    285     0.076718    0.004331 |
  +----------------------------------------------+

------------------------------------------------------------------------------
pweight:  weight4x                              Number of obs      =       600
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =       600
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +---------------------------------------------+
  | marrital_cat   Obs   Est. Prop.   Std. Err. |
  |---------------------------------------------|
  |            1   497     0.806723    0.020646 |
  |            2    18     0.049089    0.013136 |
  |            3    35     0.068634    0.013130 |
  |            4    50     0.075554    0.012606 |
  +---------------------------------------------+


------------------------------------------------------------------------------
pweight:  weight2x                              Number of obs      =      3743
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =      3743
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +-----------------------------------------+
  | job_kat    Obs   Est. Prop.   Std. Err. |
  |-----------------------------------------|
  |       1    513     0.134823    0.005026 |
  |       2    351     0.093210    0.004715 |
  |       3   1024     0.271445    0.005905 |
  |       4    283     0.075224    0.004200 |
  |       5   1572     0.425299    0.003928 |
  +-----------------------------------------+


------------------------------------------------------------------------------
pweight:  weight4x                              Number of obs      =       600
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =       600
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +----------------------------------------+
  | job_kat   Obs   Est. Prop.   Std. Err. |
  |----------------------------------------|
  |       1    75     0.138626    0.016727 |
  |       2    53     0.111242    0.017358 |
  |       3   150     0.257521    0.018800 |
  |       4    27     0.067597    0.014395 |
  |       5   295     0.425014    0.014693 |
  +----------------------------------------+


------------------------------------------------------------------------------
pweight:  weight2x                              Number of obs      =      3743
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =      3743
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +---------------------------------------------+
  | bicycle_cat    Obs   Est. Prop.   Std. Err. |
  |---------------------------------------------|
  |           1   2040     0.545593    0.008148 |
  |           2    257     0.068640    0.004134 |
  |           3    319     0.084807    0.004544 |
  |           4    341     0.090828    0.004696 |
  |           5    218     0.058108    0.003827 |
  |---------------------------------------------|
  |           6    235     0.062901    0.003969 |
  |           7    333     0.089124    0.004663 |
  +---------------------------------------------+



------------------------------------------------------------------------------
pweight:  weight4x                              Number of obs      =       600
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =       600
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +--------------------------------------------+
  | bicycle_cat   Obs   Est. Prop.   Std. Err. |
  |--------------------------------------------|
  |           1   300     0.565556    0.023693 |
  |           2    48     0.066815    0.011137 |
  |           3    58     0.085304    0.013220 |
  |           4    70     0.088783    0.011871 |
  |           5    33     0.047710    0.008854 |
  |--------------------------------------------|
  |           6    38     0.060439    0.010592 |
  |           7    53     0.085394    0.012454 |
  +--------------------------------------------+


------------------------------------------------------------------------------
pweight:  weight2x                              Number of obs      =      3743
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =      3743
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +---------------------------------------------+
  | jogging_cat    Obs   Est. Prop.   Std. Err. |
  |---------------------------------------------|
  |           1   3576     0.955597    0.003357 |
  |           2    167     0.044403    0.003357 |
  +---------------------------------------------+




------------------------------------------------------------------------------
pweight:  weight4x                              Number of obs      =       600
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =       600
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +--------------------------------------------+
  | jogging_cat   Obs   Est. Prop.   Std. Err. |
  |--------------------------------------------|
  |           1   567     0.950082    0.009837 |
  |           2    33     0.049918    0.009837 |
  +--------------------------------------------+


------------------------------------------------------------------------------
pweight:  weight2x                              Number of obs      =      3743
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =      3743
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +-------------------------------------------+
  | organised    Obs   Est. Prop.   Std. Err. |
  |-------------------------------------------|
  |         1   3163     0.845951    0.005894 |
  |         2    409     0.108535    0.005071 |
  |         3    171     0.045514    0.003403 |
  +-------------------------------------------+



------------------------------------------------------------------------------
pweight:  weight4x                              Number of obs      =       600
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =       600
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +------------------------------------------+
  | organised   Obs   Est. Prop.   Std. Err. |
  |------------------------------------------|
  |         1   459     0.851567    0.013912 |
  |         2    93     0.111387    0.012676 |
  |         3    48     0.037046    0.005691 |
  +------------------------------------------+



------------------------------------------------------------------------------
pweight:  weight2x                              Number of obs      =      3743
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =      3743
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +-------------------------------------------+
  | smoke_grp    Obs   Est. Prop.   Std. Err. |
  |-------------------------------------------|
  |         1    882     0.235073    0.006911 |
  |         2   1384     0.369679    0.007899 |
  |         3   1477     0.395247    0.007996 |
  +-------------------------------------------+


------------------------------------------------------------------------------
pweight:  weight4x                              Number of obs      =       600
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =       600
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +------------------------------------------+
  | smoke_grp   Obs   Est. Prop.   Std. Err. |
  |------------------------------------------|
  |         1   142     0.249654    0.021930 |
  |         2   224     0.351138    0.022435 |
  |         3   234     0.399208    0.024120 |
  +------------------------------------------+


------------------------------------------------------------------------------
pweight:  weight2x                              Number of obs      =      3743
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =      3743
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +----------------------------------------------+
  | education_~p    Obs   Est. Prop.   Std. Err. |
  |----------------------------------------------|
  |            1   1783     0.476759    0.008146 |
  |            2    936     0.250221    0.007089 |
  |            3    252     0.067364    0.004100 |
  |            4    487     0.129609    0.005478 |
  |            5    285     0.076047    0.004329 |
  +----------------------------------------------+



------------------------------------------------------------------------------
pweight:  weight4x                              Number of obs      =       600
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =       600
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +---------------------------------------------+
  | education_~p   Obs   Est. Prop.   Std. Err. |
  |---------------------------------------------|
  |            1   246     0.484926    0.024701 |
  |            2   160     0.240029    0.019931 |
  |            3    48     0.075965    0.014029 |
  |            4    85     0.124745    0.015151 |
  |            5    61     0.074336    0.010873 |
  +---------------------------------------------+

Some of the chronic diseases:

Hypertension:
------------------------------------------------------------------------------
pweight:  weight2x                              Number of obs      =      3743
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =      3743
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +----------------------------------------------+
  | sp1_diag~i10    Obs   Est. Prop.   Std. Err. |
  |----------------------------------------------|
  |            0   3404     0.909003    0.004718 |
  |            1    339     0.090997    0.004718 |
  +----------------------------------------------+

------------------------------------------------------------------------------
pweight:  weight4x                              Number of obs      =       600
Strata:   age_grp                               Number of strata   =        15
PSU:      <observations>                        Number of PSUs     =       600
                                                Population size    =      4975
------------------------------------------------------------------------------

Survey proportions estimation

  +---------------------------------------------+
  | sp1_diag~i10   Obs   Est. Prop.   Std. Err. |
  |---------------------------------------------|
  |            0   548     0.905739    0.015309 |
  |            1    52     0.094261    0.015309 |
  +---------------------------------------------+

Any thoughts ?

- Kristian




*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
