
From: Stephen McKay <S.McKay@bristol.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: st: stata and weighting
Date: Thu, 11 Mar 2004 09:52:55 +0000

Many (perhaps most) social survey datasets come with non-integer weights, reflecting a mix of the sampling scheme (e.g. one person per household randomly selected), and sometimes non-response, and sometimes calibration/grossing factors too. Increasingly, in the name of confidentiality, data depositors are reluctant to identify too much about the sampling points, so PSU identification is not always possible [and hence svy approaches in Stata are not really practicable].

At present, Stata will let you use some types of weights, some of the time, on some types of command. The logic of which combinations are allowed is hard to fathom.

I appreciate that a simple-minded application of weights will give you incorrect confidence intervals. But at present Stata makes it difficult to get the right point estimates in these circumstances.

Here's a very simple example of what can happen, based on a simple indicator variable and a simple weight.

. list

     +------------+
     | male   wgt |
     |------------|
  1. |    0   1.5 |
  2. |    0   1.2 |
  3. |    1    .7 |
  4. |    1   1.1 |
  5. |    0    .7 |
     |------------|
  6. |    1    .8 |
     +------------+

. su male [w=wgt]      /// So summarize defaults to aweights.
(analytic weights assumed)

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
        male |       6  6.00000006    .4333333   .5428321          0          1

. tab1 male [w=wgt]    /// tab1 defaults to frequency weights, not allowed
(frequency weights assumed)
may not use noninteger frequency weights
r(401);

. tab1 male [iw=wgt]   /// tab1 disallows iweights
iweight not allowed
r(101);

. tab1 male [aw=wgt]   /// tab1 disallows aweights
aweight not allowed
r(101);

. table male [w=wgt]   /// table defaults to freq weights, too
(frequency weights assumed)
may not use noninteger frequency weights
r(401);

. table male [aw=wgt]  /// aweights give you the "wrong" answers, through rounding off to integers

----------------------
     male |      Freq.
----------+-----------
        0 |          3
        1 |          3
----------------------

. table male [iw=wgt]  /// iweights give you the "right" answers

----------------------
     male |      Freq.
----------+-----------
        0 |        3.4
        1 |        2.6
----------------------

. tab male [w=wgt]
(frequency weights assumed)
may not use noninteger frequency weights
r(401);

. tab male [aw=wgt]    /// aweights with tab give "right" answers

       male |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |3.400000002       56.67       56.67
          1 |2.599999998       43.33      100.00
------------+-----------------------------------
      Total |          6      100.00

. tab male [iw=wgt]    /// iweights with tab give "right" answers, but with different rounding!

       male |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 | 3.40000004       56.67       56.67
          1 | 2.60000002       43.33      100.00
------------+-----------------------------------
      Total | 6.00000006      100.00

. log close

Again, I am not sure of the logic behind some of these differences, for what are perhaps the simplest of commands. I doubt there is much call for an nw option (naive weight)? But otherwise, for some analyses, one is reduced to multiplying and/or rounding off weights to get the point estimates that the data depositors/creators tell you that you should be getting (i.e. the ones in their report). Such as:

gen wgt2=wgt*10
compress

. tab1 male [w=wgt2]   /// Right proportions, wrong 'bases'
(frequency weights assumed)

-> tabulation of male

       male |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         34       56.67       56.67
          1 |         26       43.33      100.00
------------+-----------------------------------
      Total |         60      100.00

Surely there should be something better than this?

Steve


Date: Wed, 10 Mar 2004 23:29:46 -0500
From: Richard Williams <Richard.A.Williams.5@nd.edu>
Subject: Re: st: non-integer frequencies?

At 09:49 PM 3/10/2004 -0600, ACHINTYA RAY wrote:
>Sample surveys oftentimes provide weights to convert sample estimates into
>representative population figures. Sometimes such frequency weights are not
>integers (For example, National Health and Nutrition Examination Survey
>III). It seems that Stata can only deal with integer frequency weights. Is
>there a solution? The best that I can do right now is to take the nearest
>integer to the non-integer frequencies. This method seems rather ad hoc. Any
>help will be deeply appreciated.

I think iweights will work, at least if the command allows the use of iweights. e.g. I just tried

. sum income

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      income |     500       27.79   8.973491          5       48.3

. sum income [fw=1.2]
may not use noninteger frequency weights
r(401);

. sum income [iw=1.2]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
      income |     500         600       27.79   8.971993          5       48.3

However, remember that, for purposes of statistical inference, the numbers you get are wrong.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
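
The weighted "frequencies" and percentages in the log above are just sums of weights within each category, so they can be computed without going through a weighted tabulation command at all. What follows is a minimal sketch, not anything proposed in the thread: it assumes the same six observations as in the listing, and the names wfreq and pct are made up for illustration. Storing wgt as double is also an assumption, made here to avoid the trailing digits (6.00000006 and the like) that look like float storage of the weights.

```stata
* Hypothetical re-creation of the six-observation example from the log
clear
input byte male double wgt
0 1.5
0 1.2
1 0.7
1 1.1
0 0.7
1 0.8
end

* Weighted "frequency" of each category = sum of the weights in that category
collapse (sum) wfreq = wgt, by(male)

* Each category's share of the total weight, as a percentage
quietly summarize wfreq
generate double pct = 100 * wfreq / r(sum)

list male wfreq pct, noobs
```

This should reproduce the 3.4 and 2.6 weighted counts, and the 56.67 and 43.33 percentages, reported by table [iw=wgt] and tab [aw=wgt] above. It gives point estimates only; like the other naive approaches in the thread, it says nothing valid about standard errors.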

Follow-Ups:
  Re: st: stata and weighting
  From: Nick Winter <nw53@cornell.edu>
