Veronica Galassi

statalist@hsphsun2.harvard.edu

Re: st: How to set calibrated weights

Sat, 20 Oct 2012 10:08:10 +0100

Dear Steve,

Thank you very much for your kind reply and the useful references! Your
answer clarified many other doubts I had.

Your intuition that my post-stratified weights are calibrated is correct.
Unfortunately, I checked the documents explaining the sampling methodology
again, and there the PSU is simply defined as a geographic area containing
more than 74 dwellings. I therefore expect the number of PSUs to be high
(around 3,000), whereas I only have 9 provinces and 4 geographical types
in my survey. This implies that none of my cluster variables can be the
PSU.

However, if I got your point, it does not really matter which PSU I
indicate when computing descriptive statistics. Is that correct? For this
reason, I also tried not to indicate any PSU, but Stata gave me back the
error:

"invalid use of _n; observations can only be sampled in the final stage"

To cut it short, do you still believe I can use the statement

"svyset w2_gc_prov [pw = w2_wgt], strata(w2_gc_dc) || w2_hhgeo"

that you previously indicated to set my calibrated weights? (In my case I
cannot use the fpc option.)

Thank you very much for your help, I really appreciate it!

Kind regards,

Veronica

2012/10/20 Steve Samuels <sjsamuels@gmail.com>:
> Veronica,
>
> The PSU variable is not missing. It is the sampling unit at the first
> stage of sampling and it's one of your cluster variables, probably
> "cluster 1" (check). Your statement that one must know the PSU variable
> to use probability weights is also incorrect. One can get proper
> weighted estimates, though not standard errors, without knowing the PSU.
>
> I'm not sure what's wrong with your -concat- statement. I would have
> used "egen combination = group()". For it to have worked, the value of
> the "post-stratification weight" would have to be the population count
> for each combination of the three variables.
>
> If the "post-stratification" weights are not integers, they are probably
> "calibration" weights that have already adjusted the probability
> weights. In that case, further post-stratification is likely to be
> superfluous. You would then use the "post-stratification weight" in
> place of the probability weights. All weights should be described in
> the study documents (though usually not the "codebook"). If they are
> not, then contact the organization that did the study for details.
>
> If sampling was without replacement at one or more stages,
> you could use the fpc() option for those stages. In practice,
> it makes a difference only for the first stage.
>
> In any case, one guess at a -svyset- statement (assuming the
> "post-stratification weight" is a "calibration" weight) is:
> *************************************************************
> svyset w2_gc_prov [pw = w2_wgt], strata(w2_gc_dc) || w2_hhgeo
> *************************************************************
>
> But I could be wrong, depending on how w2_wgt was calculated.
>
> Before proceeding, I suggest that you learn more about sampling or take
> a survey course. I gave some references in:
> http://www.stata.com/statalist/archive/2012-09/msg01058.html.
> The Stata survey manual is also a very good resource, though the
> section on post-stratification is skimpy.
>
> Steve
>
>
> On Oct 19, 2012, at 1:57 PM, Veronica Galassi wrote:
>
> Dear Statalisters,
>
> I am writing you concerning the application of calibrated weights to
> my dataset for the computation of descriptive statistics only.
>
> The dataset I am working on collects information at household and
> individual level and comes from a stratified, two-stage clustered
> sample.
> The following are the variables I have:
> - probability weights: w2_dwgt
> - strata: w2_gc_dc
> - cluster 1: w2_gc_prov
> - cluster 2: w2_hhgeo
> - post-stratified weights: w2_wgt
> - age intervals: w2_age_intervals
> - gender: w2_best_gen
> - population group: w2_best_race
>
> In order to set the probability weights using the command svyset, I
> need the psu variable. As you may have noticed, this variable is
> missing, and this makes it impossible for me to set pweights.
> In addition, from a couple of previous Statalist conversations (see
> in particular: http://www.ats.ucla.edu/stat/stata/faq/svy_stata_post.htm
> and http://www.stata.com/statalist/archive/2012-02/msg00584.html), I
> understood that:
> - when using calibrated weights I still have to set pweights and
>   specify the original strata and clusters
> - in order to apply calibrated data I need to know the characteristics
>   on the basis of which the sample has been post-stratified (in my case
>   age intervals, gender and population groups).
>
> Therefore, I tried to set my post-stratified weights using the
> following command:
> "svyset [pw=w2_dwgt], strata (w2_gc_dc) poststrata (w2_age_intervals
> w2_best_gen w2_best_race) postweight(w2_wgt)"
> which did not work because in Stata the poststrata must be mutually
> exclusive and thus only one variable can be specified.
>
> In order to overcome this problem, I tried to generate a variable
> which is a combination of the three characteristics by using the
> command
> "egen combination=concat( w2_age_intervals w2_best_race w2_best_gen),
> format (float)".
> However, this command generated a variable containing only missing
> values and for this reason Stata gave me back the error:
> "option postweight() requires option poststrata()".
> The only way to make Stata set the post-calibrated weight was by using
> the command
> "svyset, poststrata (combination) postweight(w2_wgt)" with combination
> being a string variable.
> However, I am afraid that this command is not complete.
>
> At this point, I would really appreciate any hint on what I am doing
> wrong and how to proceed to set my post-stratified weights.
>
> Many thanks for your help!
>
> Kind regards,
>
> Veronica Galassi
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
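
A small simulated sketch of the point made in this thread: with the same
pweights, the PSU named in -svyset- affects the standard errors but not the
weighted point estimates. Everything below is made up for illustration: the
data are simulated, and the names stratum, psu_a, psu_b, wgt and y are
hypothetical stand-ins, not the survey's variables. The last declaration
uses _n as the sampling unit in a single-stage design, which is one way to
name no cluster variable at all. The quoted "invalid use of _n" message
presumably appears when _n is followed by a further stage, but the thread
does not show the command that produced it.

```stata
clear
set seed 2012
set obs 600

* Simulated design: 4 strata, with two alternative cluster definitions
generate stratum = ceil(_n/150)          // 4 strata of 150 observations
generate psu_a   = ceil(_n/25)           // 6 clusters per stratum
generate psu_b   = ceil(_n/50)           // 3 clusters per stratum
generate wgt     = 50 + 100*runiform()   // non-integer, calibration-like weight
generate y       = rnormal(10, 2) + stratum

* Declaration 1: psu_a as the first-stage unit
svyset psu_a [pw = wgt], strata(stratum)
svy: mean y
scalar mean_a = _b[y]
scalar se_a   = _se[y]

* Declaration 2: psu_b as the first-stage unit, same weights and strata
svyset psu_b [pw = wgt], strata(stratum)
svy: mean y
scalar mean_b = _b[y]
scalar se_b   = _se[y]

* Declaration 3: no cluster variable, observations are the sampling units
svyset _n [pw = wgt], strata(stratum)
svy: mean y
scalar mean_n = _b[y]
scalar se_n   = _se[y]

* The means agree across declarations; the standard errors do not
display "means: " mean_a "  " mean_b "  " mean_n
display "SEs:   " se_a "  " se_b "  " se_n
```

If only weighted means, totals or proportions are reported, any of the three
declarations gives the same numbers. The declaration matters as soon as
standard errors or confidence intervals are reported.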


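
A hedged sketch of the "egen combination = group()" suggestion from the
thread, on simulated data. The names agegrp, gender, race, dwgt, popcount,
one and y are hypothetical, and the population counts are invented. The
sketch only applies when the postweight() variable really holds the
population count of each cell, which, as noted in the thread, is probably
not the case for a non-integer calibration weight such as w2_wgt.

```stata
clear
set seed 2012
set obs 400

* Simulated stand-ins for three categorical post-stratification variables
generate agegrp = ceil(4*runiform())     // 4 age intervals
generate gender = ceil(2*runiform())     // 2 categories
generate race   = ceil(3*runiform())     // 3 population groups
generate dwgt   = 20 + 10*runiform()     // design (probability) weight
generate y      = rnormal(10, 2)
generate one    = 1

* group() returns one integer code per observed combination, so the
* result is numeric and the cells are mutually exclusive
egen combination = group(agegrp gender race)

* Invented population count per cell; it must be constant within a cell
generate popcount = 1000 + 10*combination

svyset [pw = dwgt], poststrata(combination) postweight(popcount)
svy: mean y

* Check: the post-stratified total of a constant 1 should reproduce
* the sum of the cell population counts
svy: total one
scalar est_total = _b[one]
egen cell_tag = tag(combination)
quietly summarize popcount if cell_tag
scalar pop_total = r(sum)
display "estimated total: " est_total "   sum of cell counts: " pop_total
```

Unlike concat(), which builds a string, group() produces a numeric variable
that can be passed to poststrata() directly.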