Re: st: How to set calibrated weights

From   Veronica Galassi
Date   Sat, 20 Oct 2012 10:08:10 +0100

Dear Steve,

Thank you very much for your kind reply and the useful references!
Your answer actually clarified many other doubts I had.

Your intuition that my post-stratified weights are calibrated is
correct. Unfortunately, I checked the documents explaining the
sampling methodology again, and there the PSU is simply defined as a
geographic area containing more than 74 dwellings. I therefore expect
the number of PSUs to be high (around 3,000), whereas I only have 9
provinces and 4 geographical types in my survey. This implies that
none of my cluster variables can be the PSU.
However, if I understood your point, it does not really matter which PSU I
indicate when computing descriptive statistics. Is that correct? For
this reason, I also tried not to indicate any PSU, but Stata returned
the error: "invalid use of _n; observations can only be sampled
in the final stage".

To cut it short, do you still believe I can use the statement "svyset
w2_gc_prov [pw = w2_wgt], strata(w2_gc_dc) || w2_hhgeo" that you previously
suggested to set my calibrated weights? (In my case I cannot use the
fpc() option.)
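
For concreteness, this is roughly the sequence I have in mind. It is only
a sketch: it assumes that w2_wgt really is a calibration weight and that
the province can stand in for the first-stage cluster, and the two
variables in the descriptive commands are picked purely as illustrations.

```stata
* Sketch only: assumes w2_wgt is a calibration weight and that
* w2_gc_prov can serve as the first-stage cluster variable.
svyset w2_gc_prov [pw = w2_wgt], strata(w2_gc_dc) || w2_hhgeo

* Inspect the declared design (strata, units per stratum, observations)
svydescribe

* Weighted descriptive statistics; variables chosen for illustration only
svy: proportion w2_best_gen
svy: tabulate w2_best_race w2_best_gen, row
```

As I understand it, the point estimates depend only on the weights, while
the cluster and strata declarations affect the standard errors.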

Thank you very much for your help, I really appreciate it!

Kind regards,


2012/10/20 Steve Samuels:
> Veronica,
> The PSU variable is not missing. It is the sampling unit at the first
> stage of sampling and it's one of your cluster variables, probably
> "cluster 1" (check). Your statement that one must know the PSU variable
> to use probability weights is also incorrect. One can get proper
> weighted estimates, though not standard errors, without knowing the PSU.
> I'm not sure what is wrong with your -concat- statement. I would have
> used "egen combination = group()". For it to have worked, the value of
> the "post-stratification weight" would have to be the population count
> for each combination of the three variables.
> If the "post-stratification" weights are not integers, they are probably
> "calibration" weights that have already adjusted the probability
> weights. In that case, further post-stratification is likely to be
> superfluous. You would then use the "post-stratification weight" in place of
> the probability weights. All weights should be
> described in the study documents (though usually not the "codebook"). If
> they are not, then contact the organization that did the study for
> details.
> If sampling was without replacement at one or more stages,
> you could use the fpc() option for those stages. In practice,
> it makes a difference only for the first stage.
> In any case, one guess at a -svyset- statement (assuming the
> "post-stratification weight" is a "calibration" weight) is:
> *************************************************************
> svyset w2_gc_prov [pw = w2_wgt], strata(w2_gc_dc) || w2_hhgeo
> **************************************************************
> But I could be wrong, depending on how w2_wgt was calculated.
> Before proceeding, I suggest that you learn more about sampling or take
> a survey course. I gave some references in:
> The Stata survey manual is also a very good resource, though the section on
> post-stratification is skimpy.
> Steve
> On Oct 19, 2012, at 1:57 PM, Veronica Galassi wrote:
> Dear Statalisters,
> I am writing you concerning the application of calibrated weights to
> my dataset for the computation of descriptive statistics only.
> The dataset I am working on collects information at household and
> individual level and comes from a stratified, two-stage clustered
> sample. The following are the variables I have:
> - probability weights: w2_dwgt
> - strata: w2_gc_dc
> - cluster 1: w2_gc_prov
> - cluster 2: w2_hhgeo
> - post-stratified weights: w2_wgt
> - age intervals:  w2_age_intervals
> - gender: w2_best_gen
> - population group: w2_best_race
> In order to set the probability weights using the command svyset, I
> need the PSU variable. As you may have noticed, this variable is
> missing, which makes it impossible for me to set pweights.
> In addition, from a couple of previous statalist conversations ( see
> in particular:
> and, I
> understood that:
> - when using calibrated weights I still have to set pweights and
> specify the original strata and clusters
> - In order to apply calibrated data I need to know the characteristics
> on the basis of which the sample has been post-stratified (in my case
> age intervals, gender and population groups).
> Therefore, I tried to set my post-stratified weights using the
> following command:
> "svyset [pw=w2_dwgt], strata (w2_gc_dc) poststrata (w2_age_intervals
> w2_best_gen w2_best_race) postweight(w2_wgt)"
> which did not work because in Stata the poststrata must be mutually
> exclusive and thus only one variable can be specified.
> In order to overcome this problem, I tried to generate a variable
> which is a combination of the three characteristics by using the
> command
> "egen combination=concat( w2_age_intervals w2_best_race w2_best_gen),
> format (float)".
> However, this command generated a variable containing only missing
> values and for this reason Stata gave me back the error:
> "option postweight() requires option poststrata()".
> The only way to make Stata set the post-calibrated weight was by using
> the command
> "svyset, poststrata (combination) postweight(w2_wgt)" with combination
> being a string variable. However, I am worried that this command is not
> complete.
> At this point, I would really appreciate any hint on what I am doing
> wrong and how to proceed to set my post-stratified weights.
> Many thanks for your help!
> Kind regards,
> Veronica Galassi

