



Re: st: How to set calibrated weights


From   Steve Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How to set calibrated weights
Date   Sat, 20 Oct 2012 17:41:40 -0400

> 
> On Oct 20, 2012, at 5:08 AM, Veronica Galassi wrote:
> 
> Dear Steve,
> 
> Thank you very much for your kind reply and the useful references!
> Your answer actually clarified many other doubts I had.
> 
> Your intuition that my post-stratified weights are calibrated is
> correct. Unfortunately, I checked again the documents explaining the
> sampling methodology and there the PSU is simply defined as a
> geographic area containing more than 74 dwellings. Therefore I expect
> the number of PSU to be high (around 3,000) whereas I only have 9
> provinces and 4 geographical types in my survey. This implies that
> none of my cluster variables can be the PSU.

You still haven't persuaded me. I'd have to see the quote from the study
documents. Or, better, post a link to them if they are online. You'd
better figure out what role, if any, the cluster variables have in the
design. Why did you name them "cluster 1" and "cluster 2"?

> However, if I understood your point, it does not really matter which PSU I
> indicate when computing descriptive statistics. Is that correct?

No, it is not. It is scientifically irresponsible to publish estimates
of descriptive statistics without indications of uncertainty (SEs, CIs).
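
As a minimal sketch of what that means in practice, and assuming the
data have already been -svyset- with the correct design (which is still
an open question here), an estimation command such as the following
reports a standard error and confidence interval alongside each
estimate. The choice of w2_best_gen is only for illustration.

```stata
* Assumes a correct -svyset- has already been issued (not yet settled in this thread).
* Estimated gender proportions, each with an SE and a 95% CI.
svy: proportion w2_best_gen
```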

> For
> this reason, I also tried not to indicate any PSU but Stata gave me
> back the error: "invalid use of _n; observations can only be sampled
> in the final stage".

See FAQ Section 3.3, first sentence.

> To cut it short, do you still believe I can use the statement "svyset
> w2_gc_prov [pw = w2_wgt], strata(w2_gc_dc) || w2_hhgeo" you previously
> indicated to set my calibrated weights? (In my case I cannot use the
> fpc option.)

I don't know, because you have not yet correctly described the sampling
design. As an aside, have you even tried the statement, which assumed
that w2_gc_prov is the PSU? When you do, follow it by -svydes-.
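
A sketch of that sequence, still under the unconfirmed assumption that
w2_gc_prov is the PSU and that w2_wgt is a calibration weight:

```stata
* Trial design declaration only; the PSU assumption may well be wrong.
svyset w2_gc_prov [pw = w2_wgt], strata(w2_gc_dc) || w2_hhgeo

* Describe the declared design: strata, PSUs per stratum, observations per PSU.
svydes
```

The -svydes- table shows how many PSUs fall in each stratum, so the
total can be compared with the roughly 3,000 PSUs that the sampling
documents seem to imply.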

> 
2012/10/20 Steve Samuels <sjsamuels@gmail.com>:
> Veronica,
> 
> The PSU variable is not missing. It is the sampling unit at the first
> stage of sampling and it's one of your cluster variables, probably
> "cluster 1" (check). Your statement that one must know the PSU variable
> to use probability weights is also incorrect. One can get proper
> weighted estimates, though not standard errors, without knowing the PSU.
> 
> I'm not sure what's wrong with your -concat- statement. I would have
> used "egen combination = group()". For it to have worked, the value of
> the "post-stratification weight" would have to be the population count
> for each combination of the three variables.
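> 
> A minimal sketch of that alternative, using the variable names from
> the original post. The -tabulate- check is an added suggestion, not
> something from the study documents:
> 
> ```stata
> * One numeric code per observed combination of the three variables.
> egen combination = group(w2_age_intervals w2_best_gen w2_best_race)
> 
> * Check the result; observations missing any of the three get a missing code.
> tabulate combination, missing
> ```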
> 
> If the "post-stratification" weights are not integers, they are probably
> "calibration" weights that have already adjusted the probability
> weights. In that case, further post-stratification is likely to be
> superfluous. You would then use the "post-stratification weight" in place of
> the probability weights. All weights should be
> described in the study documents (though usually not the "codebook"). If
> they are not, then contact the organization that did the study for
> details.
> 
> If sampling was without replacement at one or more stages,
> you could use the fpc() option for those stages. In practice,
> it makes a difference only for the first stage.
> 
> In any case, one guess at a -svyset- statement (assuming the
> "post-stratification weight" is a "calibration" weight) is:
> *************************************************************
> svyset w2_gc_prov [pw = w2_wgt], strata(w2_gc_dc) || w2_hhgeo
> **************************************************************
> 
> But I could be wrong, depending on how w2_wgt was calculated.
> 
> Before proceeding, I suggest that you learn more about sampling or take
> a survey course. I gave some references in:
> http://www.stata.com/statalist/archive/2012-09/msg01058.html.
> The Stata survey manual is also a very good resource, though the section on
> post-stratification is skimpy.
> 
> Steve
> 
> 
> On Oct 19, 2012, at 1:57 PM, Veronica Galassi wrote:
> 
> Dear Statalisters,
> 
> I am writing to you concerning the application of calibrated weights to
> my dataset for the computation of descriptive statistics only.
> 
> The dataset I am working on collects information at the household and
> individual level and comes from a stratified, two-stage clustered
> sample. The following are the variables I have:
> - probability weights: w2_dwgt
> - strata: w2_gc_dc
> - cluster 1: w2_gc_prov
> - cluster 2: w2_hhgeo
> - post-stratified weights: w2_wgt
> - age intervals:  w2_age_intervals
> - gender: w2_best_gen
> - population group: w2_best_race
> 
> In order to set the probability weights using the command svyset, I
> need the PSU variable. As you may have noticed, this variable is
> missing, and this makes it impossible for me to set pweights.
> In addition, from a couple of previous Statalist conversations (see
> in particular: http://www.ats.ucla.edu/stat/stata/faq/svy_stata_post.htm
> and http://www.stata.com/statalist/archive/2012-02/msg00584.html), I
> understood that:
> - when using calibrated weights I still have to set pweights and
> specify the original strata and clusters
> - in order to apply calibrated weights I need to know the characteristics
> on the basis of which the sample has been post-stratified (in my case
> age intervals, gender and population groups).
> 
> Therefore, I tried to set my post-stratified weights using the
> following command:
> "svyset [pw=w2_dwgt], strata (w2_gc_dc) poststrata (w2_age_intervals
> w2_best_gen w2_best_race) postweight(w2_wgt)"
> which did not work because in Stata the poststrata must be mutually
> exclusive and thus only one variable can be specified.
> 
> In order to overcome this problem, I tried to generate a variable
> which is a combination of the three characteristics by using the
> command
> "egen combination=concat( w2_age_intervals w2_best_race w2_best_gen),
> format (float)".
> However, this command generated a variable containing only missing
> values and for this reason Stata gave me back the error:
> "option postweight() requires option poststrata()".
> The only way to make Stata set the post-calibrated weight was by using
> the command
> "svyset, poststrata (combination) postweight(w2_wgt)" with combination
> being a string variable. However, I am afraid that this command is not
> complete.
> 
> At this point, I would really appreciate any hint on what I am doing
> wrong and how to proceed to set my post-stratified weights.
> 
> Many thanks for your help!
> 
> Kind regards,
> 
> Veronica Galassi
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
> 

