Re: st: standard deviations for survey data

From: jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: standard deviations for survey data
Date: Thu, 06 Apr 2006 14:15:49 -0500

```Deborah Holtzman <DHoltzman@air.org> asks about a comment made in an FAQ that
mentions, among other things, how to estimate the population standard
deviation of a measurement sampled using a complex survey design:

> I am analyzing some survey data with relatively complex sampling
> (2-stage, stratified cluster at the first stage; FPC at both stages). I
> have svyset the data and am using svy: commands.
>
>  I would like to get standard deviations as well as means for my
> variables, as I would like to get a sense of spread as well as central
> tendency. I have read the Stata FAQ "Why doesn't summarize accept
> pweights? What does summarize calculate when you use aweights?" at:
>
> http://www.stata.com/support/faqs/stat/supweight.html
>
>  ...which addresses the issue (albeit from a different angle) and does
> provide a way to get Stata to calculate a standard deviation following a
> svy: mean command (di sqrt(e(N) * el(e(V_srs),1,1)), but contains the
> following statement:

> "We probably don't care about an estimate of the standard deviation of the
> population."

> I am troubled by this statement.  Why would we not care about an estimate of
> the standard deviation of the population?

> Do I need to be cautious about how I interpret the standard deviation
> produced by the di...  command?

Here is the paragraph where the comment was made in the FAQ:

When we say that we want "the mean and standard deviation of a
variable with probability weights", what we likely want is an estimate
of the population mean and the standard error of this estimator for
the population mean. We probably don't care about an estimate of the
standard deviation of the population.

The point here is that the standard deviation of X is not the same thing as
the standard error of its estimated population mean.  In most cases, we
probably want to make inferences about the mean, so we need the standard
error of the estimated mean rather than the standard deviation of X.

Deborah mentions that she wants 'a sense of spread', so in her case she truly
does want to see estimates of the standard deviation of her variables of
interest.  The above FAQ shows how to get them.
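
As a rough sketch of that recipe, assuming the data have already been svyset
and using x as a placeholder name for the variable of interest:

    . svy: mean x
    . display sqrt(e(N) * el(e(V_srs),1,1))

The idea is that e(V_srs) holds the variance the estimated mean would have
under simple random sampling, which is roughly s^2/n, so multiplying by
n = e(N) and taking the square root recovers s, the estimated population
standard deviation.  When an FPC has been svyset, e(V_srs) may also carry a
finite population correction, in which case the result is only approximately
s unless the sampling fraction is small.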

--Jeff