# Re: st: subpop and the mysterious sample size

| | |
|---|---|
| From | jpitblado@stata.com (Jeff Pitblado, StataCorp LP) |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: subpop and the mysterious sample size |
| Date | Fri, 19 May 2006 11:01:53 -0500 |

```
ROBERT BOZICK <rbozick1@jhem.jhu.edu> is interested in computing the
subpopulation standard deviations for one or more subpops:

> The reason for needing the correct sample size is that I need to compute the
> population standard deviation.  I have been using the command suggested in
> http://www.stata.com/support/faqs/stat/supweight.html
>
> The post estimation command after I use the svymean is:
> di sqrt(e(N) * el(e(V_srs),1,1))
>
> Additionally, I will be using some by commands as well.    For example:
>
> svymean var1, subpop(samp) by(sex)
>
> and then I want to compute the population standard deviation for var1 for
> both categories of sex.  Without the sample size, I cannot get the correct
> standard deviation using the suggested post estimation command.

Stata 8:

In the FAQ, subpopulations are not really mentioned, but we can use the
discussion about estimating the population standard deviation to derive how we
would estimate the subpopulation standard deviation.

When you specify the -subpop()- and/or -by()- options to -svymean-, the
subpopulation sample sizes are stored in e(_N).  Thus the formula becomes

sqrt(el(e(_N),1,1) * el(e(V_srs),1,1))

Although not mentioned in the FAQ, this formula only applies if you have not
-svyset- using the -fpc()- option.  If you -svyset- using the -fpc()- option,
then the formula is

sqrt(el(e(_N),1,1) * el(e(V_srswr),1,1))
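
The reasoning behind both formulas: under simple random sampling with
replacement, the variance of a mean is S^2/n, where S is the standard
deviation and n is the number of observations, so S = sqrt(n * variance).
With -fpc()-, e(V_srs) also carries the finite population correction, which is
why e(V_srswr) is used instead.  A minimal sketch using the names from the
question (var1 and samp), with a single subpopulation and no -by()- option;
run only the -display- line that matches your -svyset-:

svymean var1, subpop(samp)
* -svyset- without -fpc()-:
display sqrt(el(e(_N),1,1) * el(e(V_srs),1,1))
* -svyset- with -fpc()-:
display sqrt(el(e(_N),1,1) * el(e(V_srswr),1,1))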

In the case where you would use the -srssubpop- option for looking at the DEFF
and DEFT design effects, the above formulas apply; just note that you will get
a different e(V_srs) and e(V_srswr) when the -srssubpop- option of -svymean-
is specified than when it isn't.

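The above formulas only apply to the first subpop of the first variable.  To
get a row vector of variances for all the subpopulations and variables in the
call to -svymean-, try

matrix var = vecdiag(diag(e(_N))*e(V_srs))

or, if you -svyset- using the -fpc()- option,

matrix var = vecdiag(diag(e(_N))*e(V_srswr))

Then take the square root of each variance: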

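* var is a row vector of variances, one column per estimate
local cols = colsof(var)
* sd will hold the matching standard deviations
matrix sd = J(1,`cols',0)
forval i = 1/`cols' {
	matrix sd[1,`i'] = sqrt(var[1,`i'])
}
matrix list sd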
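
As an alternative to collecting the results in a matrix, the same quantities
can be displayed one at a time.  The following is only a sketch for the
-by(sex)- example from the question.  It assumes -svyset- without -fpc()-
(otherwise substitute e(V_srswr)), and that the columns of e(_N) line up with
the diagonal of e(V_srs), as the el() formulas above imply.  The matrix names
N and V are chosen here for illustration.

svymean var1, subpop(samp) by(sex)
* copy the stored results so they can be subscripted
matrix N = e(_N)
matrix V = e(V_srs)
local k = colsof(N)
forvalues i = 1/`k' {
	* sd for estimate i = sqrt(sample size * SRS variance)
	display "estimate `i': sd = " sqrt(N[1,`i'] * V[`i',`i'])
}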

---

Stata 9:

There are two differences for Stata 9 in the above discussion.

1.  -svy: mean- has an -over()- option in place of the -by()- option of
-svymean-.

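2.  Although -svy- does not have an -srssubpop- option, -svy: mean- stores the
subpopulation SRS variance estimates in e(V_srssub) and e(V_srssubwr).  If you
are interested in standard deviation estimates assuming SRS sampling within
the specified subpopulations, the formulas are

Without -fpc()- in the first stage:

sqrt(el(e(_N),1,1) * el(e(V_srssub),1,1))

With -fpc()- in the first stage:

sqrt(el(e(_N),1,1) * el(e(V_srssubwr),1,1))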