Statalist The Stata Listserver



Re: st: subpop and the mysterious sample size


From   [email protected] (Jeff Pitblado, StataCorp LP)
To   [email protected]
Subject   Re: st: subpop and the mysterious sample size
Date   Fri, 19 May 2006 11:01:53 -0500

ROBERT BOZICK <[email protected]> is interested in computing the
subpopulation standard deviations for one or more subpops:

> The reason for needing the correct sample size is that I need to compute the
> population standard deviation.  I have been using the command suggested in
> http://www.stata.com/support/faqs/stat/supweight.html
> 
> The post estimation command after I use the svymean is:
> di sqrt(e(N) * el(e(V_srs),1,1))
> 
> Additionally, I will be using some by commands as well.    For example:
> 
> svymean var1, subpop(samp) by(sex)
> 
> and then I want to compute the population standard deviation for var1 for
> both categories of sex.  Without the sample size, I cannot get the correct
> standard deviation using the suggested post estimation command.

Stata 8:

In the FAQ, subpopulations are not really mentioned, but we can use the
discussion about estimating the population standard deviation to derive how we
would estimate the subpopulation standard deviation.

When you specify the -subpop()- and/or -by()- options to -svymean-, the
subpopulation sample sizes are stored in e(_N).  Thus the formula becomes

	sqrt(el(e(_N),1,1) * el(e(V_srs),1,1))

Although the FAQ does not mention it, this formula applies only if you did not
specify the -fpc()- option when you -svyset- your data.  If you did specify
-fpc()-, then the formula is

	sqrt(el(e(_N),1,1) * el(e(V_srswr),1,1))
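
As a sketch of why the two cases differ, consider the simplest setting: the
full sample rather than a subpopulation.  The symbols n (sample size), N
(population size), and S (population standard deviation) are notation
introduced here for illustration, not Stata results.  Under simple random
sampling the variance of the mean is

```latex
\widehat{V}_{\mathrm{srswr}}(\bar{y}) = \frac{S^2}{n},
\qquad
\widehat{V}_{\mathrm{srs}}(\bar{y}) = \Bigl(1 - \frac{n}{N}\Bigr)\frac{S^2}{n},
\qquad\text{so}\qquad
S = \sqrt{n \, \widehat{V}_{\mathrm{srswr}}(\bar{y})}
```

Without -fpc()- there is no finite population correction, so e(V_srs) already
has the S^2/n form and multiplying by n recovers S^2.  With -fpc()-, e(V_srs)
carries the extra (1 - n/N) factor, which is why the with-replacement matrix
e(V_srswr) is the one to use.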

In the case where you would use the -srssubpop- option for looking at the DEFF
and DEFT design effects, the above formulas apply; just note that you will get
a different e(V_srs) and e(V_srswr) when the -srssubpop- option of -svymean-
is specified than when it isn't.

The above formulas apply only to the first subpop of the first variable.  To
get a row vector of variances for all the subpopulations and variables in the
call to -svymean-, try

	matrix var = hadamard(e(_N), vecdiag(e(V_srs)))

or

	matrix var = hadamard(e(_N), vecdiag(e(V_srswr)))

Then take the square root of each variance:

	* sd will hold one standard deviation per column of var
	local cols = colsof(var)
	matrix sd = J(1,`cols',0)
	forval i = 1/`cols' {
		matrix sd[1,`i'] = sqrt(var[1,`i'])
	}
	matrix list sd
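
For the -by(sex)- example in the question, the same calculation can also be
written out one subpopulation at a time.  This is only a sketch: it assumes
the data are already -svyset- without -fpc()-, that sex has two categories,
and that element i of e(_N) lines up with the i-th diagonal element of
e(V_srs), which is what the -hadamard()- formula above relies on.

```stata
svymean var1, subpop(samp) by(sex)
* first category of sex
display sqrt(el(e(_N),1,1) * el(e(V_srs),1,1))
* second category of sex
display sqrt(el(e(_N),1,2) * el(e(V_srs),2,2))
```

If the data were -svyset- with -fpc()-, replace e(V_srs) with e(V_srswr) in
both lines.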

---

Stata 9:

There are two differences for Stata 9 in the above discussion.

1.  -svy: mean- has an -over()- option in place of the -by()- option of
-svymean-.

2.  Although -svy- does not have an -srssubpop- option, -svy: mean- stores the
'srssubpop' variance estimates in -e(V_srssub)- and -e(V_srssubwr)-; so for
those interested in standard deviation estimates assuming SRS sampling within
the specified subpopulations, the formulas are as follows.

Without -fpc()- in the first stage:

	matrix var = hadamard(e(_N), vecdiag(e(V_srssub)))

With -fpc()- in the first stage:

	matrix var = hadamard(e(_N), vecdiag(e(V_srssubwr)))
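
Putting the Stata 9 pieces together for the example in the question might look
like the following sketch.  It assumes the data were -svyset- without -fpc()-;
var1, samp, and sex are the names from the original question.  The Mata line
uses the element-wise sqrt() function in place of the -forvalues- loop shown
for Stata 8; that substitution is a convenience, not part of the formulas
above.

```stata
svy, subpop(samp): mean var1, over(sex)
matrix var = hadamard(e(_N), vecdiag(e(V_srssub)))
* element-wise square root via Mata (requires Stata 9 or later)
mata: st_matrix("sd", sqrt(st_matrix("var")))
matrix list sd
```

With -fpc()- in the first stage, use e(V_srssubwr) in the second line instead.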

--Jeff
[email protected]