Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: svy subpop option and e(sample)

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: svy subpop option and e(sample)
Date	Wed, 25 May 2011 13:58:08 -0400

Richard- 
For a large enough subpopulation, the correct standard error for the ratio estimator is indistinguishable from the standard error that treats the subpopulation sample size as fixed (Lohr, 2009, p. 135, gives the formula for an SRS). Below is an example. (If you are using FPCs, the sampling fractions for the subpopulation and the full population must also be close.) So extract without guilt.

Steve
[email protected]

Ref: Lohr, Sharon L. 2009. Sampling: Design and Analysis. 2nd ed. Boston, MA: Cengage Brooks/Cole.
*******CODE BEGINS*****************
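* Simulated survey data: 7 noisy copies of the auto data in 50 PSUs
sysuse auto, clear
set seed 31497
gen u = runiform()                  // random sort key
sort u
expand 7                            // 74 cars x 7 = 518 observations
gen psu = mod(_n, 50)               // assign observations to 50 PSUs
replace mpg = mpg + 5*runiform()    // add noise so the copies differ
svyset psu [pweight=turn]           // turn serves as the sampling weight here
* -if- restricts the sample, treating the subpopulation size as fixed
svy: mean mpg if foreign==1
* -subpop()- keeps every observation, treating the subpopulation size as random
svy, subpop(foreign): mean mpg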
*****CODE ENDS********************
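The ratio estimator behind -subpop()- can be checked directly. A minimal sketch, assuming the same simulated data as above: the hypothetical variable fmpg is mpg times the domain indicator foreign, so a full-sample -svy: ratio- of fmpg over foreign should reproduce both the point estimate and the linearized standard error of -svy, subpop(foreign): mean mpg-. The scalar names are illustrative only.

```stata
* Rebuild the simulated data from the example above
sysuse auto, clear
set seed 31497
gen u = runiform()
sort u
expand 7
gen psu = mod(_n, 50)
replace mpg = mpg + 5*runiform()
svyset psu [pweight=turn]

* Hypothetical helper: outcome times domain indicator (0 outside the subpopulation)
gen fmpg = foreign*mpg

* Full-sample ratio: sum(w*foreign*mpg) / sum(w*foreign)
svy: ratio fmpg/foreign
matrix b = e(b)
matrix V = e(V)
scalar est_ratio = b[1,1]
scalar se_ratio  = sqrt(V[1,1])

* The same quantity via -subpop()-
svy, subpop(foreign): mean mpg
matrix b = e(b)
matrix V = e(V)
scalar est_sub = b[1,1]
scalar se_sub  = sqrt(V[1,1])

display "ratio:  " est_ratio "  (SE " se_ratio ")"
display "subpop: " est_sub "  (SE " se_sub ")"
```

Both lines should show the same estimate and SE, because the linearized variance of a domain mean is the linearized variance of this ratio computed over all PSUs.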

On May 25, 2011, at 11:10 AM, Richard Williams wrote:

At 06:20 PM 5/24/2011, Steven Samuels wrote:
> Just to elaborate: with subpopulations, the ratio estimator of a mean, with every sample member in the numerator and denominator, is necessary because the sample size of the subpopulation is random, not fixed. This extends to the regression estimators, as they are functions of means. If you had used an -if- qualifier to restrict the analysis to black==1, e(sample) would work as you expect; the estimates would be the same, but the standard errors would be different.
> 
> Steve
> [email protected]
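What e(sample) marks in each case can be seen on the same simulated data. A sketch, assuming Steve's setup: after -subpop()-, e(sample) should flag every observation used for the variance (all 518), while e(N_sub) reports the 154 foreign-car observations; after -if-, e(sample) flags only those 154. The variable insub and the scalar names are hypothetical.

```stata
sysuse auto, clear
set seed 31497
gen u = runiform()
sort u
expand 7
gen psu = mod(_n, 50)
replace mpg = mpg + 5*runiform()
svyset psu [pweight=turn]

* -subpop()-: the estimation sample is the whole data set
svy, subpop(foreign): mean mpg
scalar nsub = e(N_sub)            // observations in the subpopulation
count if e(sample)
scalar n_subpop = r(N)            // observations flagged by e(sample)

* Mark subpopulation members of the estimation sample explicitly
gen byte insub = e(sample) & foreign==1
count if insub
scalar n_insub = r(N)

* -if-: the estimation sample is only the restricted observations
svy: mean mpg if foreign==1
count if e(sample)
scalar n_if = r(N)
```

Combining e(sample) with the subpopulation condition, as in insub, recovers the "expected" marker without giving up the -subpop()- standard errors.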

As a sidelight, one thing that has always bothered me about subpop is that you are apparently never supposed to create an extract from your data. For example, you could have 100 million cases and be interested in only a subpopulation of 10,000, yet you are supposed to keep all 100 million cases in your data set so that the standard errors are right. I always wonder how horrible it would be if you just made the extract or used -if- instead of subpop. If, say, the standard errors were off by 0.01%, I suspect I could live with that.
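The extract question can be tried directly. A sketch on Steve's simulated data, with illustrative scalar names: -preserve- and -keep- build the extract, and the results are compared with -subpop()- on the full data. The point estimates should agree exactly; the standard errors are expected to be close but not identical, because the extract treats the subpopulation size as fixed and any PSU without foreign cars would drop out of the design.

```stata
sysuse auto, clear
set seed 31497
gen u = runiform()
sort u
expand 7
gen psu = mod(_n, 50)
replace mpg = mpg + 5*runiform()
svyset psu [pweight=turn]

* Full data, subpopulation estimation
svy, subpop(foreign): mean mpg
matrix b = e(b)
matrix V = e(V)
scalar est_sub = b[1,1]
scalar se_sub  = sqrt(V[1,1])

* Extract: drop everything outside the subpopulation (svyset settings are kept)
preserve
keep if foreign==1
svy: mean mpg
matrix b = e(b)
matrix V = e(V)
scalar est_ext = b[1,1]
scalar se_ext  = sqrt(V[1,1])
restore

display "SE ratio, extract/subpop = " se_ext/se_sub
```

The displayed ratio shows how far the extract's standard error is from the -subpop()- one for this particular design; it says nothing general about other designs or FPC settings.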

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: st: svy subpop option and e(sample)
  - From: Hitesh Chandwani <[email protected]>

References:
- st: svy subpop option and e(sample)
  - From: Richard Williams <[email protected]>
- Re: st: svy subpop option and e(sample)
  - From: Steven Samuels <[email protected]>
- Re: st: svy subpop option and e(sample)
  - From: Richard Williams <[email protected]>
