
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Number of Obs with svy , suppop()

From	Michael Norman Mitchell <[email protected]>
To	[email protected]
Subject	Re: st: Number of Obs with svy , suppop()
Date	Fri, 19 Mar 2010 18:28:05 -0700

Dear Phil and Stas

Thank you very kindly for the extra information. Using my ordinary intuition, I just could not fathom how imposing a restriction on the sample (via subpop()) could lead to an increase in the "number of obs". Your additional explanation helped quite a bit.

To me, this sounds like an excellent Stata FAQ. I am going to write to Stata and make such a suggestion.


Many thanks,

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com

On 2010-03-19 1.20 PM, Stas Kolenikov wrote:

On Fri, Mar 19, 2010 at 3:17 AM, Michael Norman Mitchell wrote:

Dear Phil

  Thank you for your reply... I am still struggling to solidly understand
this. Perhaps I have a more fundamental question. What is the formula for
the "Number of obs" in the context of the -svy- commands? It sounds like, in
the absence of the -subpop()- option, it is the number of observations with
non-missing values on the tabulated variable. And, in the presence of the
-subpop()- option it is the total number of observations minus the number of
observations that meet the -subpop()- option and are missing on the
tabulated variable. Am I on the right track here?

This is a complicated interplay between -markout-s of the survey
design variables, survey subpopulation, and that of the very command
to be called. I guess in this case what happened was:

1. -tab- marked out observations for which either race or gender were
missing, resulting in 4000 observations.

2. Next, -subpop- flagged the observations with sex==1 as the subpopulation.

3. Finally, -svy- looked at these markings, and decided that the total
# of observations must be the number used in estimation in the
subpopulation (which turns out to be the intersection of what -tab-
and -subpop- have identified as relevant observations, 1904 males with
non-missing race), plus the number that was not marked out by either
command (2133 females, regardless of their race variable value). That
meant all individuals with sex==2, including those with missing race
information.
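
To make the arithmetic in the three steps above concrete, here is a minimal Python sketch of the counting rule as I have described it. It is not Stata's actual implementation, only my reading of the behavior. The function name and the toy data are hypothetical: the 1904 and 2133 counts come from this thread, the 2096 and 37 female counts are derived from them on the assumption that nobody is missing gender, and the 30 males with missing race is an invented number that does not affect the result.

```python
# Each record is (sex, race); race is None when missing.
# sex: 1 = male (the subpopulation), 2 = female.

def svy_number_of_obs(records, subpop_sex=1):
    """Sketch of the reported 'Number of obs' as described in this thread.

    Counts subpopulation members actually used in estimation (non-missing
    race) plus everyone outside the subpopulation, whatever their race value.
    """
    used_in_subpop = sum(
        1 for sex, race in records if sex == subpop_sex and race is not None
    )
    outside_subpop = sum(
        1 for sex, race in records if sex is not None and sex != subpop_sex
    )
    return used_in_subpop + outside_subpop


records = (
    [(1, "a")] * 1904    # males with non-missing race (from the thread)
    + [(1, None)] * 30   # males with missing race (hypothetical count)
    + [(2, "a")] * 2096  # females with non-missing race (4000 - 1904)
    + [(2, None)] * 37   # females with missing race (2133 - 2096)
)

# What -tab- alone would keep: both variables non-missing.
tab_count = sum(1 for sex, race in records if sex is not None and race is not None)

n_obs = svy_number_of_obs(records)
print(tab_count, n_obs)
```

Under these assumptions the -tab- count is 4000 while the reported total is 1904 + 2133 = 4037, which is how adding a restriction can make the "number of obs" go up.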

Frankly, I don't know what the "correct" behavior should be. I guess
it is extremely difficult for a prefix command like -svy- to figure
out what's going on within the prefixed command (like -tab-). The
biggest culprit was -tab- which carelessly excluded some observations
from its -e(sample)- and did not know that -svy- would need to count
all these extra observations that -tab- dropped (essentially). What
Phil gave with an "extended" subpop specification is certainly a good
working solution, but it demands substantial discipline from the
user/analyst. It also explicitly says that the part of the population
to which the result can be generalized is the people who do not hide
their race.


