


Re: st: Number of Obs with svy , suppop()

From   Stas Kolenikov
Subject   Re: st: Number of Obs with svy , suppop()
Date   Fri, 19 Mar 2010 15:20:15 -0500

On Fri, Mar 19, 2010 at 3:17 AM, Michael Norman Mitchell wrote:
> Dear Phil,
> Thank you for your reply... I am still struggling to solidly understand
> this. Perhaps I have a more fundamental question: what is the formula for
> the "Number of obs" in the context of the -svy- commands? It sounds like, in
> the absence of the -subpop()- option, it is the number of observations with
> non-missing values on the tabulated variable. And, in the presence of the
> -subpop()- option, it is the total number of observations minus the number of
> observations that meet the -subpop()- condition and are missing on the
> tabulated variable. Am I on the right track here?

This is a complicated interplay among three sets of -markout-s: those
for the survey design variables, those for the survey subpopulation,
and those of the very command being called. I guess in this case what
happened was:

1. -tab- marked out observations for which either race or gender was
missing, resulting in 4000 observations.

2. Next, -subpop- flagged the observations with sex==1 as the
subpopulation.

3. Finally, -svy- looked at these markings and decided that the total
number of observations must be the number used in estimation in the
subpopulation (the intersection of what -tab- and -subpop- have
identified as relevant observations: 1904 males with non-missing
race), plus the number of observations outside the subpopulation
(2133 females, regardless of their race variable value). That meant
all individuals with sex==2 were counted, including those with
missing race.
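
To make the counting rule in step 3 concrete, here is a minimal Python sketch of my reading of it. It is not Stata's actual implementation: the function name, the record layout, and the six-row toy data set are all hypothetical, and only the two counts in the last line (1904 and 2133) come from the thread.

```python
from typing import NamedTuple, Optional


class Obs(NamedTuple):
    sex: int              # 1 = male, 2 = female
    race: Optional[int]   # None stands in for a missing value


def svy_number_of_obs(data, in_subpop, is_missing):
    """Hypothetical helper: count observations as step 3 describes.

    Returns (used in estimation inside the subpopulation,
             observations outside the subpopulation,
             their sum, i.e. the reported number of obs).
    """
    used = sum(1 for o in data if in_subpop(o) and not is_missing(o))
    outside = sum(1 for o in data if not in_subpop(o))
    return used, outside, used + outside


# Toy data (an assumption for illustration, not the data from the thread):
# three males, one with missing race; three females, one with missing race.
toy = [Obs(1, 1), Obs(1, 2), Obs(1, None),
       Obs(2, 1), Obs(2, None), Obs(2, 3)]

used, outside, total = svy_number_of_obs(
    toy,
    in_subpop=lambda o: o.sex == 1,
    is_missing=lambda o: o.race is None,
)
print(used, outside, total)  # 2 3 5

# The same sum with the counts quoted in the thread.
print(1904 + 2133)  # 4037
```

In the toy data only the male with missing race drops out of the count; the female with missing race is still counted because she is outside the subpopulation.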

Frankly, I don't know what the "correct" behavior should be. I guess
it is extremely difficult for a prefix command like -svy- to figure
out what's going on within the prefixed command (like -tab-). The
biggest culprit was -tab-, which carelessly excluded some observations
from its -e(sample)- and did not know that -svy- would need to count
the extra observations that -tab- had essentially dropped. What Phil
gave with an "extended" subpop specification is certainly a good
working solution, but it demands substantial discipline from the
user/analyst. It also explicitly says that the part of the population
to which the result can be generalized is the people who do not hide
their race.
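
As a rough illustration of why the "extended" subpop specification changes the count, the Python sketch below applies the same counting rule (observations used inside the subpopulation plus observations outside it) to two subpopulation definitions. This is only what that rule implies, not verified -svy- output; the helper name and the toy rows are hypothetical.

```python
def number_of_obs(rows, in_subpop):
    """Hypothetical helper; rows are (sex, race) pairs, race None = missing."""
    # Inside the subpopulation, rows with missing race are dropped.
    used = sum(1 for sex, race in rows
               if in_subpop(sex, race) and race is not None)
    # Outside the subpopulation, every row is counted.
    outside = sum(1 for sex, race in rows if not in_subpop(sex, race))
    return used + outside


# Toy rows (assumed for illustration): sex 1 = male, 2 = female.
rows = [(1, 1), (1, 2), (1, None), (2, 1), (2, None), (2, 3)]

# Plain subpopulation: all males.
plain = number_of_obs(rows, lambda sex, race: sex == 1)

# "Extended" subpopulation: males with non-missing race.
extended = number_of_obs(rows, lambda sex, race: sex == 1 and race is not None)

print(plain, extended)  # 5 6
```

Under the extended definition the male with missing race moves outside the subpopulation, so nobody is dropped and the count equals the full number of rows.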

Stas Kolenikov
Small print: I use this email account for mailing lists only.
