Re: st: Number of Obs with svy , suppop()

From   Phil Schumm
Subject   Re: st: Number of Obs with svy , suppop()
Date   Fri, 19 Mar 2010 14:42:34 -0500

On Mar 19, 2010, at 3:17 AM, Michael Norman Mitchell wrote:
Thank you for your reply... I am still struggling to solidly understand this. Perhaps I have a more fundamental question. What is the formula for the "Number of obs" in the context of the -svy- commands? It sounds like, in the absence of the -subpop()- option, it is the number of observations with non-missing values on the tabulated variable. And, in the presence of the -subpop()- option, it is the total number of observations minus the number of observations that meet the -subpop()- condition and are missing on the tabulated variable. Am I on the right track here?

Yes, I believe this is correct (note however that I haven't looked into this carefully, so if you need confirmation of Stata's behavior WRT this issue, you'll need to get it from the manual or from someone like Jeff). One more thing I should mention: How you proceed in cases like this may depend on the reason(s) that the data are missing. For example, suppose the missing values for race are due to respondents refusing to answer the question or saying "I don't know." In that case, Durbin argued that this should be taken into account when defining the subpopulation (also referred to in the survey literature as a domain).[1] IOW, in your example, the subpopulation of interest would be "all males who, when asked, will provide an answer to this question." In this case, you would augment your -subpop()- specification like this:

    svy, subpop(if sex==1 & !mi(race)):

in which case the "number of observations" reported by Stata should now correspond to the total number of observations in your dataset. More importantly, this would specify a slightly different variance calculation, though the actual result may only differ very slightly (if at all) depending on the circumstances. Note that I almost never see anyone do this -- at least not in the applied social science literature.

Of course, what I just described does nothing to address the possible bias that might arise if those who don't respond differ (in terms of race) from those who do...
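
One way to check the observation counts discussed above is to compare Stata's stored results under the two -subpop()- specifications. The following is only a sketch: it assumes the data have already been -svyset- and that -sex- and -race- are coded as in the example above (sex==1 for males, race possibly missing). It has not been run against any particular dataset, so no output is shown.

    * Assumes the data are already -svyset- and contain -sex- and -race-
    * Total number of observations in the dataset
    count
    * Observations inside the subpopulation that are missing on race
    count if sex==1 & mi(race)

    * Original specification: all males
    svy, subpop(if sex==1): tabulate race
    * e(N) is the number of observations used; e(N_sub) is the
    * number of those observations that fall in the subpopulation
    display "N = " e(N) "   N_sub = " e(N_sub)

    * Augmented specification: males who answered the race question
    svy, subpop(if sex==1 & !mi(race)): tabulate race
    display "N = " e(N) "   N_sub = " e(N_sub)

If the description above is right, e(N) after the first command should equal the first count minus the second, and e(N) after the second command should equal the first count. Comparing the standard errors from the two tables shows how much the variance calculation actually changes in a given dataset.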

-- Phil

[1] J. Durbin. Sampling theory for estimates based on fewer individuals than the number selected. Bulletin of the International Statistical Institute, 36(3):113–119, 1958.
