

From: Michael Norman Mitchell <Michael.Norman.Mitchell@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Number of Obs with svy , suppop()
Date: Fri, 19 Mar 2010 18:28:05 -0700

Dear Phil and Stas

Many thanks,

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com

On 2010-03-19 1.20 PM, Stas Kolenikov wrote:

On Fri, Mar 19, 2010 at 3:17 AM, Michael Norman Mitchell wrote:

> Dear Phil
>
> Thank you for your reply... I am still struggling to solidly understand this. Perhaps I have a more fundamental question. What is the formula for the "Number of obs" in the context of the -svy- commands. It sounds like, in the absence of the -subpop()- option, it is the number of observations with non-missing values on the tabulated variable. And, in the presence of the -subpop()- option it is the total number of observations minus the number of observations that meet the -subpop()- option and are missing on the tabulated variable. Am I on the right track here?

This is a complicated interplay between -markout-s of the survey design variables, survey subpopulation, and that of the very command to be called. I guess in this case what happened was:

1. -tab- marked out observations for which either race or gender were missing, resulting in 4000 observations.
2. next, -subpop- marked out the observations with sex==1.
3. Finally, -svy- looked at these markings, and decided that the total # of observations must be the number used in estimation in the subpopulation (which turns out to be the intersection of what -tab- and -subpop- has identified as relevant observations, 1904 males with non-missing race), plus the number that was not marked out by either command (2133 females, regardless of their race variable value). That meant all individuals with sex==2, including those with missing race information.

Frankly, I don't know what the "correct" behavior should be. I guess it is extremely difficult for a prefix command like -svy- to figure out what's going on within the prefixed command (like -tab-). The biggest culprit was -tab- which carelessly excluded some observations from its -e(sample)- and did not know that -svy- would need to count all these extra observations that -tab- dropped (essentially).

What Phil gave with an "extended" subpop specification is certainly a good working solution, but it demands substantial discipline from the user/analyst. It also explicitly says that the part of the population to whom the result can be generalized are the people who do not hide their race.
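
The counting rule that Stas guesses at in steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration of the rule as described in the message, not Stata's documented algorithm: the `Obs` record, the `svy_number_of_obs` function, and the six toy observations are all hypothetical. Only the counts 1904 and 2133 come from the thread.

```python
from typing import Callable, List, NamedTuple, Optional


class Obs(NamedTuple):
    """One toy survey record (hypothetical layout, for illustration only)."""
    sex: int              # 1 = male, 2 = female
    race: Optional[int]   # None stands for a missing value


def svy_number_of_obs(data: List[Obs], in_subpop: Callable[[Obs], bool]) -> int:
    """Count observations the way the message guesses -svy- does (an assumption)."""
    # Inside the subpopulation: counted only if the tabulated variable is non-missing.
    used_in_subpop = sum(1 for o in data if in_subpop(o) and o.race is not None)
    # Outside the subpopulation: counted regardless of missing values.
    outside_subpop = sum(1 for o in data if not in_subpop(o))
    return used_in_subpop + outside_subpop


# Made-up data: three males (one with missing race), three females (one with missing race).
toy = [Obs(1, 1), Obs(1, 2), Obs(1, None), Obs(2, 1), Obs(2, None), Obs(2, 2)]
n_obs = svy_number_of_obs(toy, lambda o: o.sex == 1)

# Michael's phrasing of the same rule: total minus subpopulation members with missing race.
n_alt = len(toy) - sum(1 for o in toy if o.sex == 1 and o.race is None)

# With the counts quoted in the thread, the same rule gives 1904 + 2133.
thread_total = 1904 + 2133

print(n_obs, n_alt, thread_total)
```

On the toy data both phrasings of the rule agree, which suggests Michael's formula and Stas's step-by-step account describe the same count, at least under these assumptions.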

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
