
From: jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Chi square test unavailable when subpop is used in svy analyisis
Date: Tue, 19 Aug 2008 17:57:03 -0500

Ángel Rodríguez Laso <angelrlaso@gmail.com> has a follow-up question regarding -svy: tabulate- with the -subpop()- option:

> Following with this, I have a query: If there are missing values in a
> variable and SEs and CIs for the valid values are wanted, how should
> one proceed? Are individuals with missing values dropped from the
> calculations of SEs if subpop is not used? I see four possibilities:
>
> 1) svy:tab variable */intuitive option
>
> 2) svy, subpop (valid values): tab variable */probably most accurate
>
> 3) svy if variable==valid values: tab variable */not recommended for svy
>
> 4) svy: tab variable, missing */ but then you don't get proportions of
> valid values after excluding missing values
>
> In an example with a dichotomous variable with 5.7% missing values, I
> get exactly (up to three decimal figures) the same SEs, CIs and number
> of observations (n=11500, degrees of freedom=1255) with options 1, 2
> and 3, and slightly smaller SEs with option 4 (n=12190, df=1255).

In reviewing Ángel's results, we noticed that -svy: tabulate- is incorrectly dropping out-of-subpop observations that contain missing values in the variables of the varlist (so option 2 should have given results different from options 1 and 3). This affects the variance estimates when primary sampling units (PSUs) are dropped because of missing values, and it could decrease the design degrees of freedom. Both effects are very slight and inversely related to the number of PSUs. We will correct this in the next Stata ado-file update.

In light of this, we'll address Ángel's observations using -svy: proportion-, which is very similar to -svy: tabulate- and correctly handles missing values in out-of-subpop observations. In what follows, we assume that the only variable with missing values is the one we are tabulating.

Here is a simple example that illustrates the differences among the four options Ángel listed.

. sysuse auto
. svyset _n
. * 1
. svy: prop rep
. est store noopts
. * 2
. gen valid = !missing(rep)
. svy, subpop(valid): prop rep
. est store subpop
. * 3
. svy: prop rep if valid
. est store withif
. * 4
. svy: prop rep, missing
. est store missing
. est table _all, b se

***** BEGIN: final output from above illustrative example
. est table _all, b se

------------------------------------------------------------------
    Variable |       noopts       subpop       withif      missing
-------------+----------------------------------------------------
           1 |    .02898551    .02898551    .02898551    .02702703
             |    .02034459    .02033449    .02034459    .01897965
           2 |    .11594203    .11594203    .11594203    .10810811
             |    .03882454    .03880527    .03882454    .03634325
           3 |    .43478261    .43478261    .43478261    .40540541
             |     .0601159    .06008606     .0601159    .05746373
           4 |    .26086957    .26086957    .26086957    .24324324
             |    .05324978    .05322334    .05324978    .05021542
           5 |    .15942029    .15942029    .15942029    .14864865
             |    .04439221    .04437017    .04439221    .04163643
     _prop_6 |                                           .06756757
             |                                           .02937761
------------------------------------------------------------------
                                                      legend: b/se
***** END:

Summary of options (illustrated by the example above, using the auto data):

- Options 1 (noopts) and 3 (withif) are equivalent. Stata's -svy- commands drop within-subpop observations that contain missing values. Here the "subpop" is the entire population, so option 3 merely excludes explicitly the observations that option 1 drops automatically because of missing values.

- Option 2 (subpop) differs by treating the observations where the tabulated variables contain missing values as out-of-subpop. That is, we define the subpop as the individuals in the population for whom we were able to collect information on the tabulated variable. This gives the same point estimates for any survey design, but the variance estimates can differ, depending on how many PSUs options 1 and 3 drop (here, 5 of the 74).

- Option 4 (missing) merely treats the missing values as a separate category, potentially biasing the point estimates and standard errors downward (toward zero).
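To see why only the subpop column moves, here is a minimal Python sketch (not StataCorp's code) that recomputes the table by hand. It assumes the design is exactly -svyset _n-, i.e. every observation is its own PSU with no strata, no weights and no finite-population correction. Under those assumptions the linearized variance of a proportion p reduces to n/(n-1) * p(1-p)/m, where n is the number of PSUs entering the variance calculation and m is the number of observations used to estimate p. The category counts (2, 8, 30, 18 and 11, plus 5 missing) are back-calculated from the point estimates in the table, and the function names are hypothetical.

```python
import math

# Counts of rep78 categories in Stata's auto data, back-calculated from the
# point estimates in the -est table- output (for example .02898551 * 69 = 2).
COUNTS = {1: 2, 2: 8, 3: 30, 4: 18, 5: 11}
N_MISSING = 5


def se_prop(p, n_psu, m):
    """Linearized SE of a proportion p, assuming -svyset _n- with no strata,
    no weights and no FPC (each observation is its own PSU).

    n_psu: number of PSUs entering the variance calculation
    m:     number of observations used to estimate p
    """
    # Score z_i = (y_i - p) / m inside the subpop and 0 outside it.
    # The scores sum to zero, so V = n/(n-1) * sum z_i^2
    #                              = n/(n-1) * m*p*(1-p) / m^2.
    return math.sqrt(n_psu / (n_psu - 1) * p * (1 - p) / m)


def table():
    """Return {category: {column: (b, se)}} mirroring the Stata table."""
    m = sum(COUNTS.values())  # 69 observations with nonmissing rep78
    n = m + N_MISSING         # 74 observations, i.e. 74 PSUs
    rows = {}
    for cat, k in COUNTS.items():
        p = k / m
        dropped = (p, se_prop(p, m, m))  # missing obs dropped entirely
        rows[cat] = {
            "noopts": dropped,                         # option 1
            "subpop": (p, se_prop(p, n, m)),           # option 2: missing obs out-of-subpop
            "withif": dropped,                         # option 3, same as option 1
            "missing": (k / n, se_prop(k / n, n, n)),  # option 4
        }
    p6 = N_MISSING / n  # option 4 only: missing as a sixth category
    rows["_prop_6"] = {"missing": (p6, se_prop(p6, n, n))}
    return rows


if __name__ == "__main__":
    for cat, cols in table().items():
        print(cat, {name: (round(b, 8), round(se, 8)) for name, (b, se) in cols.items()})
```

Options 1 and 3 use n = m = 69, giving p(1-p)/68; option 2 keeps all 74 PSUs (n = 74, m = 69), giving (74/73) p(1-p)/69. So every subpop standard error is the noopts one times sqrt(5032/5037), about 0.9995, while the point estimates are unchanged. Option 4 instead divides by all 74 observations, which is where its smaller proportions and standard errors come from.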
The -missing- option should be used only when the missing values mean something like "not applicable" rather than "we couldn't get a value from the survey participant".

Which option to choose depends largely on why the values are missing in the data.

--Jeff
jpitblado@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

