
From: "Ángel Rodríguez Laso" <angelrlaso@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Chi square test unavailable when subpop is used in svy analyisis
Date: Wed, 20 Aug 2008 12:48:45 +0200

If I've understood correctly, excluding individuals with missing values from the calculation of the variance estimates, as options 1 and 3 do, is like dropping men when you need estimates only for women (using 'if' instead of 'subpop'), which is incorrect in survey analysis. Therefore, when calculating variances for the valid values of a survey variable (the remaining values being missing because the respondent didn't answer, didn't know the answer, or the question was not applicable), option 2 (subpop(valid values)) should be used, because it takes all individuals, with valid and missing values, into account in the calculation. Is that correct?

Many thanks.

Ángel Rodríguez Laso

2008/8/20, Jeff Pitblado, StataCorp LP <jpitblado@stata.com>:
> Ángel Rodríguez Laso <angelrlaso@gmail.com> has a follow-up question regarding
> -svy: tabulate- with the -subpop()- option:
>
> > Following up on this, I have a question: if a variable has missing values
> > and SEs and CIs are wanted for the valid values, how should one proceed?
> > Are individuals with missing values dropped from the calculation of the
> > SEs if subpop is not used? I see four possibilities:
> >
> > 1) svy: tab variable                            // intuitive option
> >
> > 2) svy, subpop(valid values): tab variable      // probably most accurate
> >
> > 3) svy: tab variable if variable==valid values  // not recommended for svy
> >
> > 4) svy: tab variable, missing                   // but then you don't get
> >    proportions of the valid values after excluding missing values
> >
> > In an example with a dichotomous variable with 5.7% missing values, I
> > get exactly the same SEs, CIs and number of observations (to three
> > decimal places; n=11500, degrees of freedom=1255) with options 1, 2
> > and 3, and slightly smaller SEs with option 4 (n=12190, df=1255).
>
> In reviewing Ángel's results, we noticed that -svy: tabulate- is incorrectly
> dropping out-of-subpop observations that contain missing values in the
> variables of the varlist (option 2 should differ from options 1 and 3).
> This affects the variance estimates when primary sampling units are dropped
> because of missing values, and it could decrease the design degrees of
> freedom. Both of these effects are very slight and inversely related to the
> number of PSUs. We will correct this in the next Stata ado-file update.
>
> In light of this, we'll address Ángel's observations using -svy: proportion-,
> which is very similar to -svy: tabulate- and correctly deals with missing
> values in out-of-subpop observations.
>
> In the following we assume that the only variable with missing values is the
> one we are tabulating. Here is a simple example that illustrates the
> differences among the 4 options delineated by Ángel.
>
> . sysuse auto
> . svyset _n
> . * 1
> . svy: prop rep
> . est store noopts
> . * 2
> . gen valid = !missing(rep)
> . svy, subpop(valid): prop rep
> . est store subpop
> . * 3
> . svy: prop rep if valid
> . est store withif
> . * 4
> . svy: prop rep, missing
> . est store missing
> . est table _all, b se
>
> ***** BEGIN: final output from above illustrative example
> . est table _all, b se
>
> ------------------------------------------------------------------
>     Variable |    noopts       subpop       withif      missing
> -------------+----------------------------------------------------
>            1 | .02898551    .02898551    .02898551    .02702703
>              | .02034459    .02033449    .02034459    .01897965
>            2 | .11594203    .11594203    .11594203    .10810811
>              | .03882454    .03880527    .03882454    .03634325
>            3 | .43478261    .43478261    .43478261    .40540541
>              |  .0601159    .06008606     .0601159    .05746373
>            4 | .26086957    .26086957    .26086957    .24324324
>              | .05324978    .05322334    .05324978    .05021542
>            5 | .15942029    .15942029    .15942029    .14864865
>              | .04439221    .04437017    .04439221    .04163643
>      _prop_6 |                                        .06756757
>              |                                        .02937761
> ------------------------------------------------------------------
>                                                      legend: b/se
> ***** END:
>
> Summary of options (illustrated by the above example using the auto data):
>
> - Options 1 (noopts) and 3 (withif) are equivalent. Stata's -svy- commands
>   drop within-subpop observations containing missing values. In this case,
>   the "subpop" is the entire population, and option 3 merely explicitly
>   excludes the observations that option 1 dropped because of missing values.
>
> - Option 2 (subpop) differs by treating the observations where the tabulated
>   variables contain missing values as out-of-subpop. Thus we are defining the
>   subpop as the collection of individuals in the population for which we are
>   able to collect information on the tabulated variable. While this results
>   in the same point estimates for any survey design, the variance estimates
>   can differ depending on the number of PSUs that are dropped by options 1
>   and 3.
>
> - Option 4 (missing) merely treats the missing values as a separate category,
>   potentially biasing the point estimates and standard errors downward (toward
>   zero). The -missing- option should be used only when the missing values
>   mean something like "not applicable" rather than "we couldn't get a value
>   from the survey participant".
>
> The option to choose is largely dependent on the reason for missing values in
> the data.
>
> --Jeff
> jpitblado@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
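As a cross-check on Jeff's table, the standard errors can be reproduced by hand. The Python sketch below assumes that -svyset _n- makes every observation its own PSU with weight 1, and that -svy- then uses the with-replacement linearized variance n/(n-1) * sum_i (z_i - zbar)^2, where an out-of-subpop observation gets the score z_i = 0 but still counts toward n. The rep78 counts (2, 8, 30, 18 and 11, plus 5 missing) are inferred from the proportions in the table. The function names are hypothetical, and this is an illustration under those assumptions, not StataCorp's implementation.

```python
import math

# Hypothetical helper: proportion and linearized SE for a subpopulation,
# assuming every observation is its own PSU with weight 1 (as with -svyset _n-).
def linearized_se(y, in_subpop):
    n = len(y)                   # PSUs entering the variance formula
    n_sub = sum(in_subpop)       # estimated subpop size (all weights are 1)
    p = sum(yi for yi, d in zip(y, in_subpop) if d) / n_sub
    # Ratio-estimator scores; out-of-subpop observations get a score of 0
    # but are still counted in n.
    z = [d * (yi - p) / n_sub for yi, d in zip(y, in_subpop)]
    zbar = sum(z) / n
    var = n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)
    return p, math.sqrt(var)

# rep78 in the auto data: counts inferred from the table, None = missing.
counts = {1: 2, 2: 8, 3: 30, 4: 18, 5: 11, None: 5}
rep78 = [v for v, c in counts.items() for _ in range(c)]

def compare(category):
    """Return (p, se) under options 1/3, option 2 and option 4."""
    y = [1 if v == category else 0 for v in rep78]
    valid = [0 if v is None else 1 for v in rep78]
    # Options 1 and 3: observations with missing rep78 are dropped entirely.
    kept = [yi for yi, d in zip(y, valid) if d]
    dropped = linearized_se(kept, [1] * len(kept))
    # Option 2: missing observations stay in the data as out-of-subpop PSUs.
    subpop = linearized_se(y, valid)
    # Option 4: missing is just another category, so everyone is in the subpop.
    missing = linearized_se(y, [1] * len(y))
    return dropped, subpop, missing

for cat in range(1, 6):
    (p1, se1), (p2, se2), (p4, se4) = compare(cat)
    print(f"{cat}: p={p1:.8f} se={se1:.8f} | subpop se={se2:.8f} | "
          f"missing p={p4:.8f} se={se4:.8f}")
```

For category 1, compare(1) gives SEs of about .0203446 (options 1 and 3), .0203345 (option 2) and .0189796 (option 4), in line with the first row of the table. Under these assumptions, options 1/3 and option 2 differ only through the factor n/(n-1), with n = 69 PSUs versus n = 74, which is why the gap is so small here.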

