# Re: st: Chi square test unavailable when subpop is used in svy analysis

 From "Ángel Rodríguez Laso" To statalist@hsphsun2.harvard.edu Subject Re: st: Chi square test unavailable when subpop is used in svy analysis Date Tue, 19 Aug 2008 13:22:46 +0200

```Thank you for your advice.

Following up on this, I have a query: if there are missing values in a
variable and SEs and CIs for the valid values are wanted, how should
one proceed? Are individuals with missing values dropped from the
calculations of SEs if subpop is not used? I see four possibilities:

1) svy: tab variable   /* intuitive option */

2) svy, subpop(valid values): tab variable   /* probably most accurate */

3) svy if variable==valid values: tab variable   /* not recommended for svy */

4) svy: tab variable, missing   /* but then you don't get proportions of
valid values after excluding missing values */

In an example with a dichotomous variable with 5.7% missing values, I
get exactly the same SEs and CIs (to three decimal places) and the same
number of observations (n=11500, degrees of freedom=1255) with options
1, 2 and 3, and slightly smaller SEs with option 4 (n=12190, df=1255).

Best regards,

> Ángel Rodríguez Laso <angelrlaso@gmail.com> is using -svy: tabulate- with a
> -subpop()- option that conditions out a row of the two-way table.  This causes
> a zero in the corresponding row margin, preventing the computation of certain
> test statistics.  The question is how to get Stata to produce a Pearson
> statistic in this case.
>
> > I'm working with Stata 9.2.
> >
> > I'm interested in obtaining the corrected chi square test for a
> > distribution of two variables from a survey ('p108_n' and 'sexo'), but
> > limiting the analysis to a selected group of values  of p108_n (1 to
> > 5). I've used the subpop command with the following result:
> >
> >
> >  svy, subpop(if p108_n<10):tab p108_n sexo , count nolabel format(%11.1f)
> > (running tabulate on estimation sample)
> >
> > Number of strata   =        11                  Number of obs      =     12190
> > Number of PSUs     =      1266                  Population size    = 12189,962
> >                                                 Subpop. no. of obs =     11733
> >                                                 Subpop. size       = 11834,102
> >                                                 Design df          =      1255
> >
> > ----------------------------------
> >           |          sexo
> >    p108_n |      1       2   Total
> > ----------+-----------------------
> >         1 |  638,8   708,1  1346,9
> >         2 |  581,3   726,2  1307,5
> >         3 | 1968,3  2144,6  4112,9
> >         4 | 2215,6  1926,3  4141,9
> >         5 |  404,2   520,7   924,9
> >        10 |    0,0     0,0     0,0
> >           |
> >     Total | 5808,2  6025,9  11834,1
> > ----------------------------------
> >   Key:  weighted counts
> >
> >   Table contains a zero in the marginals.
> >   Statistics cannot be computed.
> >
> >
> > Is there any way to get the chi square test I need without deleting
> > the p108_n==10 individuals, I mean, keeping them for the calculation
> > of the standard errors?
>
> Ángel can use the -se- option to see how changing the -subpop()- option into
> an -if- condition will affect the standard error estimates (SEs).   Using the
> above example, Ángel can run the following two commands and compare the
> resulting SEs.
>
>        . svy, subpop(if p108_n<10) : tab p108_n sexo , count se nolabel
>
>        . svy if p108_n<10 : tab p108_n sexo , count se nolabel
>
> We suspect that the reported SE values will be very similar.  In that case, we
> would propose that the Pearson statistic reported by the second command is
> reasonable.
>
> In general, we strongly encourage survey data analysts to use -subpop()-
> instead of the -if- in order to obtain the proper subpop SEs; however, this is
> one case where using -if- should result in essentially similar SEs without
> preventing the computation of a reasonable test statistic.
>
>        Cautionary note:  The smaller the subpop sample size relative to the
>        overall sample size, the more likely the SE values will differ between
>        the two methods.
>
> --Jeff
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

```
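The comparisons discussed in this thread could be run along the following lines. This is only a sketch, not output from the original data: it assumes the dataset has already been -svyset-, it reuses the variable names from the thread (p108_n, sexo), and `myvar` is a hypothetical one-way variable with some missing values. The -if- restriction is written on the -tabulate- command itself rather than on the -svy- prefix, which is an assumption about the preferred placement and not what the quoted reply shows.

```stata
* Sketch only: assumes the data are already -svyset-.
* p108_n and sexo are the variables from the thread; myvar is hypothetical.

* Two-way table, subpopulation approach: cases outside the subpopulation
* still contribute to the variance estimation, but the empty p108_n==10
* row prevents the test statistics from being computed.
svy, subpop(if p108_n < 10): tabulate p108_n sexo, count se nolabel

* Two-way table, restricted sample: the empty row is gone, so the
* Pearson statistic can be reported. Compare these SEs with those above.
svy: tabulate p108_n sexo if p108_n < 10, count se pearson nolabel

* One-way table with missing values in myvar.
* Option 1: observations with missing myvar are left out of the table.
svy: tabulate myvar, se ci obs

* Option 2: valid values declared as a subpopulation.
svy, subpop(if !missing(myvar)): tabulate myvar, se ci obs

* Option 4: missing values kept as a category of their own.
svy: tabulate myvar, missing se ci obs
```

If the SEs from the first two commands agree closely, the Pearson statistic from the restricted-sample run is the one the reply above suggests treating as reasonable; the smaller the subpopulation relative to the full sample, the less that agreement should be taken for granted.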