# Re: st: Chi square test unavailable when subpop is used in svy analyisis

 From [email protected] (Jeff Pitblado, StataCorp LP) To [email protected] Subject Re: st: Chi square test unavailable when subpop is used in svy analyisis Date Mon, 18 Aug 2008 12:25:51 -0500

```
Ángel Rodríguez Laso <[email protected]> is using -svy: tabulate- with a
-subpop()- option that conditions out a row of the two-way table.  This causes
a zero in the corresponding row margin, preventing the computation of certain
test statistics.  The question is how to get Stata to produce a Pearson
statistic in this case.

> I'm working with Stata 9.2.
>
> I'm interested in obtaining the corrected chi square test for a
> distribution of two variables from a survey ('p108_n' and 'sexo'), but
> limiting the analysis to a selected group of values  of p108_n (1 to
> 5). I've used the subpop command with the following result:
>
>
>  svy, subpop(if p108_n<10):tab p108_n sexo , count nolabel format(%11.1f)
> (running tabulate on estimation sample)
>
> Number of strata   =        11                  Number of obs      =     12190
> Number of PSUs     =      1266                  Population size    = 12189,962
>                                                 Subpop. no. of obs =     11733
>                                                 Subpop. size       = 11834,102
>                                                 Design df          =      1255
>
> ----------------------------------
>           |          sexo
>    p108_n |      1       2   Total
> ----------+-----------------------
>         1 |  638,8   708,1  1346,9
>         2 |  581,3   726,2  1307,5
>         3 | 1968,3  2144,6  4112,9
>         4 | 2215,6  1926,3  4141,9
>         5 |  404,2   520,7   924,9
>        10 |    0,0     0,0     0,0
>           |
>     Total | 5808,2  6025,9  11834,1
> ----------------------------------
>   Key:  weighted counts
>
>   Table contains a zero in the marginals.
>   Statistics cannot be computed.
>
>
> Is there any way to get the chi square test I need without deleting
> the p108_n==10 individuals, I mean, keeping them for the calculation
> of the standard errors?

Ángel can use the -se- option to see how changing the -subpop()- option into
an -if- condition will affect the standard error estimates (SEs).  Using the
above example, Ángel can run the following two commands and compare the
resulting SEs.

. svy, subpop(if p108_n<10) : tab p108_n sexo , count se nolabel

. svy if p108_n<10 : tab p108_n sexo , count se nolabel

We suspect that the reported SE values will be very similar.  In that case, we
would propose that the Pearson statistic reported by the second command is
reasonable.

In general, we strongly encourage survey data analysts to use -subpop()-
instead of -if- in order to obtain the proper subpopulation SEs; however, this
is one case where using -if- should result in very similar SEs without
preventing the computation of a reasonable test statistic.

Cautionary note:  The smaller the subpop sample size relative to the
overall sample size, the more likely the SE values will differ between
the two methods.

--Jeff
[email protected]
```
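
The comparison described in the reply can be collected into a short do-file. This is only a sketch: it assumes the survey design has already been declared with -svyset-, it reuses the variable names from the original post (p108_n, sexo), and it adds the -se- option so that each table reports standard errors.

```stata
* Sketch only: assumes -svyset- has already been run for this dataset
* and that p108_n and sexo are the variables from the original post.

* 1) Subpopulation approach: keeps the p108_n==10 observations in the
*    variance calculation. The empty row gives a zero margin, so no test
*    statistic is reported, but the SEs serve as the reference.
svy, subpop(if p108_n<10) : tabulate p108_n sexo, count se nolabel format(%11.1f)

* 2) -if- approach: restricts the estimation sample to p108_n<10.
*    The empty row disappears, so the Pearson statistic can be computed.
svy if p108_n<10 : tabulate p108_n sexo, count se nolabel format(%11.1f)
```

If the standard errors from the two tables are close, the Pearson statistic from the second command can be taken as reasonable, as the reply suggests. If they differ noticeably, which becomes more likely as the subpopulation shrinks relative to the full sample, the -if- result should be treated with more caution.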