
# Re: st: Comparing multiple means with survey data--revisited

From: Steve Samuels
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Comparing multiple means with survey data--revisited
Date: Wed, 30 May 2012 12:05:34 -0400

```Rieza:

The means are negative and so don't appear to be "ordinary" descriptive
statistics. Only you can say whether the purpose of the table is to describe a
population (so that tests are not appropriate) or whether some causal
hypothesis is in play (e.g., "that such-and-such an intervention will show
stronger effects for higher levels of variable B and for variable A").

The patterns are very clear:
1. Means grow in magnitude (become more negative) with row number.
2. In each row, first column means are larger in magnitude than third column means.

Confidence intervals for differences are okay for descriptive tables, but even if
there is a hypothesis floating around, such intervals would just confuse things here.
There would be a minimum of 15 if you did separate tests in each row (3 pairs
in each of 5 rows), and 105 if all pairwise comparisons in the table are
considered (all pairs among the 15 cells). Note that your -test-
statement tested equality of three means, not of two.

I do suggest that you add standard errors to the table.

Some alternative code:

*********************
// convert variable names to lower case for easier typing
rename VARIABLE_*, lower

svy: mean variable_a, over(variable_b variable_c)
*********************

For easier copying, you can get the columns of the table with the
following code.

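*************************************************************
// store the distinct values of variable_b in the local macro `levels'
levelsof variable_b, local(levels)
foreach x of local levels {
    di "variable_b = `x'"
    // subpop() keeps the full design for variance estimation
    svy, subpop(if variable_b==`x'): mean variable_a, over(variable_c)
}
*************************************************************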

For correct standard errors, use the -subpop- option to subset data, not the -if- qualifier.

Steve
sjsamuels@gmail.com

On May 29, 2012, at 11:37 PM, Rieza Soelaeman wrote:

Dear Stata-Lers,
difference between means in survey data (that is, you can't/shouldn't do
this, I have copied the thread at the end of this e-mail).  I am trying to
replicate the work of a colleague who left recently.  She created a table
where the rows represent levels of one variable, columns represent the
levels of another variable, and the cells contain the mean value of a third
variable for that row/column combination and the number of people in that
group.

Example:

In cells: Mean of Variable A (n)

-----------------------------------------------------------------------------
                             Variable B (years)
-----------------------------------------------------------------------------
Variable C
(months)     5-10          11-15         16-20         Total          p-value
-----------------------------------------------------------------------------
0-9          -1.28 (21)    -0.57 (60)    -0.36 (75)    -0.57 (156)    0.032
10-18        -1.44 (30)    -0.92 (47)    -1.00 (54)    -1.07 (132)    0.15
19-27        -1.95 (64)    -1.68 (77)    -1.63 (126)   -1.72 (268)    0.314
28-36        -1.92 (51)    -1.83 (52)    -1.72 (104)   -1.80 (206)    0.652
37-45        -1.96 (36)    -2.01 (61)    -1.65 (54)    -1.87 (151)    0.107
-----------------------------------------------------------------------------

Using -svyset-, I was able to get the same means and ns in each cell, but was
not able to get the same significance level for the difference between the
means--she used SPSS to get the p-values.  I suspect this is because I
specified the cluster, stratum, and pweights in my -svyset- command, whereas
the software she used allowed only for the specification of weights (to
specify a complex sampling design in SPSS requires an extension that costs

For those who are familiar with SPSS, she used the following syntax after
applying weights, and subsetting for a specific level of VARIABLE_C:

MEANS TABLES= VARIABLE_A BY VARIABLE_B
/CELLS MEAN COUNT STDDEV
/STATISTICS ANOVA.

I believe the equivalent in Stata to get the means and p-values is to use
the following code, but as Steve pointed out in the conversation copied
below from 2009, this is not theoretically correct:

. svy: mean VARIABLE_A if (VARIABLE_C==4), over(VARIABLE_B)

. test [VARIABLE_A]_subpop_1 = [VARIABLE_A]_subpop_2 = [VARIABLE_A]_subpop_3

My question is whether I should be attempting to compare the means using the
-svyset-/-test- commands at all (is what I am trying to do
legitimate), or if I should omit this comparison from my tables?

Thanks,
Rieza

-----------------------------------------------------------------------------------------------------

Re: st: comparing multiple means with survey data

```
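
Two points in Steve's reply can be sketched in Stata: where the counts of 15 and 105 comparisons come from, and what a two-mean test with -subpop- might look like in place of the three-mean -test- in the question. This is only a sketch under assumptions. It assumes the data are already -svyset- and the variable names have been lowercased as in the reply. The level value 4 and the coefficient names `[variable_a]_subpop_1` and `[variable_a]_subpop_2` are copied from Rieza's message, and the names Stata assigns to -over()- groups may differ with value labels or with the Stata version. It is not a recommendation to run the tests, which the reply argues against for this table.

```stata
// Sketch only; assumes the data are already svyset and names are lowercase.

// Counting the comparisons mentioned in the reply
display 5*comb(3, 2)     // pairwise comparisons within each of 5 rows of 3 cells
display comb(15, 2)      // all pairwise comparisons among the 15 cells

// Restrict estimation to one level of variable_c with subpop(), not -if-,
// so that the full design is used for the standard errors.
// The value 4 is taken from the original question.
svy, subpop(if variable_c == 4): mean variable_a, over(variable_b)

// Test equality of two means (not three). The coefficient names below are
// assumed from the original -test- statement; check the names shown in the
// -svy: mean- output before using them.
test [variable_a]_subpop_1 = [variable_a]_subpop_2
```

The two -display- lines reproduce the 15 and 105 quoted in the reply.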