



st: Comparing multiple means with survey data--revisited


From   Rieza Soelaeman <[email protected]>
To   [email protected]
Subject   st: Comparing multiple means with survey data--revisited
Date   Tue, 29 May 2012 22:37:28 -0500

Dear Stata-Lers,
I need your help clarifying an earlier point about testing the difference
between means in survey data (namely, that you can't or shouldn't do this;
I have copied that thread at the end of this e-mail).  I am trying to
replicate the work of a colleague who recently left.  She created a table
in which the rows represent levels of one variable, the columns represent
levels of another variable, and each cell contains the mean of a third
variable for that row/column combination along with the number of people
in that group.

Example:

In cells: Mean of Variable A (n)

------------------------------------------------------------------------------
                                  Variable B (years)
------------------------------------------------------------------------------
Variable C
(months)       5-10         11-15        16-20        Total          p-value
------------------------------------------------------------------------------
0-9            -1.28 (21)   -0.57 (60)   -0.36 (75)   -0.57 (156)    0.032
10-18          -1.44 (30)   -0.92 (47)   -1.00 (54)   -1.07 (132)    0.15
19-27          -1.95 (64)   -1.68 (77)   -1.63 (126)  -1.72 (268)    0.314
28-36          -1.92 (51)   -1.83 (52)   -1.72 (104)  -1.80 (206)    0.652
37-45          -1.96 (36)   -2.01 (61)   -1.65 (54)   -1.87 (151)    0.107
------------------------------------------------------------------------------

After declaring the design with -svyset-, I was able to reproduce the means
and ns in each cell, but not the significance levels for the differences
between the means (she used SPSS to get the p-values).  I suspect this is
because I specified the cluster, stratum, and pweights in my -svyset-
command, whereas the SPSS procedure she used allowed only for the
specification of weights (specifying a complex sampling design in SPSS
requires an add-on that costs about $600).

For those familiar with SPSS, she used the following syntax after applying
the weights and subsetting the data to a specific level of VARIABLE_C:

MEANS TABLES= VARIABLE_A BY VARIABLE_B
/CELLS MEAN COUNT STDDEV
/STATISTICS ANOVA.
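As a rough illustration of what a weights-only procedure like this computes, here is a minimal Python sketch. It assumes (an assumption, not confirmed in this thread) that SPSS's WEIGHT BY treats the weights as frequency (replication) weights, so the one-way ANOVA F test uses the sum of the weights as the sample size and ignores clusters and strata entirely. The function name `weighted_oneway_anova` and the toy data are hypothetical.

```python
from collections import defaultdict


def weighted_oneway_anova(y, group, w):
    """One-way ANOVA F statistic treating w as frequency weights.

    Assumption: mimics a weights-only test (e.g. SPSS MEANS ... /STATISTICS
    ANOVA after WEIGHT BY). Clusters and strata are ignored entirely.
    """
    sw = defaultdict(float)   # sum of weights per group
    swy = defaultdict(float)  # weighted sum of y per group
    for yi, gi, wi in zip(y, group, w):
        sw[gi] += wi
        swy[gi] += wi * yi
    means = {g: swy[g] / sw[g] for g in sw}
    total_w = sum(sw.values())
    grand = sum(swy.values()) / total_w

    # Between-group and within-group weighted sums of squares
    ssb = sum(sw[g] * (means[g] - grand) ** 2 for g in sw)
    ssw = sum(wi * (yi - means[gi]) ** 2 for yi, gi, wi in zip(y, group, w))

    k = len(sw)
    df_between = k - 1
    df_within = total_w - k  # the sum of weights plays the role of n
    f_stat = (ssb / df_between) / (ssw / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data: two groups of three observations, unit weights
print(weighted_oneway_anova([1, 2, 3, 4, 5, 6], ["a"] * 3 + ["b"] * 3, [1] * 6))
# → (13.5, 1, 4.0)
```

If SciPy is available, `scipy.stats.f.sf(f_stat, df_between, df_within)` turns this into a p-value. Because the variance here ignores clustering and the denominator degrees of freedom come from the (weighted) sample size rather than from the number of PSUs, this is one plausible reason the two sets of p-values differ.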

I believe the following Stata code is the equivalent for getting the means
and p-values, but as Steve pointed out in the 2009 thread copied at the end
of this message, this is not theoretically correct:

. svy: mean VARIABLE_A if (VARIABLE_C==4), over(VARIABLE_B)

. test [VARIABLE_A]_subpop_1 = [VARIABLE_A]_subpop_2 = [VARIABLE_A]_subpop_3
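For comparison, -test- after -svy- reports by default an adjusted Wald F statistic built from the design-based covariance matrix of the estimated means. The Python sketch below assumes the adjustment documented for Stata's svy postestimation, F = (d - q + 1) W / (q d) with (q, d - q + 1) degrees of freedom, where q is the number of constraints and d is the design degrees of freedom (number of PSUs minus number of strata). The inputs are hypothetical stand-ins for what e(b), e(V), and e(df_r) would hold; the function names are illustrative only.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def adjusted_wald_equal_means(means, cov, design_df):
    """Test H0: all k means equal, using contrasts mean_1 - mean_j."""
    k = len(means)
    q = k - 1  # number of linear constraints
    # Contrast matrix R (q x k): row j-1 encodes mean_1 - mean_j
    r = [[1.0 if c == 0 else (-1.0 if c == j else 0.0) for c in range(k)]
         for j in range(1, k)]
    rb = [sum(r[i][c] * means[c] for c in range(k)) for i in range(q)]
    rvr = [[sum(r[i][a] * cov[a][b] * r[j][b]
                for a in range(k) for b in range(k))
            for j in range(q)] for i in range(q)]
    # Unadjusted Wald statistic W = (Rb)' (R V R')^{-1} (Rb)
    x = solve(rvr, rb)
    wald = sum(rb[i] * x[i] for i in range(q))
    # Adjusted F with (q, d - q + 1) degrees of freedom
    f_stat = (design_df - q + 1) * wald / (q * design_df)
    return wald, f_stat, q, design_df - q + 1


# Hypothetical inputs: two subpopulation means, design df of 30
print(adjusted_wald_equal_means([1.0, 0.0], [[0.25, 0.0], [0.0, 0.25]], 30))
# → (2.0, 2.0, 1, 30)
```

The Wald statistic does not depend on which set of q independent contrasts is used, so this matches testing mean_1 = mean_2 = mean_3 directly. Note that the denominator degrees of freedom are driven by the number of PSUs and strata, not by the number of respondents.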

My question is whether I should be comparing the means with -svy- and
-test- at all (that is, whether what I am trying to do is legitimate), or
whether I should omit this comparison from my tables.

Thanks,
Rieza

-----------------------------------------------------------------------------------------------------

Re: st: comparing multiple means with survey data
