Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Comparing multiple means with survey data--revisited
Date: Wed, 30 May 2012 12:25:32 -0400

Correction: "foreach x of local levels {" not "foreach x of local(levels) {"

On May 30, 2012, at 12:05 PM, Steve Samuels wrote:

Rieza:

The means are negative and so don't appear to be "ordinary" descriptive statistics. Only you can say whether the purpose of the table is to describe a population (so that tests are not appropriate) or whether some causal hypothesis is in play (e.g., "that such-and-such an intervention will show stronger effects for higher levels of variable B and for variable A").

The patterns are very clear:

1. Means increase in magnitude (become more negative) with row number.
2. In each row, first-column means are larger in magnitude (more negative) than third-column means.

Confidence intervals for differences are okay for descriptive tables, but even if there is a hypothesis floating around, such intervals would just confuse things here. There would be a minimum of 15 if you did separate tests in each row (5 rows x 3 pairwise comparisons per row), and 105 if all pairwise comparisons in the table are considered (15 cells, so 15 x 14 / 2 pairs).

Note that your -test- statement tested equality of three means, not of two.

I do suggest that you add standard errors to the table.

Some alternative code:

*********************
// convert variable names to lower case for easier typing
rename VARIABLE_*, lower

svy: mean variable_a, over(variable_b variable_c)
*********************

For easier copying, you can get the columns of the table with the following code.

*************************************************************
levelsof variable_b, local(levels)
foreach x of local levels {
    di "variable_b = `x'"
    svy, subpop(if variable_b==`x'): mean variable_a, over(variable_c)
}
*************************************************************

For correct standard errors, use the -subpop- option to subset data, not the -if- qualifier.
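If a comparison within a single row were wanted despite the cautions above, one possible approach is a design-based regression on indicators for variable_b, restricted with -subpop-. This is only a sketch: it assumes the data are already -svyset-, that variable_c takes the value 4 for the row of interest (as in the original -svy: mean- call), and that variable_b is coded 1, 2, 3, which the thread does not confirm.

```stata
* Sketch only. Assumptions: data already -svyset-; variable_b coded 1, 2, 3
* (hypothetical coding); variable_c == 4 identifies one row of the table.

* Restrict estimation to one row with subpop(), not the -if- qualifier
svy, subpop(if variable_c == 4): regress variable_a i.variable_b

* Joint test that the three variable_b means are equal (two constraints)
testparm i.variable_b

* A single two-mean comparison: level 3 mean minus level 2 mean
lincom 3.variable_b - 2.variable_b
```

The -testparm- line corresponds to the three-mean test in the original -test- statement; the -lincom- line is a genuine two-mean comparison and also reports a confidence interval for the difference.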
Steve
sjsamuels@gmail.com

On May 29, 2012, at 11:37 PM, Rieza Soelaeman wrote:

Dear Stata-Lers,

I need your help in clarifying an earlier point made about testing the difference between means in survey data (that is, you can't/shouldn't do this; I have copied the thread at the end of this e-mail). I am trying to replicate the work of a colleague who left recently. She created a table where the rows represent levels of one variable, columns represent the levels of another variable, and the cells contain the mean value of a third variable for that row/column combination and the number of people in that group.

Example:

In cells: Mean of Variable A (n)

---------------------------------------------------------------------------
                          Variable B (years)
Variable C  ---------------------------------------------------------------
(months)    5-10         11-15        16-20         Total         p-value
---------------------------------------------------------------------------
0-9         -1.28 (21)   -0.57 (60)   -0.36 (75)    -0.57 (156)   0.032
10-18       -1.44 (30)   -0.92 (47)   -1.00 (54)    -1.07 (132)   0.15
19-27       -1.95 (64)   -1.68 (77)   -1.63 (126)   -1.72 (268)   0.314
28-36       -1.92 (51)   -1.83 (52)   -1.72 (104)   -1.80 (206)   0.652
37-45       -1.96 (36)   -2.01 (61)   -1.65 (54)    -1.87 (151)   0.107
---------------------------------------------------------------------------

Using -svyset-, I was able to get the same means and ns in each cell, but was not able to get the same significance level for the difference between the means; she used SPSS to get the p-values. I suspect this is because I specified the cluster, stratum, and pweights in my -svyset- command, whereas the software she used allowed only for the specification of weights (specifying a complex sampling design in SPSS requires an extension that costs about $600).
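The suspicion above can be checked directly in Stata by declaring the design two ways and comparing the output. The sketch below is illustrative only: psu_id, stratum_id, and wt are hypothetical variable names, and the weights-only declaration is not an exact emulation of what SPSS does with weighted data.

```stata
* Sketch only. psu_id, stratum_id and wt are hypothetical variable names.

* (1) Weights only: ignores clustering and stratification
svyset [pweight = wt]
svy: mean variable_a, over(variable_b)

* (2) Full design: clusters, strata, and sampling weights
svyset psu_id [pweight = wt], strata(stratum_id)
svy: mean variable_a, over(variable_b)
```

The point estimates should agree between the two runs, since both use the same weights; the standard errors, and therefore any p-values built on them, will generally differ.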
For those who are familiar with SPSS, she used the following syntax after applying weights and subsetting for a specific level of VARIABLE_C:

    MEANS TABLES= VARIABLE_A BY VARIABLE_B
      /CELLS MEAN COUNT STDDEV
      /STATISTICS ANOVA.

I believe the equivalent in Stata to get the means and p-values is to use the following code, but as Steve pointed out in the conversation copied below from 2009, this is not theoretically correct:

    . svy: mean VARIABLE_A if (VARIABLE_C==4), over(VARIABLE_B)
    . test [VARIABLE_A]_subpop_1 = [VARIABLE_A]_subpop_2 = [VARIABLE_A]_subpop_3

My question is whether I should be attempting to compare the means using the -svyset-/-test- commands at all (is what I am trying to do legitimate?), or if I should omit this comparison from my tables.

Thanks,
Rieza

-----------------------------------------------------------------------------------------------------
Re: st: comparing multiple means with survey data

