Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Afia Tasneem <afiata@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: should estat sd reports same sd before and after clustering? |
Date | Sun, 28 Jul 2013 18:03:48 -0400 |
Hi Steve,

I am confused. To be clear, SDs are not supposed to change with
clustering, correct? SEs, on the other hand, are supposed to change with
clustering.

I need a table that reports the mean and SD of classes for males and
females, the difference between the two, and the SE and p-value of that
difference, with the cluster design of the experiment taken into account
for all numbers. Which is the correct method to use (Option 1 or 2 below)?

Option 1: numbers from the following code:

    svyset branch
    svy: mean `var', over(intervention)
    estat sd
    lincom [`var']intervention - [`var']control

Option 2:

    clttest `var', cluster(branch) by(intervention)

Many thanks,
Afia

On Sun, Jul 28, 2013 at 5:05 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>
> Afia:
>
> ------------------------------------------------------------------------
> Intra-cluster correlation = 0.0465
> ------------------------------------------------------------------------
>                     N   Clusts     Mean       SE        95 % CI
> intervention=0    380     11     7.4342   0.2763   [ 6.8186, 8.0498]
> intervention=1    345     14     6.9507   0.2768   [ 6.3527, 7.5488]
> ------------------------------------------------------------------------
>
> r(sd_1) and r(sd_2) estimate the SDs that would give the same SEs if
> there were no clustering:
>
> sd1 = n1^.5 x se1
> sd2 = n2^.5 x se2
>
> sd1 = (380)^.5 x .2763
> sd2 = (345)^.5 x .2768
>
> r(sd_2) = 5.141886711364611
> r(sd_1) = 5.385836699859183
>
> Steve
>
> On Jul 28, 2013, at 3:45 PM, Afia Tasneem wrote:
>
> Dear Steve,
>
> Thank you for your reply, and apologies for not posting the code; I
> am new to Statalist.
>
> I would be grateful if you could also answer a few follow-up questions:
>
> As you can see from the code below, the standard deviations with and
> without clustering using svyset are almost the same (is there any reason
> for the very slight difference?): 3.168354 and 2.756693 with clustering,
> and 3.170342 and 2.758793 without clustering, for the control and
> intervention groups respectively.
> However, the command clttest gives me different SDs before and after
> clustering: with clttest, my SDs are 5.385 and 5.141 for the control and
> intervention groups respectively, whereas with ordinary ttest the SDs
> are 3.170342 and 2.758793. Why do I get different SDs with svyset plus
> estat sd and with clttest?
>
> Below is the code:
>
> . svyset branch
>
>       pweight: <none>
>           VCE: linearized
>   Single unit: missing
>      Strata 1: <one>
>          SU 1: branch
>         FPC 1: <zero>
>
> . svy: mean class, over(intervention)
> (running mean on estimation sample)
>
> Survey: Mean estimation
>
> Number of strata =       1        Number of obs     =        725
> Number of PSUs   =      25        Population size   =        725
>                                   Design df         =         24
>
>      control: intervention = control
> intervention: intervention = intervention
>
> --------------------------------------------------------------
>              |             Linearized
>         Over |       Mean   Std. Err.     [95% Conf. Interval]
> -------------+------------------------------------------------
> class        |
>      control |   7.434211   .3031807      6.808476    8.059945
> intervention |   6.950725   .2003743      6.537172    7.364277
> --------------------------------------------------------------
>
> . estat sd
>
>      control: intervention = control
> intervention: intervention = intervention
>
> -------------------------------------
>         Over |       Mean   Std. Dev.
> -------------+-----------------------
> class        |
>      control |   7.434211    3.168354
> intervention |   6.950725    2.756693
> -------------------------------------
>
> . bysort intervention: sum class
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> -> intervention = control
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>        class |       380    7.434211    3.170342          0         12
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> -> intervention = intervention
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>        class |       345    6.950725    2.758793          0         12
>
> However, when I use the command clttest, my standard deviations do
> change with clustering: with clttest, my SDs are 5.385 and 5.141 for the
> control and intervention groups respectively, whereas with ordinary
> ttest the SDs are 3.170342 and 2.758793.
>
> . clttest class, cluster(branch) by(intervention)
>
> t-test adjusted for clustering
> class by intervention, clustered by branch
> ------------------------------------------------------------------------
> Intra-cluster correlation = 0.0465
> ------------------------------------------------------------------------
>                     N   Clusts     Mean       SE        95 % CI
> intervention=0    380     11     7.4342   0.2763   [ 6.8186, 8.0498]
> intervention=1    345     14     6.9507   0.2768   [ 6.3527, 7.5488]
> ------------------------------------------------------------------------
> Combined          725     14     7.2041   0.1957   [ 6.7992, 7.6091]
> ------------------------------------------------------------------------
> Diff(0-1)         725     25     0.4835   0.3911   [ -0.3256, 1.2926]
>
> Degrees freedom: 23
>
> Ho: mean(-) = mean(diff) = 0
>
>    Ha: mean(diff) < 0       Ha: mean(diff) ~= 0       Ha: mean(diff) > 0
>       t =  1.2362              t =  1.2362              t =  1.2362
>   P < t =  0.8856          P > |t| =  0.2289          P > t =  0.1144
>
> . return list
>
> scalars:
>                r(N_2) =  345
>                r(N_1) =  380
>               r(df_t) =  23
>                  r(t) =  1.2362
>               r(sd_2) =  5.141886711364611
>               r(sd_1) =  5.385836699859183
>                 r(se) =  .3911133002996737
>             r(m_diff) =  .4834856986999512
>               r(se_2) =  .2768298747832084
>               r(se_1) =  .2762875930960634
>               r(mu_2) =  6.950724601745606
>               r(mu_1) =  7.434210300445557
>                r(p_l) =  .8855657157257124
>                r(p_u) =  .1144342842742876
>                  r(p) =  .2288685685485752
>
> . ttest class, by(intervention)
>
> Two-sample t test with equal variances
> ------------------------------------------------------------------------------
>    Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
> ---------+--------------------------------------------------------------------
>  control |     380    7.434211    .1626351    3.170342     7.11443    7.753991
> interven |     345    6.950725    .1485284    2.758793    6.658586    7.242863
> ---------+--------------------------------------------------------------------
> combined |     725    7.204138    .1110214    2.989343    6.986176      7.4221
> ---------+--------------------------------------------------------------------
>     diff |            .4834859    .2217278                .0481787    .9187931
> ------------------------------------------------------------------------------
>     diff = mean(control) - mean(interven)                         t =   2.1805
> Ho: diff = 0                                     degrees of freedom =      723
>
>     Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
>  Pr(T < t) = 0.9852         Pr(|T| > |t|) = 0.0295          Pr(T > t) = 0.0148
>
> Very grateful for your help.
>
> Best regards,
> Afia
>
> On Fri, Jul 26, 2013 at 5:25 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>
>> The Statalist FAQ requests that you show both your code and results. As
>> you didn't, we have little idea of what you saw. I guess that your
>> -svyset- didn't specify a probability weight.
>>
>> In that case, observations are equally weighted, and the estimated
>> population standard deviation *and* mean must be identical to the sample
>> versions, as given by -summarize-. Clustering, as you noticed, affects
>> only standard errors.
>> The following shows that the SD and mean are affected only by weighting
>> and not by clustering.
>>
>> . sysuse auto, clear
>> . gen mkr = substr(make,1,2)
>>
>> . svyset mkr
>> . svy: mean turn
>> . estat sd
>> . sum turn
>>
>> . svyset mkr [pw = price]
>> . svy: mean turn
>> . estat sd
>> . sum turn [aw = price]
>>
>> Steve
>>
>> On Jul 26, 2013, at 12:25 PM, Afia Tasneem wrote:
>>
>> Dear all,
>>
>> I am working on the analysis of a clustered randomized trial.
>>
>> My standard errors change when I svyset the data to account for
>> clustering. However, the standard deviations after clustering with
>> svyset and estat sd are the same as before clustering (and also the same
>> as simply using sum var). Should the SD remain unaffected when the SE
>> changes because of clustering? Or is "estat sd" not the right command to
>> use to find standard deviations after clustering?
>>
>> Thanks much,
>> Afia
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
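Steve's description of r(sd_1) and r(sd_2) can be checked by plugging the clttest output from the thread into his formula. In this sketch, n_g is the group size, SE^clust_g is the cluster-adjusted SE that clttest reports, SE^srs_g is the unclustered SE from ttest, and s_g is the ordinary SD from summarize or ttest; these symbols are introduced here only for illustration, and the figures are rounded from the logs above.

```latex
\begin{aligned}
\texttt{r(sd\_g)} &= \sqrt{n_g}\,\mathrm{SE}^{\mathrm{clust}}_g \\
\texttt{r(sd\_1)} &= \sqrt{380}\times 0.2762876 \approx 19.4936 \times 0.2762876 \approx 5.3858 \\
\texttt{r(sd\_2)} &= \sqrt{345}\times 0.2768299 \approx 18.5742 \times 0.2768299 \approx 5.1419 \\[4pt]
\mathrm{SE}^{\mathrm{srs}}_g &= \frac{s_g}{\sqrt{n_g}}
  \quad\Longrightarrow\quad
  \frac{\texttt{r(sd\_g)}}{s_g} = \frac{\mathrm{SE}^{\mathrm{clust}}_g}{\mathrm{SE}^{\mathrm{srs}}_g} \\[4pt]
\text{control: } \frac{5.3858}{3.1703} &\approx \frac{0.2763}{0.1626} \approx 1.70 \\
\text{intervention: } \frac{5.1419}{2.7588} &\approx \frac{0.2768}{0.1485} \approx 1.86
\end{aligned}
```

On this reading, the clttest "SDs" are the ordinary SDs inflated by the ratio of the cluster-adjusted SE to the unclustered SE (the square root of the design effect, sometimes called DEFT). They are therefore not estimates of the population SD, which is what estat sd and summarize report.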