Re: st: should estat sd reports same sd before and after clustering?


From   Steve Samuels <[email protected]>
To   [email protected]
Subject   Re: st: should estat sd reports same sd before and after clustering?
Date   Sun, 28 Jul 2013 19:52:01 -0400

"To be clear, sd's are not supposed to change with clustering, correct?"
It depends on which ones. The sample & estimated population SDs do not
change. The SD* returned by -clttest- is not the sample SD. It satisfies
the equation:

SE = SD*/sqrt(n)

where the SE is from the *clustered* analysis. Some people find SD* useful
for study planning or for characterizing the effect of clustering.
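
For example, the relation can be checked directly from the results -clttest-
saves (a minimal sketch, using the variable names and r() results that appear
later in this thread):

. clttest class, cluster(branch) by(intervention)
. display r(se_1)*sqrt(r(N_1))    // should match r(sd_1)
. display r(se_2)*sqrt(r(N_2))    // should match r(sd_2)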

You've added a question about which command should be used to
compare means. If you don't have survey data, why -svyset-?
There are non-survey options, including (a sketch follows this list):

-mean-, with cluster() option, followed by -lincom-
-reg- with cluster() option
-clttest-
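
A minimal sketch of those alternatives, assuming an outcome `var', a 0/1
group variable intervention, and a cluster identifier branch (the exact
-lincom- equation labels depend on your Stata version):

. mean `var', over(intervention) vce(cluster branch)
. lincom [`var']intervention - [`var']control

. regress `var' i.intervention, vce(cluster branch)

. * -clttest- is user-written; -findit clttest- will locate it
. clttest `var', cluster(branch) by(intervention)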


You don't appear to have compared any of these. Do so, and you'll
be able to answer your question yourself.

By the way, you are asked to give the source of contributed
commands like -clttest-.

Steve

On Jul 28, 2013, at 6:03 PM, Afia Tasneem wrote:

Hi Steve,

I am confused. To be clear, sd's are not supposed to change with
clustering, correct? se's are supposed to change with clustering.

In a table reporting the mean and SD of classes for males and females, the
difference between the two, and the SE and p-value of the difference, where
the cluster design of the experiment is taken into account for all numbers,
which is the correct method to use (Option 1 or 2 below)?

Option 1:
Numbers using the following code:
svyset branch
svy: mean `var', over(intervention)
estat sd
lincom [`var']intervention - [`var']control

or
Option 2:
clttest `var', cluster(branch) by(intervention)

Many thanks,
Afia


On Sun, Jul 28, 2013 at 5:05 PM, Steve Samuels <[email protected]> wrote:
> 
> Afia:
> 
> ------------------------------------------------------------------------
> Intra-cluster correlation         =           0.0465
> ------------------------------------------------------------------------
>             N    Clusts    Mean           SE             95 % CI
> intervention=0   380   11   7.4342      0.2763       [  6.8186,  8.0498]
> intervention=1   345   14   6.9507      0.2768       [  6.3527,  7.5488]
> ------------------------------------------------------------------------
> 
> 
> r(sd_1) and r(sd_2) estimate the SDs that would give the same SEs if there
> were no clustering:
> 
> sd1 = n1^.5 x se1
> sd2 = n2^.5 x se2
> 
> sd1 = (380)^.5 x .2763
> sd2 = (345)^.5 x .2768
> 
> r(sd_2) =  5.141886711364611
> r(sd_1) =  5.385836699859183
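> 
> A quick arithmetic check in Stata (just reproducing the two numbers above):
> 
> . display (380)^.5 * .2763    // approximately r(sd_1)
> . display (345)^.5 * .2768    // approximately r(sd_2)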
> 
> Steve
> 
> 
> On Jul 28, 2013, at 3:45 PM, Afia Tasneem wrote:
> 
> Dear Steve,
> 
> Thank you for your reply.  And apologies for not posting the code; I
> am new to statalist.
> 
> I would be grateful if you could also answer a few follow up questions:
> 
> As you can see from the code below, the standard deviations with and
> without clustering using -svyset- are almost the same (any reason for
> the very slight difference?): 3.168354 and 2.756693 with clustering,
> and 3.170342 and 2.758793 without clustering, for the control and
> intervention groups respectively. However, -clttest- gives me different
> SDs before and after clustering: with -clttest-, my SDs are 5.385 and
> 5.141 for the control and intervention groups respectively, whereas in
> a normal -ttest- the SDs are 3.170342 and 2.758793. Why do I get
> different SDs with -svyset- plus -estat sd- than with -clttest-?
> 
> below is the code:
> 
> . svyset branch
> 
>     pweight: <none>
>         VCE: linearized
> Single unit: missing
>    Strata 1: <one>
>        SU 1: branch
>       FPC 1: <zero>
> 
> . svy: mean class, over(intervention)
> (running mean on estimation sample)
> 
> Survey: Mean estimation
> 
> Number of strata =       1          Number of obs    =     725
> Number of PSUs   =      25          Population size  =     725
>                                   Design df        =      24
> 
>     control: intervention = control
> intervention: intervention = intervention
> 
> --------------------------------------------------------------
>            |             Linearized
>       Over |       Mean   Std. Err.     [95% Conf. Interval]
> -------------+------------------------------------------------
> class        |
>    control |   7.434211   .3031807      6.808476    8.059945
> intervention |   6.950725   .2003743      6.537172    7.364277
> --------------------------------------------------------------
> 
> . estat sd
> 
>     control: intervention = control
> intervention: intervention = intervention
> 
> -------------------------------------
>       Over |       Mean   Std. Dev.
> -------------+-----------------------
> class        |
>    control |   7.434211    3.168354
> intervention |   6.950725    2.756693
> -------------------------------------
> 
> . bysort intervention: sum class
> 
> -------------------------------------------------------------------------------------------------------------------------------------------
> -> intervention = control
> 
>   Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>      class |       380    7.434211    3.170342          0         12
> 
> -------------------------------------------------------------------------------------------------------------------------------------------
> -> intervention = intervention
> 
>   Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>      class |       345    6.950725    2.758793          0         12
> 
> However, when I use the command -clttest-, my standard deviations do
> change with clustering:
> 
> with -clttest-, my SDs are 5.385 and 5.141 for the control and
> intervention groups respectively, whereas in a normal -ttest- the SDs
> are 3.170342 and 2.758793.
> 
> . clttest class, cluster(branch) by(intervention)
> 
> t-test adjusted for clustering
> class by intervention, clustered by branch
> ------------------------------------------------------------------------
> Intra-cluster correlation         =           0.0465
> ------------------------------------------------------------------------
>             N    Clusts    Mean           SE             95 % CI
> intervention=0   380   11   7.4342      0.2763       [  6.8186,  8.0498]
> intervention=1   345   14   6.9507      0.2768       [  6.3527,  7.5488]
> ------------------------------------------------------------------------
> Combined    725     14      7.2041      0.1957       [  6.7992,  7.6091]
> ------------------------------------------------------------------------
> Diff(0-1)   725     25      0.4835      0.3911       [ -0.3256,  1.2926]
> 
> Degrees freedom:    23
> 
>                   Ho: mean(0) - mean(1) = mean(diff) = 0
> 
> Ha: mean(diff) < 0         Ha: mean(diff) ~= 0        Ha: mean(diff) > 0
>      t =   1.2362                t =   1.2362              t =   1.2362
>  P < t =   0.8856          P > |t| =   0.2289          P > t =   0.1144
> 
> . return list
> 
> scalars:
>               r(N_2) =  345
>               r(N_1) =  380
>              r(df_t) =  23
>                 r(t) =  1.2362
>              r(sd_2) =  5.141886711364611
>              r(sd_1) =  5.385836699859183
>                r(se) =  .3911133002996737
>            r(m_diff) =  .4834856986999512
>              r(se_2) =  .2768298747832084
>              r(se_1) =  .2762875930960634
>              r(mu_2) =  6.950724601745606
>              r(mu_1) =  7.434210300445557
>               r(p_l) =  .8855657157257124
>               r(p_u) =  .1144342842742876
>                 r(p) =  .2288685685485752
> 
> . ttest class, by(intervention)
> 
> Two-sample t test with equal variances
> ------------------------------------------------------------------------------
>  Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
> ---------+--------------------------------------------------------------------
> control |     380    7.434211    .1626351    3.170342     7.11443    7.753991
> interven |     345    6.950725    .1485284    2.758793    6.658586    7.242863
> ---------+--------------------------------------------------------------------
> combined |     725    7.204138    .1110214    2.989343    6.986176      7.4221
> ---------+--------------------------------------------------------------------
>   diff |            .4834859    .2217278                .0481787    .9187931
> ------------------------------------------------------------------------------
>   diff = mean(control) - mean(interven)                         t =   2.1805
> Ho: diff = 0                                     degrees of freedom =      723
> 
>   Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
> Pr(T < t) = 0.9852         Pr(|T| > |t|) = 0.0295          Pr(T > t) = 0.0148
> 
> Very grateful for your help.
> 
> Best regards,
> Afia
> 
> 
> 
> 
> On Fri, Jul 26, 2013 at 5:25 PM, Steve Samuels <[email protected]> wrote:
>> 
>> The Statalist FAQ requests that you show both your code and results. As
>> you didn't, we have little idea of what you saw. I guess that your
>> -svyset- didn't specify a probability weight.
>> 
>> In that case, observations are equally weighted, and the estimated
>> population standard deviation *and* mean must be identical to the sample
>> versions, as given by -summarize-. Clustering, as you noticed, affects
>> only standard errors. The following shows that the SD and mean are
>> affected only by weighting and not by clustering.
>> 
>> 
>> . sysuse auto, clear
>> . gen mkr = substr(make,1,2)
>> 
>> . svyset mkr
>> . svy: mean turn
>> . estat sd
>> . sum turn
>> 
>> . svyset mkr [pw = price]
>> . svy: mean turn
>> . estat sd
>> . sum turn [aw = price]
>> 
>> Steve
>> 
>> On Jul 26, 2013, at 12:25 PM, Afia Tasneem wrote:
>> 
>> Dear all,
>> 
>> I am working on the analysis of a clustered randomized trial.
>> 
>> My standard errors change when I -svyset- the data to account for
>> clustering. However, the standard deviations after clustering with
>> -svyset- and using -estat sd- are the same as before clustering (also
>> the same as simply using -sum var-). Should the SD remain unaffected
>> while the SEs change due to clustering? Or is -estat sd- not the right
>> command to use to find standard deviations after clustering?
>> 
>> Thanks much,
>> Afia