
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: should estat sd reports same sd before and after clustering?


From   Steve Samuels <[email protected]>
To   [email protected]
Subject   Re: st: should estat sd reports same sd before and after clustering?
Date   Sun, 28 Jul 2013 17:05:18 -0400

Afia:

------------------------------------------------------------------------
 Intra-cluster correlation         =           0.0465
------------------------------------------------------------------------
                    N    Clusts    Mean           SE             95 % CI
intervention=0    380     11      7.4342      0.2763       [  6.8186,  8.0498]
intervention=1    345     14      6.9507      0.2768       [  6.3527,  7.5488]
------------------------------------------------------------------------


r(sd_1) and r(sd_2) are not sample SDs. They estimate the SDs that would
give the same SEs as the clustered ones above if there were no clustering:

sd1 = n1^.5 x se1
sd2 = n2^.5 x se2

sd1 = (380)^.5 x .2763
sd2 = (345)^.5 x .2768

r(sd_2) =  5.141886711364611
r(sd_1) =  5.385836699859183
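
The arithmetic above can be checked outside Stata. The following is a minimal sketch in Python, not part of clttest itself; the helper name implied_sd is hypothetical, and the inputs are the SEs and group sizes printed in this thread. It applies sd = n^.5 x se to the clustered SEs from clttest and, for comparison, to the unclustered SEs from ttest.

```python
import math


def implied_sd(n, se):
    """Return the SD that would yield standard error `se` in an
    unclustered sample of size n, i.e. sd = sqrt(n) * se.
    (Hypothetical helper, for illustration only.)"""
    return math.sqrt(n) * se


# Clustered SEs as returned by clttest in r(se_1) and r(se_2)
sd1_clustered = implied_sd(380, 0.2762875930960634)
sd2_clustered = implied_sd(345, 0.2768298747832084)

# Unclustered SEs as reported by ttest for the same two groups
sd1_plain = implied_sd(380, 0.1626351)
sd2_plain = implied_sd(345, 0.1485284)

print(f"from clttest SEs: {sd1_clustered:.4f} {sd2_clustered:.4f}")
print(f"from ttest SEs:   {sd1_plain:.4f} {sd2_plain:.4f}")
```

The first pair should agree with r(sd_1) and r(sd_2), and the second pair with the ordinary SDs that ttest and summarize report, which is why the two sets of "SDs" differ.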

Steve


On Jul 28, 2013, at 3:45 PM, Afia Tasneem wrote:

Dear Steve,

Thank you for your reply.  And apologies for not posting the code; I
am new to statalist.

I would be grateful if you could also answer a few follow-up questions:

As you can see from the code below, the standard deviations with and
without clustering are almost the same: 3.168354 and 2.756693 from
estat sd after svyset (with clustering), versus 3.170342 and 2.758793
from summarize (without clustering), for the control and intervention
groups respectively. Is there any reason for the very slight
difference? However, the command clttest gives me different SDs:
5.385 and 5.141 for the control and intervention groups respectively,
whereas an ordinary ttest gives 3.170342 and 2.758793. Why do svyset
plus estat sd and clttest give different SDs?

Below is the code:

. svyset branch

     pweight: <none>
         VCE: linearized
 Single unit: missing
    Strata 1: <one>
        SU 1: branch
       FPC 1: <zero>

. svy: mean class, over(intervention)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =     725
Number of PSUs   =      25          Population size  =     725
                                   Design df        =      24

     control: intervention = control
intervention: intervention = intervention

--------------------------------------------------------------
            |             Linearized
       Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
class        |
    control |   7.434211   .3031807      6.808476    8.059945
intervention |   6.950725   .2003743      6.537172    7.364277
--------------------------------------------------------------

. estat sd

     control: intervention = control
intervention: intervention = intervention

-------------------------------------
       Over |       Mean   Std. Dev.
-------------+-----------------------
class        |
    control |   7.434211    3.168354
intervention |   6.950725    2.756693
-------------------------------------

. bysort intervention: sum class

-------------------------------------------------------------------------------------------------------------------------------------------
-> intervention = control

   Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      class |       380    7.434211    3.170342          0         12

-------------------------------------------------------------------------------------------------------------------------------------------
-> intervention = intervention

   Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      class |       345    6.950725    2.758793          0         12

However, when I use the command "clttest", my standard deviations do
change with clustering: clttest gives SDs of 5.385 and 5.141 for the
control and intervention groups respectively, whereas an ordinary
ttest gives 3.170342 and 2.758793.

. clttest class, cluster(branch) by(intervention)

t-test adjusted for clustering
class by intervention, clustered by branch
------------------------------------------------------------------------
 Intra-cluster correlation         =           0.0465
------------------------------------------------------------------------
                    N    Clusts    Mean           SE             95 % CI
intervention=0    380     11      7.4342      0.2763       [  6.8186,  8.0498]
intervention=1    345     14      6.9507      0.2768       [  6.3527,  7.5488]
------------------------------------------------------------------------
Combined          725     14      7.2041      0.1957       [  6.7992,  7.6091]
------------------------------------------------------------------------
Diff(0-1)         725     25      0.4835      0.3911       [ -0.3256,  1.2926]

Degrees freedom:    23

                   Ho: mean(-) = mean(diff) = 0

 Ha: mean(diff) < 0         Ha: mean(diff) ~= 0        Ha: mean(diff) > 0
      t =   1.2362                t =   1.2362              t =   1.2362
  P < t =   0.8856          P > |t| =   0.2289          P > t =   0.1144

. return list

scalars:
               r(N_2) =  345
               r(N_1) =  380
              r(df_t) =  23
                 r(t) =  1.2362
              r(sd_2) =  5.141886711364611
              r(sd_1) =  5.385836699859183
                r(se) =  .3911133002996737
            r(m_diff) =  .4834856986999512
              r(se_2) =  .2768298747832084
              r(se_1) =  .2762875930960634
              r(mu_2) =  6.950724601745606
              r(mu_1) =  7.434210300445557
               r(p_l) =  .8855657157257124
               r(p_u) =  .1144342842742876
                 r(p) =  .2288685685485752

. ttest class, by(intervention)

Two-sample t test with equal variances
------------------------------------------------------------------------------
  Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
control |     380    7.434211    .1626351    3.170342     7.11443    7.753991
interven |     345    6.950725    .1485284    2.758793    6.658586    7.242863
---------+--------------------------------------------------------------------
combined |     725    7.204138    .1110214    2.989343    6.986176      7.4221
---------+--------------------------------------------------------------------
   diff |            .4834859    .2217278                .0481787    .9187931
------------------------------------------------------------------------------
   diff = mean(control) - mean(interven)                         t =   2.1805
Ho: diff = 0                                     degrees of freedom =      723

   Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
Pr(T < t) = 0.9852         Pr(|T| > |t|) = 0.0295          Pr(T > t) = 0.0148

Very grateful for your help.

Best regards,
Afia




On Fri, Jul 26, 2013 at 5:25 PM, Steve Samuels <[email protected]> wrote:
> 
> The Statalist FAQ requests that you show both your code and results. As
> you didn't, we have little idea of what you saw. I guess that your
> -svyset- didn't specify a probability weight.
> 
> In that case, observations are equally weighted, and the estimated
> population standard deviation *and* mean must be identical to the sample
> versions, as given by -summarize-. Clustering, as you noticed, affects
> only standard errors. The following shows that the sd and mean are
> affected only by weighting and not by clustering.
> 
> 
> . sysuse auto, clear
> . gen mkr = substr(make,1,2)
> 
> . svyset mkr
> . svy: mean turn
> . estat sd
> . sum turn
> 
> . svyset mkr [pw = price]
> . svy: mean turn
> . estat sd
> . sum turn [aw = price]
> 
> Steve
> 
> On Jul 26, 2013, at 12:25 PM, Afia Tasneem wrote:
> 
> Dear all,
> 
> I am working on the analysis of a clustered randomized trial.
> 
> My standard errors change when I svyset the data to account for
> clustering. However, the standard deviations from estat sd after
> svyset are the same as before clustering (and the same as from
> simply using: sum var). Should the sd remain unaffected when the
> se changes due to clustering? Or is the command "estat sd" not the
> right one to use to find standard deviations after clustering?
> 
> Thanks much,
> Afia
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
> 

