Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: should estat sd reports same sd before and after clustering?

From	Afia Tasneem <[email protected]>
To	[email protected]
Subject	Re: st: should estat sd reports same sd before and after clustering?
Date	Sun, 28 Jul 2013 18:03:48 -0400

Hi Steve,

I am confused. To be clear, sd's are not supposed to change with
clustering, correct? se's are supposed to change with clustering.

I need a table that reports the mean and sd of classes for males and
females, the difference between the two, and the se and p-value of that
difference, with the cluster design of the experiment taken into account
for all numbers. Which method is correct (option 1 or 2 below)?

Option 1:
Numbers using the following code:
svyset branch
svy: mean `var', over(intervention)
estat sd
lincom [`var']intervention - [`var']control

or
Option 2:
clttest `var', cluster(branch) by(intervention)

Many thanks,
Afia
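
A note on the arithmetic in Steve's reply quoted below: his relation
sd = sqrt(n) x se can be checked directly from the r() results that
-clttest- returned. The following is a minimal sketch that only redoes
that arithmetic. The scalar names (n1, se1, n2, se2) are made up for
illustration, and reading the squared ratio to the ordinary sample SD as
a variance inflation factor is an assumption, not something stated in
the thread.

```stata
* Group sizes and clustered SEs copied from the -return list- output
scalar n1  = 380
scalar se1 = .2762875930960634
scalar n2  = 345
scalar se2 = .2768298747832084

* SDs that would give these SEs if there were no clustering
display "sd_1 = " sqrt(scalar(n1))*scalar(se1)
display "sd_2 = " sqrt(scalar(n2))*scalar(se2)

* Squared ratio to the ordinary sample SDs reported by -summarize-
* (interpreting this as variance inflation is an assumption)
display "ratio_1 = " (sqrt(scalar(n1))*scalar(se1)/3.170342)^2
display "ratio_2 = " (sqrt(scalar(n2))*scalar(se2)/2.758793)^2
```

No dataset is needed to run this. The first two lines should reproduce
r(sd_1) and r(sd_2) to display precision, and the ratios come out at
roughly 2.9 and 3.5.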


On Sun, Jul 28, 2013 at 5:05 PM, Steve Samuels <[email protected]> wrote:
>
> Afia:
>
> ------------------------------------------------------------------------
>  Intra-cluster correlation         =           0.0465
> ------------------------------------------------------------------------
>                     N  Clusts    Mean        SE             95 % CI
> intervention=0    380      11  7.4342    0.2763    [  6.8186,  8.0498]
> intervention=1    345      14  6.9507    0.2768    [  6.3527,  7.5488]
> ------------------------------------------------------------------------
>
>
> r(sd_1) and r(sd_2) estimate the SDs that would give the same SEs if
> there were no clustering:
>
> sd1 = n1^.5 x se1
> sd2 = n2^.5 x se2
>
> sd1 = (380)^.5 x .2763
> sd2 = (345)^.5 x .2768
>
> r(sd_2) =  5.141886711364611
> r(sd_1) =  5.385836699859183
>
> Steve
>
>
> On Jul 28, 2013, at 3:45 PM, Afia Tasneem wrote:
>
> Dear Steve,
>
> Thank you for your reply.  And apologies for not posting the code; I
> am new to statalist.
>
> I would be grateful if you could also answer a few follow up questions:
>
> As you can see from the output below, the standard deviations with and
> without clustering are almost the same (any reason for the very slight
> difference?): 3.168354 and 2.756693 for the control and intervention
> groups with clustering (svyset plus estat sd), and 3.170342 and
> 2.758793 without clustering. However, the command clttest gives me
> different sds: with clttest, my sds are 5.385 and 5.141 for the control
> and intervention groups respectively, whereas in a normal ttest the sds
> are 3.170342 and 2.758793. Why do I get different sds from svyset plus
> estat sd and from clttest?
>
> below is the code:
>
> . svyset branch
>
>      pweight: <none>
>          VCE: linearized
>  Single unit: missing
>     Strata 1: <one>
>         SU 1: branch
>        FPC 1: <zero>
>
> . svy: mean class, over(intervention)
> (running mean on estimation sample)
>
> Survey: Mean estimation
>
> Number of strata =       1          Number of obs    =     725
> Number of PSUs   =      25          Population size  =     725
>                                    Design df        =      24
>
>      control: intervention = control
> intervention: intervention = intervention
>
> --------------------------------------------------------------
>             |             Linearized
>        Over |       Mean   Std. Err.     [95% Conf. Interval]
> -------------+------------------------------------------------
> class        |
>     control |   7.434211   .3031807      6.808476    8.059945
> intervention |   6.950725   .2003743      6.537172    7.364277
> --------------------------------------------------------------
>
> . estat sd
>
>      control: intervention = control
> intervention: intervention = intervention
>
> -------------------------------------
>        Over |       Mean   Std. Dev.
> -------------+-----------------------
> class        |
>     control |   7.434211    3.168354
> intervention |   6.950725    2.756693
> -------------------------------------
>
> . bysort intervention: sum class
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> -> intervention = control
>
>    Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>       class |       380    7.434211    3.170342          0         12
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> -> intervention = intervention
>
>    Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>       class |       345    6.950725    2.758793          0         12
>
> However, when I use the command "clttest", my standard deviations do
> change with clustering:
>
> with clttest, my sds are 5.385 and 5.141 for the control and
> intervention groups respectively, whereas in a normal ttest the sds are
> 3.170342 and 2.758793.
>
> . clttest class, cluster(branch) by(intervention)
>
> t-test adjusted for clustering
> class by intervention, clustered by branch
> ------------------------------------------------------------------------
>  Intra-cluster correlation         =           0.0465
> ------------------------------------------------------------------------
>                     N  Clusts    Mean        SE             95 % CI
> intervention=0    380      11  7.4342    0.2763    [  6.8186,  8.0498]
> intervention=1    345      14  6.9507    0.2768    [  6.3527,  7.5488]
> ------------------------------------------------------------------------
> Combined          725      14  7.2041    0.1957    [  6.7992,  7.6091]
> ------------------------------------------------------------------------
> Diff(0-1)         725      25  0.4835    0.3911    [ -0.3256,  1.2926]
>
> Degrees freedom:    23
>
>                    Ho: mean(-) = mean(diff) = 0
>
>  Ha: mean(diff) < 0         Ha: mean(diff) ~= 0        Ha: mean(diff) > 0
>       t =   1.2362                t =   1.2362              t =   1.2362
>   P < t =   0.8856          P > |t| =   0.2289          P > t =   0.1144
>
> . return list
>
> scalars:
>                r(N_2) =  345
>                r(N_1) =  380
>               r(df_t) =  23
>                  r(t) =  1.2362
>               r(sd_2) =  5.141886711364611
>               r(sd_1) =  5.385836699859183
>                 r(se) =  .3911133002996737
>             r(m_diff) =  .4834856986999512
>               r(se_2) =  .2768298747832084
>               r(se_1) =  .2762875930960634
>               r(mu_2) =  6.950724601745606
>               r(mu_1) =  7.434210300445557
>                r(p_l) =  .8855657157257124
>                r(p_u) =  .1144342842742876
>                  r(p) =  .2288685685485752
>
> . ttest class, by(intervention)
>
> Two-sample t test with equal variances
> ------------------------------------------------------------------------------
>   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
> ---------+--------------------------------------------------------------------
> control |     380    7.434211    .1626351    3.170342     7.11443    7.753991
> interven |     345    6.950725    .1485284    2.758793    6.658586    7.242863
> ---------+--------------------------------------------------------------------
> combined |     725    7.204138    .1110214    2.989343    6.986176      7.4221
> ---------+--------------------------------------------------------------------
>    diff |            .4834859    .2217278                .0481787    .9187931
> ------------------------------------------------------------------------------
>    diff = mean(control) - mean(interven)                         t =   2.1805
> Ho: diff = 0                                     degrees of freedom =      723
>
>    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
> Pr(T < t) = 0.9852         Pr(|T| > |t|) = 0.0295          Pr(T > t) = 0.0148
>
> Very grateful for your help.
>
> Best regards,
> Afia
>
>
>
>
> On Fri, Jul 26, 2013 at 5:25 PM, Steve Samuels <[email protected]> wrote:
>>
>> The Statalist FAQ requests that you show both your code and results. As
>> you didn't, we have little idea of what you saw. I guess that your
>> -svyset- didn't specify a probability weight.
>>
>> In that case, observations are equally weighted, and the estimated
>> population standard deviation *and* mean must be identical to the sample
>> versions, as given by -summarize-. Clustering, as you noticed, affects
>> only standard errors. The following shows that the sd and mean are
>> affected only by weighting and not by clustering.
>>
>>
>> . sysuse auto, clear
>> . gen mkr = substr(make,1,2)
>>
>> . svyset mkr
>> . svy: mean turn
>> . estat sd
>> . sum turn
>>
>> . svyset mkr [pw = price]
>> . svy: mean turn
>> . estat sd
>> . sum turn [aw = price]
>>
>> Steve
>>
>> On Jul 26, 2013, at 12:25 PM, Afia Tasneem wrote:
>>
>> Dear all,
>>
>> I am working on the analysis of a clustered randomized trial.
>>
>> My standard errors change when I svyset the data to account for
>> clustering. However, the standard deviations after clustering with
>> svyset and estat sd are the same as before clustering (and also the
>> same as from simply using: sum var). Should the sd remain unaffected
>> when the se changes due to clustering? Or is the command "estat sd" not
>> the right one to use to find standard deviations after clustering?
>>
>> Thanks much,
>> Afia
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
