Statalist



st: Re:


From   sjsamuels@gmail.com
To   statalist@hsphsun2.harvard.edu
Subject   st: Re:
Date   Thu, 2 Jul 2009 20:28:25 -0400

Arnold,

I cannot tell why the SEs are so different. The subpopulation n's and
the means of "smkskul" are identical in all three analyses, so that is
not the problem.  I do see some issues.

1.  In SAS, the variables in the CLUSTER statement should identify
only the PSUs, the first-stage units (here SKULID, not CLASSID).
Listing CLASSID as well should, however, lead to smaller, rather than
larger, standard errors.

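2. Stata thinks that there are 46 strata in the entire sample (1 kept
plus 45 omitted), but SAS thinks that there are 27.  SUDAAN and SAS
differ by about 1,000 in their report of the weighted size of the
original sample (97,843 vs. 98,864) and by 778 in the number of
observations (20,434 vs. 21,212).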

3. The subpopulation seems confined to one PSU (one value of "skulid")
and one stratum, but Stata says that there are nine PSUs with
observations in the subpopulation.  Perhaps Stata considers the
second-stage units, classrooms, as PSUs in this case, and the
others do not. If so, this could account for some of the discrepancy:
between-classroom variation could be small if there are 16
individuals in nine classrooms.

4. The outcome, according to SUDAAN, is missing for 93% of the
subpopulation sample.
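
As a minimal sketch of how points 3 and 4 might be checked in Stata:
the variable names are taken from your svyset and svy commands, while
"insub", "tagschool", and "tagclass" are names I made up for
illustration. I have not run this against your data.

```stata
* Flag the subpopulation used in the svy: mean call
generate byte insub = (year == 2008 & skulid == 80001)

* egen's tag() marks exactly one observation per distinct group
egen byte tagschool = tag(strat skulid) if insub
egen byte tagclass  = tag(strat skulid classid) if insub

* Point 3: distinct schools and classrooms in the subpopulation
count if tagschool == 1
count if tagclass == 1

* Point 4: how many subpopulation observations lack the outcome
count if insub
count if insub & missing(smkskul)
```

If the classroom count is nine, that would fit the guess in point 3;
if it is not, the nine PSUs that Stata reports may be schools in the
same stratum. From the SUDAAN listing, the missing share works out to
1 - 16/226, or about 93%.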

I suggest that you make sure that the variables and observations are
identical in the data sets (I notice two different weight variables)
and that the cluster, classroom, and stratum counts agree in SAS
and Stata.  Then rerun your analyses on this outcome and on one with no
missing values, and submit your findings to the group with a copy to
Jeff Pitblado at Stata.
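
One possible way to get the Stata side of those counts, as a sketch
only: svydescribe lists the number of strata and sampling units under
the current svyset. The svy calls assume the same subpopulation as in
your run, and "nomiss" is a hypothetical stand-in for whichever outcome
has no missing values.

```stata
* Re-declare the design exactly as in the original run
svyset skulid [pw=w2f2f3], strata(strat) fpc(fpc) || classid

* Strata and school (stage-1) counts, to set against the SAS "Data Summary"
svydescribe

* Same table, with counts of observations complete and missing on smkskul
svydescribe smkskul

* Classroom (stage-2) counts
svydescribe, stage(2)

* Rerun on the original outcome and on a fully observed one
svy, subpop(if year==2008 & skulid==80001): mean smkskul
svy, subpop(if year==2008 & skulid==80001): mean nomiss
```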

Good luck!

Steve

On Tue, Jun 30, 2009 at 3:50 PM, Levinson,
Arnold<Arnold.Levinson@ucdenver.edu> wrote:
> Steve,
> Sorry for overlooking the obvious. Here are the commands and output. (I note as usual the wonderful output efficiency of Stata over the others.)
> arnold
> _____________________
> *Stata*
> svyset skulid [pw=w2f2f3], strata(strat) fpc(fpc) || classid
>
>      pweight: w2f2f3
>          VCE: linearized
>     Strata 1: strat
>         SU 1: skulid
>        FPC 1: fpc
>     Strata 2: <one>
>         SU 2: classid
>        FPC 2: <zero>
>
> . svy, subpop(if year==2008 & skulid==80001): mean smkskul
> (running mean on estimation sample)
>
> Survey: Mean estimation
>
> Number of strata =       1          Number of obs    =     131
> Number of PSUs   =       9          Population size  = 783.698
>                                    Subpop. no. obs  =      16
>                                    Subpop. size     = 120.542
>                                    Design df        =       8
>
> --------------------------------------------------------------
>             |             Linearized
>             |       Mean   Std. Err.     [95% Conf. Interval]
> -------------+------------------------------------------------
>     smkskul |   .5806258    .014649      .5468452    .6144064
> --------------------------------------------------------------
> Note: 45 strata omitted because they contain no subpopulation members
>
> ___________________
> SAS:
> PROC SURVEYMEANS DATA = ytabstest RATE = FPC;
>        VAR SMKSKUL;
>        STRATA STRAT;
>        CLUSTER SKULID CLASSID;
>        WEIGHT SKULWT;
>        DOMAIN skulstrat;
> RUN;
>
>                                         The SAS System         08:13 Tuesday, June 30, 2009 315
>
>                                   The SURVEYMEANS Procedure
>
>                                          Data Summary
>                              Number of Strata                  27
>                              Number of Clusters              1282
>                              Number of Observations         21212
>                              Sum of Weights                 98864
>
>                                           Statistics
>                                          Std Error
> Variable    Label     N       Mean         of Mean       95% CL for Mean
> --------------------------------------------------------------------------
> SMKSKUL     SMKSKUL   1706   0.488438      0.015833    0.45735470 0.51952078
> --------------------------------------------------------------------------
>
>                                   Domain Analysis: skulstrat
>
>                                           Std Error
> skulstrat Variable Label   N     Mean       of Mean      95% CL for Mean
> --------------------------------------------------------------------------
>         0  SMKSKUL   SMKSKUL    1690    0.487015    0.016001  0.45560287 0.51842627
>         1  SMKSKUL   SMKSKUL      16    0.580626    0.104178  0.37624423 0.78500743
> --------------------------------------------------------------------------
>
>                                         The SAS System         08:13 Tuesday, June 30, 2009 316
>
>
> PROC DESCRIPT DATA = ytabstest DESIGN = WOR;
>        NEST STRAT SKULID CLASSID / MISSUNIT;
>        TOTCNT TOTSAMP _MINUS1_ _MINUS1_;
>        VAR SMKSKUL;
>        CLASS SMKSKUL;
>        WEIGHT SKULWT;
>        SUBPOPN skulstrat = 1;
> RUN;
>
>                                  S U D A A N
>            Software for the Statistical Analysis of Correlated Data
>           Copyright      Research Triangle Institute     August 2008
>                                 Release 10.0
>
>
> DESIGN SUMMARY: Variances will be computed using the Taylor Linearization Method, Assuming a
> Without Replacement (WOR) Design
>    Sample Weight: SKULWT
>    Stage 1 Stratification Variable: STRAT
>    Stage 1 Population Count Variable: TOTSAMP
>    Stage 2 NEST Variable: SKULID (stage type is data dependent)
>    Stage 2 Population Count Variable: _MINUS1_
>    Stage 3 With Replacement Sampling Variable: CLASSID
>    Stage 3 Population Count Variable: _MINUS1_
>
>
> Number of observations read    :  20434    Weighted count :    97843
> Observations in subpopulation  :    226    Weighted count :     1650
> Denominator degrees of freedom :    128
> Date: 06-30-2009                             SUDAAN                                  Page:  1
> Time: 13:38:12                                                                       Table: 1
>
> Frequencies and Values for CLASS Variables
> by: SMKSKUL.
>
> ----------------------------------
> SMKSKUL         Frequency    Value
> ----------------------------------
> Ordered
>  Position:
>  1                     6        0
> Ordered
>  Position:
>  2                    10        1
> ----------------------------------
>
>
> Date: 06-30-2009                             SUDAAN                                 Page:   2
> Time: 13:38:12                                                                      Table: 1
>
> Variance Estimation Method: Taylor Series (WOR)
> For Subpopulation: SKULSTRAT = 1
> by: Variable, SUDAAN Reserved Variable One.
>
> --------------------------------------------------------------------
> |                 |                  | SUDAAN Reserved Variable    |
> | Variable        |                  | One                         |
> |                 |                  |-----------------------------|
> |                 |                  | Total        | 1            |
> --------------------------------------------------------------------
> |                 |                  |              |              |
> | SMKSKUL         | Sample Size      |           16 |           16 |
> |                 | Weighted Size    |       120.54 |       120.54 |
> |                 | Total            |        69.99 |        69.99 |
> |                 | Lower 95% Limit  |              |              |
> |                 |  Total           |       -39.85 |       -39.85 |
> |                 | Upper 95% Limit  |              |              |
> |                 |  Total           |       179.83 |       179.83 |
> |                 | Mean             |      0.58063 |      0.58063 |
> |                 | SE Mean          |         0.09 |         0.09 |
> |                 | Lower 95% Limit  |              |              |
> |                 |  Mean            |      0.39690 |      0.39690 |
> |                 | Upper 95% Limit  |              |              |
> |                 |  Mean            |      0.76435 |      0.76435 |
> --------------------------------------------------------------------
>
>> On Tue, Jun 30, 2009 at 12:41 PM, Levinson,
>> Arnold<Arnold.Levinson@ucdenver.edu> wrote:
>>> Survey analysis experts:
>>> I have data from a stratified two-stage school survey. The first stage sampled schools within strata, the second sampled classrooms within selected schools.
>>>
>>> When estimating variables of interest at the school level, I get hugely different variance estimates running Stata vs. SAS or SUDAAN. Stata's estimates are generally a lot smaller than SAS's or SUDAAN's, and the latter two are similar or identical to each other.



-- 
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
845-246-0774
