Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

st: svy subpop option and e(sample)

From	Richard Williams <[email protected]>
To	[email protected]
Subject	st: svy subpop option and e(sample)
Date	Tue, 24 May 2011 15:02:09 -0500

I've just noticed that e(sample) does not work the way I expect it to when using svy with the subpop() option. Specifically, e(sample) codes everyone in the population as 1, whether or not they were in the specified subpopulation. I guess I can sort of see a rationale for this (the whole population is used to compute the standard errors), but it has the potential to screw up your post-estimation analysis if you only wanted to work with what you thought was the subpopulation.

The following illustrates this. There are only 1,086 cases in the subpopulation selected, but probabilities are computed for all 10,000 cases. That is, coefficients computed using only the black subpopulation are used to compute probabilities for the entire population:


. webuse nhanes2f, clear

. svy, subpop(black): ologit health age female
(running ologit on estimation sample)

Survey: Ordered logistic regression

Number of strata   =        30                 Number of obs      =      10000
Number of PSUs     =        60                 Population size    =  113285074
                                               Subpop. no. of obs =       1086
                                               Subpop. size       =   11189236
                                               Design df          =         30
                                               F(   2,     29)    =      29.87
                                               Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
      health |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0452349   .0063482    -7.13   0.000    -.0581996   -.0322703
      female |  -.3975887   .1336441    -2.97   0.006    -.6705263   -.1246511
-------------+----------------------------------------------------------------
       /cut1 |  -4.427029   .2976634   -14.87   0.000    -5.034939   -3.819119
       /cut2 |   -2.97326   .2848889   -10.44   0.000    -3.555081   -2.391439
       /cut3 |  -1.347426   .2497407    -5.40   0.000    -1.857465   -.8373876
       /cut4 |   -.214417   .2857434    -0.75   0.459    -.7979829    .3691488
------------------------------------------------------------------------------
Note: 1 stratum omitted because it contains no subpopulation members.

. predict p1 p2 p3 p4 p5 if e(sample)
(option pr assumed; predicted probabilities)
(337 missing values generated)

. sum p1 p2 p3 p4 p5

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          p1 |     10000    .1362895    .0853605   .0286835   .3358028
          p2 |     10000    .2341256    .0876537   .0835067   .3482424
          p3 |     10000    .3367498    .0391877   .2327486   .3854614
          p4 |     10000    .1639109    .0712129   .0549047   .2749383
          p5 |     10000    .1289243    .0859123   .0284552   .3339704

I think one solution is to change the predict command to something like

predict p51 p52 p53 p54 p55 if e(sample) & black
predict p61 p62 p63 p64 p65 if e(sample) & `=e(subpop)'
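The same restriction can also be stored once as an indicator variable and reused across later post-estimation commands. This is only a minimal sketch: the variable name insub is hypothetical (it does not appear in the session above), and it assumes black is coded 0/1 as in nhanes2f.

```stata
* Re-create the estimation from the session above
webuse nhanes2f, clear
svy, subpop(black): ologit health age female

* insub (hypothetical name) flags observations that are both in
* e(sample) and members of the subpopulation (black == 1)
generate byte insub = e(sample) & black == 1

* Predicted probabilities only for subpopulation members
predict p1 p2 p3 p4 p5 if insub

* The count should agree with "Subpop. no. of obs" in the svy header
count if insub
summarize p1 p2 p3 p4 p5
```

Keeping the flag in a variable means the condition does not have to be retyped, or kept in sync with the subpop() specification, in every later command.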

But are there others? Preferably simpler ones? And is this good behavior for e(sample) in the first place?


-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam
