Statalist


Re: st: Svy regress using subpop and incorrect number of obs


From   jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Svy regress using subpop and incorrect number of obs
Date   Tue, 22 Jul 2008 10:31:47 -0500

Heather Ridolfo <evd7@CDC.GOV> is using -svy: regress- with the -subpop()-
option, and noticed that the reported sample size is smaller than the number
of observations in her dataset:

> I am using survey regression with the subpop command, see below. 
> 
> svyset psuscid [pweight = gswgt1], strata(region)
> svy, subpop(allsp): regress esteem white3 hisp child_sex
> 
> However the number of observations in the output does not match the
> total number of cases in my dataset. I have 18924 cases in the original
> dataset, here the number of observations is only 18768.
> However, the subpopulation number of observations does appear correct
> (10224). 
> 
> Survey: Linear regression
> 
> Number of strata   =         4              Number of obs      =     18768
> Number of PSUs     =       132              Population size    =  22000302
>                                             Subpop. no. of obs =     10224
>                                             Subpop. size       =  12582072
>                                             Design df          =       128
>                                             F(   3,    126)    =     47.41
>                                             Prob > F           =    0.0000
>                                             R-squared          =    0.0331
> 
> 
> In another output not only are the number of observations incorrect
> (should be 10244) but the PSUs are also lower.
> 
> svyset psuscid [pweight = gswgt1], strata(region)
> svy, subpop(if bhsp == 1): regress  esteem racebh
> 
> 
> Number of strata   =         4              Number of obs      =      6973
> Number of PSUs     =       126              Population size    = 5955176.3
>                                             Subpop. no. of obs =      4017
>                                             Subpop. size       = 3346350.8
>                                             Design df          =       122
>                                             F(   1,    122)    =     33.36
>                                             Prob > F           =    0.0000
>                                             R-squared          =    0.0253
> 
> There are missing cases in some of the variables in my regression. Is
> stata dropping these cases from the number of original observations? I
> do specify in my subpop command to not include cases with missing data.
> If STATA is dropping observations from my original dataset due to
> incomplete data, is the survey design information from these
> observations retained in the calculation of the standard errors? 
> Every example I have found of stata output using survey regression with
> the subpop command the number of observations matches the total number
> of cases in the dataset. 

Heather should check that her Stata is fully up-to-date.  On 02apr2008, we
posted an ado-file update that fixed a problem similar to what Heather is
describing above.  Here is the corresponding entry from -help whatsnew-:

5.  svy's linearized variance estimator was marking out observations that
    had missing values in the independent variables for observations outside
    the subpopulation.  This affects the estimated variance values when the
    primary sampling units were the individual observations and could decrease
    the design degrees of freedom.  Both of these effects are very slight and
    inversely related to the sample size.  This has been fixed.

Note that, prior to this update, an entire PSU could be dropped if every
observation within it had a missing value in one of the variables in the
model fit.  With an updated Stata, the only observations dropped are those
inside the subpopulation that contain missing values.
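
Checking for, and installing, the update can be sketched as follows.  This
assumes Stata can reach the update server over the Internet; -update query-
only reports what is installed versus what is available, and -update all-
installs whatever is newer.

```stata
* Compare the installed executable and ado-files with the latest available
update query

* If newer files are listed, install them
update all

* Then read the release notes, including the 02apr2008 entry quoted above
help whatsnew
```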

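Here is a simple experiment, using the auto data.  Five cars have a missing
rep78, all of them outside the subpopulation, yet an updated -svy- still
reports all 74 observations: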

. sysuse auto
. gen sub = foreign & !missing(rep78)
. tab rep78 sub

    Repair |
    Record |          sub
      1978 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2          0 |         2 
         2 |         8          0 |         8 
         3 |        27          3 |        30 
         4 |         9          9 |        18 
         5 |         2          9 |        11 
         . |         5          0 |         5 
-----------+----------------------+----------
     Total |        53         21 |        74 


. svyset _n
. svy, subpop(sub): regress mpg rep78

(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs      =        74
Number of PSUs     =        74                  Population size    =        74
                                                Subpop. no. of obs =        21
                                                Subpop. size       =        21
                                                Design df          =        73
                                                F(   1,     73)    =      0.60
                                                Prob > F           =    0.4409
                                                R-squared          =    0.0285

------------------------------------------------------------------------------
             |             Linearized
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |   1.486111   1.917962     0.77   0.441    -2.336381    5.308604
       _cons |   18.91667   7.106727     2.66   0.010      4.75298    33.08035
------------------------------------------------------------------------------
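
To confirm where the missing values sit relative to the subpopulation, a
check along these lines may help.  It is only a sketch that reuses the
variable names from the experiment above; for Heather's data, substitute the
model variables and the subpopulation indicator.

```stata
sysuse auto, clear
gen byte sub = foreign & !missing(rep78)

* Missing values in the model variables, inside the subpopulation
count if missing(mpg, rep78) & sub

* Missing values in the model variables, outside the subpopulation
count if missing(mpg, rep78) & !sub

* Refit and display the sample sizes that -svy- stored
svyset _n
svy, subpop(sub): regress mpg rep78
display "Number of obs      = " e(N)
display "Subpop. no. of obs = " e(N_sub)
```

If the second count is positive and the reported number of observations
falls short of the dataset by about that amount, the installed Stata most
likely predates the fix.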


--Jeff
jpitblado@stata.com


