



st: Number of Obs with svy , subpop()


From   Michael Mitchell <Michael.Norman.Mitchell@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Number of Obs with svy , subpop()
Date   Thu, 18 Mar 2010 16:20:59 -0700

Greetings

  I am flummoxed by the "Number of obs" and "Population size" that
"svy : tab" reports when the -subpop()- option is used. I hope someone
can help. For example, consider the "highschool" dataset used in the
[SVY] manual, with a couple of tweaks as shown below...

. webuse highschool, clear
. svyset [pw=sampwgt]
. replace race = . in 1/71

Here are the tabulation of race and the cross-tabulation of sex by race.

. tab  race, missing

   1=white, |
   2=black, |
    3=other |      Freq.     Percent        Cum.
------------+-----------------------------------
      White |      3,500       85.97       85.97
      Black |        431       10.59       96.56
      Other |         69        1.69       98.26
          . |         71        1.74      100.00
------------+-----------------------------------
      Total |      4,071      100.00

. tab sex race, missing

   1=male, |          1=white, 2=black, 3=other
  2=female |     White      Black      Other          . |     Total
-----------+--------------------------------------------+----------
      male |     1,676        193         35         34 |     1,938
    female |     1,824        238         34         37 |     2,133
-----------+--------------------------------------------+----------
     Total |     3,500        431         69         71 |     4,071

  Now I run a "svy : tab" on race, and the "Number of obs" is 4000, as
I expect since that is the number of valid observations on race.

. svy : tab race, count format(%13.2fc)
(running tabulate on estimation sample)

Number of strata   =         1                  Number of obs      =      4000
Number of PSUs     =      4000                  Population size    = 7880496.9
                                                Design df          =      3999

------------------------
1=white,  |
2=black,  |
3=other   |        count
----------+-------------
    White | 6,930,316.91
    Black |   754,879.69
    Other |   195,300.31
          |
    Total | 7,880,496.91
------------------------
  Key:  count     =  weighted counts

.
  But now I want to analyze just the subpopulation of males (sex==1),
and the number of obs is now 4037 (see below). How can the number of
observations increase when a -subpop()- option is added? There are
suddenly 37 extra observations, which is exactly the number of
females with a missing race.

. svy , subpop(if sex==1): tab race, count format(%13.2fc)
(running tabulate on estimation sample)

Number of strata   =         1                  Number of obs      =      4037
Number of PSUs     =      4037                  Population size    = 7932333.9
                                                Subpop. no. of obs =      1904
                                                Subpop. size       = 3780355.3
                                                Design df          =      4036

------------------------
1=white,  |
2=black,  |
3=other   |        count
----------+-------------
    White | 3,367,920.96
    Black |   324,487.42
    Other |    87,946.89
          |
    Total | 3,780,355.27
------------------------
  Key:  count     =  weighted counts
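
  The counts in the header above can be cross-checked against the raw
data. The following is a minimal do-file sketch, not a statement of how
-svy- builds its estimation sample; the values in the comments are
derived only from the two tabulations shown earlier.

```stata
* Sketch: reproduce the header counts from the raw data (do-file syntax)
webuse highschool, clear
replace race = . in 1/71

count if !missing(race)                // 3,500 + 431 + 69 = 4,000
count if sex == 1 & !missing(race)     // 1,676 + 193 + 35 = 1,904
count if sex == 2 & missing(race)      // 37
count if !(sex == 1 & missing(race))   // 4,071 - 34 = 4,037
```

  The last count matches the reported "Number of obs" of 4037, and the
second matches "Subpop. no. of obs" of 1904. The header is therefore
arithmetically consistent with dropping only the 34 males with a
missing race, although this alone does not establish that this is what
-svy- does.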

  Just to make sure that this was not a coincidence, I repeated the
process with a different number of missing values on race. The output
below shows that, again, adding the -subpop()- option increases the
number of observations by the number of women who have a missing value
on race (from 4061 to 4065, and 4 women have a missing value on race).

. webuse highschool, clear

. svyset [pw=sampwgt]

      pweight: sampwgt
          VCE: linearized
  Single unit: missing
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: <zero>

.
. replace race = . in 1/10
(10 real changes made, 10 to missing)

. tab  race, missing

   1=white, |
   2=black, |
    3=other |      Freq.     Percent        Cum.
------------+-----------------------------------
      White |      3,542       87.01       87.01
      Black |        450       11.05       98.06
      Other |         69        1.69       99.75
          . |         10        0.25      100.00
------------+-----------------------------------
      Total |      4,071      100.00

. tab sex race, missing

   1=male, |          1=white, 2=black, 3=other
  2=female |     White      Black      Other          . |     Total
-----------+--------------------------------------------+----------
      male |     1,696        201         35          6 |     1,938
    female |     1,846        249         34          4 |     2,133
-----------+--------------------------------------------+----------
     Total |     3,542        450         69         10 |     4,071

. svy : tab race, count format(%13.2fc)
(running tabulate on estimation sample)

Number of strata   =         1                  Number of obs      =      4061
Number of PSUs     =      4061                  Population size    = 7972647.7
                                                Design df          =      4060

------------------------
1=white,  |
2=black,  |
3=other   |        count
----------+-------------
    White | 7,000,891.28
    Black |   776,456.11
    Other |   195,300.31
          |
    Total | 7,972,647.70
------------------------
  Key:  count     =  weighted counts

. svy , subpop(if sex==1): tab race, count format(%13.2fc)
(running tabulate on estimation sample)

Number of strata   =         1                  Number of obs      =      4065
Number of PSUs     =      4065                  Population size    = 7979171.9
                                                Subpop. no. of obs =      1932
                                                Subpop. size       = 3827193.3
                                                Design df          =      4064

------------------------
1=white,  |
2=black,  |
3=other   |        count
----------+-------------
    White | 3,404,730.57
    Black |   334,515.81
    Other |    87,946.89
          |
    Total | 3,827,193.27
------------------------
  Key:  count     =  weighted counts
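
  One way to see which observations end up in the estimation sample is
to inspect the stored results after the command. This is a hedged
sketch in do-file syntax: it assumes that -svy: tabulate- leaves
e(N), e(N_sub), and e(sample) behind as other -svy- estimation
commands do, and no output is shown because it has not been run here.

```stata
* Sketch: inspect the estimation sample after the subpop() run
webuse highschool, clear
svyset [pw=sampwgt]
replace race = . in 1/10

svy, subpop(if sex==1): tab race, count format(%13.2fc)

display e(N)        // should echo "Number of obs" from the header
display e(N_sub)    // should echo "Subpop. no. of obs" from the header

* Assumption: e(sample) marks the observations counted in e(N)
count if e(sample) & missing(race) & sex == 2   // females, race missing
count if e(sample) & missing(race) & sex == 1   // males, race missing
```

  If the first count is nonzero and the second is zero, the extra
observations are the females with a missing race, who lie outside the
subpopulation.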

  Can someone explain why, when -subpop()- is specified, the number of
observations increases by exactly the number of people who fall outside
the subpopulation and are also missing on the tabulated variable?

Many thanks,

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com

