Re: st: Weighted counts with "svy" command


From   Steven Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Weighted counts with "svy" command
Date   Fri, 16 Sep 2011 09:05:31 -0400

Shige-

My guess is that you are accustomed to surveys in which the sampling weights have been normalized to sum to the sample size. Such weights are still issued with survey data sets such as the Demographic and Health Surveys (where the given weights must first be divided by 1,000,000). For estimating means, including proportions, and regression coefficients, the normalization does not matter. However, the weighted category counts in such data sets are meaningless.
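
A minimal sketch of that point, using the nhanes2 example data that Austin's code below also uses rather than NHIS. The variable name normwgt is made up for illustration, and dividing each weight by the mean weight is just one common way to normalize, assumed here.

```stata
webuse nhanes2, clear

* Rescale the sampling weights so that they sum to the number of observations
summarize finalwgt, meanonly
generate double normwgt = finalwgt / r(mean)

* Check: the sum of the normalized weights next to the number of observations
summarize normwgt, meanonly
display r(sum) "   " r(N)

* Original weights: counts are on the population scale
svy: tabulate region, count format(%12.0fc)
svy: tabulate region, percent

* Normalized weights: counts are on the sample scale
tabulate region [iweight = normwgt]
```

The percentages from the last command should agree with the svy percentages, while its counts add up to the number of observations rather than to the population size.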

In NHIS, the original sampling weights do not sum exactly to the US category totals. Post-sample adjustments are applied so that weighted sample totals match US population totals for age, race/ethnicity, and sex. See: http://www.ihis.us/ihis/userNotes_weights.shtml.
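
For illustration only: the NHIS weights already include these adjustments, so nothing like this is needed for the pooled file. The sketch shows how Stata declares a post-stratification adjustment, using the poststrata example data from the Stata survey manual. The variable names type, postwgt, fpc, and totexp belong to that data set as I recall it, not to NHIS.

```stata
webuse poststrata, clear

* type identifies the poststrata; postwgt holds the known population count of each one
svyset, poststrata(type) postweight(postwgt) fpc(fpc)

* Estimates now use weights adjusted so that weighted totals match the known counts
svy: mean totexp
```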

Steve


On Sep 15, 2011, at 10:26 AM, Austin Nichols wrote:

Shige Song <shigesong@gmail.com>:
Weighted counts *should* sum to the population size.  Perhaps you want
to treat your pweights as aweights, in which case you would get the
sum as the number of obs?  But you claim that is not what you want,
since the obs option does not give the desired total.  Your
desideratum is very mysterious.  Maybe you want the product of
weighted proportions and the number of obs as returned by tabulate
treating pweights as aweights?  Why would you want that?

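* Example data; this code relies on the svyset settings saved with it
webuse nhanes2
* Weighted counts: these sum to the estimated population size
svy: tab region, count se
mata: sum(st_matrix("e(b)"))
* Unweighted counts: these sum to the number of observations
svy: tab region, obs
mata: sum(st_matrix("e(Obs)"))
* pweights treated as aweights: counts sum to the number of observations
tab region [aw=finalwgt]
* same as: weighted proportions times the number of observations
svy: tab region
mata: st_matrix("e(b)"):*sum(st_matrix("e(Obs)"))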
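
Applied to the NHIS figures in the output quoted below, that last product amounts to rescaling each weighted count by (number of obs)/(population size). A rough sketch with the numbers typed in by hand; the local macro names are made up:

```stata
* Number of obs and population size copied from the svy: tabulate header quoted below
local nobs    300631
local popsize 222760817

* Weighted counts for U.S. born, Latin American born, African born, Other
foreach c in 185258131 20152746 1246467 16103473 {
    display %12.0fc `c' * `nobs' / `popsize'
}
```

The four results add up to the number of observations, because the four weighted counts add up to the population size.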


On Thu, Sep 15, 2011 at 10:10 AM, Shige Song <shigesong@gmail.com> wrote:
> Hi Steve,
> 
> The weighted counts that we are getting with svy syntax are in the
> millions (222,760,817)--these are for the whole U.S. population.  We
> want weighted counts for our sample (approximately 300,000 cases).
> 
> Thanks.
> 
> Shige
> 
> On Wed, Sep 14, 2011 at 5:04 PM, Steven Samuels <sjsamuels@gmail.com> wrote:
>> 
>> What would weighted counts look like that are not the population counts? I can't think of any, so please supply an example.
>> 
>> 
>> Steve
>> 
>> On Sep 14, 2011, at 10:19 AM, Shige Song wrote:
>> 
>> Dear Colleagues,
>> 
>> We are trying to do a descriptive table of basic socio-demographic
>> and health characteristics of our 3 subpopulations of interest
>> (African born, Latin American born, and US born) using the National
>> Health Interview Survey (NHIS).  (We're using a pooled file,
>> 2005-2009.)  In previous research we would simply use tabulate and
>> show both the freq and % in our descriptive table.  Now we're using
>> the "svyset" command and then using "svy: tabulate nativity, count" to
>> get the weighted counts in the dataset.  However, this command gives
>> the weighted counts in, apparently, the total population, not in the
>> dataset.  Do you know how to obtain the weighted counts in the dataset
>> using "svy"?  I also tried "svy: tabulate nativity, obs", but that
>> gives us the unweighted number of observations.  Please see the output
>> below:
>> 
>> Below, for reference, are the unweighted tabulations of our nativity
>> groups in our 5-year pooled file.
>> . tab nativity, m
>>            Nativity |      Freq.     Percent        Cum.
>> --------------------+-----------------------------------
>>           U.S. born |    231,546       77.02       77.02
>> Latin American born |     43,246       14.39       91.41
>>        African Born |      1,857        0.62       92.02
>>               Other |     23,982        7.98      100.00
>> --------------------+-----------------------------------
>>               Total |    300,631      100.00
>> 
>> 
>> And here are the weighted counts when we use the "svy" syntax, but
>> they are apparently counts in the total population.  We are looking
>> for weighted frequencies in the dataset.
>> . svy: tabulate nativity, count format(%14.3gc)
>> (running tabulate on estimation sample)
>> 
>> Number of strata   =       639                 Number of obs      =     300631
>> Number of PSUs     =      1278                 Population size    =  222760817
>>                                              Design df          =        639
>> -----------------------
>>  Nativity |       count
>> ----------+------------
>>  U,S, bor | 185,258,131
>>  Latin Am |  20,152,746
>>   African |   1,246,467
>>     Other |  16,103,473
>>           |
>>     Total | 222,760,817
>> -----------------------
>>  Key:  count     =  weighted counts
>> 
>> And if we just use "svy: tabulate nativity" (with no option
>> specified), we get only the cell proportions, although they are
>> properly weighted.
>> . svy: tabulate nativity
>> (running tabulate on estimation sample)
>> 
>> Number of strata   =       639                 Number of obs      =     300631
>> Number of PSUs     =      1278                 Population size    =  222760817
>>                                              Design df          =        639
>> 
>> -----------------------
>>  Nativity | proportions
>> ----------+------------
>>  U,S, bor |       .8316
>>  Latin Am |       .0905
>>   African |       .0056
>>     Other |       .0723
>>           |
>>     Total |           1
>> -----------------------
>>  Key:  proportions  =  cell proportions
>> 
>> 
>> We tried using "svy: tabulate nativity, obs percent", see below, and
>> this gives us the weighted percents but the unweighted number of
>> observations in each category.  We have looked at Stata help for svy:
>> tabulate, but can't seem to figure this out.  We suspect it's simple.
>> Does anyone know how to get the weighted counts in the dataset with
>> svy: tabulate?
>> . svy: tabulate nativity, obs percent format(%14.3gc)
>> (running tabulate on estimation sample)
>> 
>> Number of strata   =       639                 Number of obs      =     300631
>> Number of PSUs     =      1278                 Population size    =  222760817
>>                                              Design df          =        639
>> 
>> ------------------------------------
>>  Nativity | percentages          obs
>> ----------+-------------------------
>>  U,S, bor |        83.2      231,546
>>  Latin Am |        9.05       43,246
>>   African |         .56        1,857
>>     Other |        7.23       23,982
>>           |
>>     Total |         100      300,631
>> ------------------------------------
>>  Key:  percentages  =  cell percentages
>>       obs          =  number of observations
>> 
>> Thanks so much for taking the time to look at this.
>> 
>> Best,
>> Shige

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


