Re: st: svy and pweight postestimation tools

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: svy and pweight postestimation tools
Date	Thu, 22 Jan 2009 07:57:04 -0500
---

Carissa, rather than hand-waving about an "obvious" fact, I thought I
should write down the argument in detail. The main point is:
Probability weights can be fractional, whereas frequency weights must be
integers. However, for any given set of probability weights, one can
construct frequency weights that are equivalent to the original
probability weights to any chosen degree of accuracy.


The first four facts below are the ones that are "obvious" from inspection
of the equations for estimating population parameters from survey data.
See, for example, equations 3.6-2 and 3.6-3 in the Korn and Graubard
book and equations 5.20 and 11.10 in Sharon Lohr, 1999, Sampling:
Design and Analysis. Duxbury Press.


1. The estimating equations with probability weights are functions  
of  weighted means of individual observations. Each of these means is  
a ratio of weighted sums.
2. In each of these sums, an observation is multiplied  by its  
probability weight.

3. If the probability weights are integers, the equations are
identical to those that would be set up for frequency-weighted
observations with the same weights.

4. If probability weights are multiplied by the same constant,  
estimating equations and, hence, the estimators, do not change.  This  
is because the equations are based on means, and the constant will  
cancel out in numerator and denominator.

5. If each original probability weight is rounded to k decimal places
(e.g. k = 1, the nearest tenth) and then multiplied by 10^k, the
result is an integer, which can be used as a frequency weight. Thus
it is possible to use frequency weights that are equivalent to the
original probability weights to any degree of accuracy desired.
(Thanks to Austin Nichols for pointing this out in a previous
Statalist post.)

6. If k is sufficiently large, estimates based on these rounded,
multiplied probability weights will be equal to the estimates based
on the original weights, again to any desired degree of accuracy.
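
Here is a minimal sketch of facts 5 and 6, not a definitive recipe. The
weights are made up for illustration: pw, fw, and k are hypothetical
names, and the auto data do not come from a survey. The point is only
that the point estimates from the two kinds of weights agree closely.

```stata
* Sketch only: pw, fw and k are hypothetical names invented for this example.
sysuse auto, clear
set seed 12345
generate double pw = 1 + 9*runiform()   // made-up fractional probability weights
local k = 3                             // number of decimal places to keep
generate long fw = round(pw*10^`k')     // rounded weight times 10^k: an integer

* Weighted mean of mpg under each set of weights
quietly mean mpg [pweight = pw]
scalar m_pw = _b[mpg]
quietly mean mpg [fweight = fw]
scalar m_fw = _b[mpg]
display "pweight mean: " %9.6f m_pw "   fweight mean: " %9.6f m_fw

* Logistic regression coefficient of mpg under each set of weights
quietly logistic foreign mpg [pweight = pw]
scalar b_pw = _b[mpg]
quietly logistic foreign mpg [fweight = fw]
scalar b_fw = _b[mpg]
display "pweight coef: " %9.6f b_pw "   fweight coef: " %9.6f b_fw
```

Raising k should bring the two sets of estimates closer together; only
the point estimates are being compared here, not the standard errors.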

Warning:
The computer algorithms based on frequency weights assume that the
sample size is equal to the sum of the weights. If you use the
converted probability weights, this sum will be about 10^k times the
population size. Standard errors and confidence intervals based on
these weights will therefore be far too small, and so invalid. Also,
too high a value of k might cause underflow problems if a program
computes standard errors.
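
To see the warning in action, here is a small sketch under the same
made-up setup (pw, fw1, and fw3 are hypothetical names, and the weights
are simulated, not from a real survey). It compares the standard error
of a weighted mean under pweights with the standard errors under
converted fweights for k = 1 and k = 3.

```stata
* Sketch only: pw, fw1 and fw3 are hypothetical names invented for this example.
sysuse auto, clear
set seed 12345
generate double pw = 1 + 9*runiform()   // made-up fractional probability weights
generate long fw1 = round(pw*10)        // converted weights, k = 1
generate long fw3 = round(pw*1000)      // converted weights, k = 3

quietly mean mpg [pweight = pw]
scalar se_pw = _se[mpg]
scalar n_pw  = e(N)                     // number of sampled observations

quietly mean mpg [fweight = fw1]
scalar se_f1 = _se[mpg]
scalar n_f1  = e(N)                     // sum of the frequency weights

quietly mean mpg [fweight = fw3]
scalar se_f3 = _se[mpg]
scalar n_f3  = e(N)

display "pweights:      N = " n_pw "   SE = " %9.6f se_pw
display "fweights k=1:  N = " n_f1 "   SE = " %9.6f se_f1
display "fweights k=3:  N = " n_f3 "   SE = " %9.6f se_f3
```

The frequency-weighted runs treat the sum of the weights as the sample
size, so the reported standard errors shrink as k grows. Use the
converted weights for point estimates only.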

-Steve

On Jan 18, 2009, at 1:49 PM, Steven Samuels wrote:

>
> Carissa,
>
> I think that the legitimacy is "obvious" from inspection of the
> formulas for weighted data. Still, here's a demonstration that
> -lroc- with frequency weights produces the same area under the ROC
> curve as a properly probability-weighted estimate. I computed
> probability-weighted versions of the ROC and AUC with Roger
> Newson's programs -somersd- and -senspec-, available at SSC.
> -somersd- computes the AUC (he calls it the "c" statistic), and
> -senspec- produces sensitivities and specificities for all cut
> points. Both take pweights, and -somersd- will take a cluster
> variable, so that you can compute a proper CI for the area under
> the curve. I had to add a zero-zero point to Roger's results before
> plotting. If you want to completely satisfy your committee, just
> use the probability-weighted versions. Be sure to zap gremlins
> before trying this code.
>
> -Steve
>
> **************************CODE BEGINS**************************
> sysuse auto,clear
> ****************************************************
> * Frequency weighted analysis
> ****************************************************
> logistic foreign mpg [fw=rep78]
> predict phat0
> lroc [fw=rep78]
>
> ****************************************************
> * Probability weights
> ****************************************************
> svyset _n [pweight=rep78]
> quietly svy: logistic foreign mpg
> predict phat
>
>
> somersd foreign phat [pweight=rep78], tr(c)
> matrix b = e(b)
> local auc = b[1,1]
> di   "Area under the Curve: " %6.5f `auc'
>
>
> ****************************************************
> *  Graph ROC Curve with probability weights
> ****************************************************
> senspec foreign phat [pweight=rep78], sensitivity(sens) specificity(spec)
>
> ** Add zero-zero to graph
> tempfile t1
> save `t1'
> clear
> input spec sens
> 1 0
> end
> append using `t1'
> gen ispec=1-spec
>
> twoway (scatter sens ispec, sort(sens ispec) connect(L) mlab(mpg)) (line sens sens)
> ***************************CODE ENDS***************************
>
> On Jan 17, 2009, at 5:23 PM, Carissa Moffat Miller wrote:
>
>>
>> Steve,
>>
>> I was able to create the ROC curves using your advice about  
>> converting the pweights to fweights. However, now a dissertation  
>> committee member has asked me to justify (provide documentation)  
>> of the legitimacy of doing such a conversion. Is the conversion  
>> just to put the pweight in a format that will be accepted by the  
>> ROC command and artificially calling it an "fweight"?
>>
>> I was not able to find this specific issue addressed in the below  
>> reference and I have not been able to find another reference. Do  
>> you have any suggested citations?
>>
>> Carissa
>>
>>> From: [email protected]
>>> Subject: Re: st: svy and pweight postestimation tools
>>> Date: Sun, 23 Nov 2008 12:13:01 -0500
>>> To: [email protected]
>>>
>>> Carissa, consider ROC curves (the classification tables are not very
>>> useful in my experience). ROC curves show the trade-off between
>>> sensitivity and specificity. You would usually want population
>>> estimates of these probabilities, so ignoring the weights wouldn't
>>> be wise.
>>>
>>> My previous post describes how you can compute residuals. These are
>>> inherently unweighted, because observations with the same covariate
>>> pattern will have the same predicted value, and so have only two
>>> values of residuals (for events and non-events). If you are
>>> comparing mean residuals, you might choose to weight them. See Korn
>>> & Graubard, Analysis of Health Surveys, Wiley, 1999, pp 105-115.
>>>
>>> -Steve
>>>
>>> On Nov 23, 2008, at 10:40 AM, Carissa Moffat Miller wrote:
>>>
>>>>
>>>>
>>>> Steve and Joao,
>>>>
>>>>
>>>>
>>>> Thank you for your suggestions and the information. I had found the
>>>> goodness of fit measure do file from your discussions (svylogitgof)
>>>> and thought there might be something similar for the estat clas or
>>>> residuals for svy.
>>>>
>>>>
>>>>
>>>> All I was trying to say in my note is that the strata and PSUs
>>>> account for so little difference in the outcome that if it were
>>>> possible to run residuals or classification tables using just
>>>> pweights, I wanted to keep that option open. Such as:
>>>>
>>>>
>>>>
>>>> xi: logistic aepart i.agecat i.Incomequ i.HIGHEDUC female [pweight=FAWT]
>>>>
>>>>
>>>>
>>>> But it appears that I will have the same issues. Thank you
>>>> so much for your responses and help.
>>>>
>>>>
>>>>
>>>> Carissa
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2008/11/22 Steven Samuels :
>>>>>> --
>>>>>>
>>>>>> Carissa:
>>>>>>
>>>>>> -help logistic postestimation- will show you which commands are
>>>>>> available after -svy: logistic-. The -estat clas- command is not one
>>>>>> of them in Stata 9 or 10. -predict- with a -residuals- option is valid
>>>>>> in Stata 10.1 but not in Stata 9. You _can_ compute your own weighted
>>>>>> survey -linktest- of fit.
>>>>>>
>>>>>> predict hat, xb
>>>>>> gen hat2 = hat*hat
>>>>>> svy: logistic aepart hat hat2 // the link test is the significance of hat2
>>>>>>
>>>>>> You can also construct ROC curves. Use -logistic- with fweights, the
>>>>>> survey weights rounded to the nearest integer. See the thread at:
>>>>>> http://www.stata.com/statalist/archive/2007-08/msg00739.html#_jmp0_ .
>>>>>>
>>>>>> -Steve
>>>>>>
>>>>>>
>>>>>> On Nov 21, 2008, at 11:45 AM, Carissa Moffat Miller wrote:
>>>>>>
>>>>>>>
>>>>>>> StataList:
>>>>>>>
>>>>>>> I am conducting logistic regression for a complex survey design using
>>>>>>> Stata version 9. I have found in your past discussions and the user
>>>>>>> manuals that many postestimation tests are not appropriate with svy
>>>>>>> commands. I have not found discussion on classification tables and
>>>>>>> residuals and have been unable to get the following commands to work
>>>>>>> either with an svy command or by just using the pweights in Stata.
>>>>>>>
>>>>>>> I have been able to get these to work in another software program
>>>>>>> using the weights, but I'm concerned it isn't appropriately applied.
>>>>>>> Can someone tell me: 1) if these tests are appropriate with complex
>>>>>>> survey data or just pweights, and 2) if so, what are the commands or
>>>>>>> where would I find them? or 3) if not appropriate, a reference I
>>>>>>> might cite?
>>>>>>>
>>>>>>> (Note: The strata and PSUs, when analyzed separately, provide design
>>>>>>> effects almost equal to 1, so the effects in my model are almost
>>>>>>> entirely from the weighting. So, I could get results - except for
>>>>>>> standard errors - using just the weights.)
>>>>>>>
>>>>>>> Cheers, Carissa
>>>>>>>
>>>>>>>
>>>>>>> Syntax and error messages:
>>>>>>>
>>>>>>> svyset APSU [pweight=FAWT], strata (ASTRATUM)
>>>>>>> xi: svy: logistic aepart i.agecat i.Incomequ i.HIGHEDUC employed female urban
>>>>>>>
>>>>>>> estat clas
>>>>>>>
>>>>>>> {ERROR}: invalid subcommand clas
>>>>>>>
>>>>>>> predict r, residuals
>>>>>>> summarize r, detail
>>>>>>>
>>>>>>> {ERROR}: option residuals not allowed
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *
>>>>>>> * For searches and help try:
>>>>>>> * http://www.stata.com/help.cgi?search
>>>>>>> * http://www.stata.com/support/statalist/faq
>>>>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ----------------------------------------
>>>>> Joao Ricardo Lima, D.Sc.
>>>>> Professor
>>>>> UFPB-CCA-DCFS
>>>>> Fone: +553138923914
>>>>> Skype: joao_ricardo_lima
>>>>> ----------------------------------------
>>>>
>>>
>>
>

Steven Samuels
845-246-0774
18 Cantine's Island
Saugerties, NY 12477
EFax: 208-498-7441




