Statalist


Re: st: svy and pweight postestimation tools


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: svy and pweight postestimation tools
Date   Thu, 22 Jan 2009 08:31:48 -0500

---

Carissa, rather than hand-waving about an "obvious" fact, I thought I should write down the argument in detail. The main point is this: probability weights can be fractional, whereas frequency weights must be integers. However, for a given set of probability weights, one can construct frequency weights that are equivalent to the original probability weights to any chosen degree of accuracy.


The first four facts below are the ones that are "obvious" from inspection of the equations for estimating population parameters from survey data. See, for example, equations 3.6-2 and 3.6-3 in the Korn and Graubard book, and equations 5.20 and 11.10 in Sharon Lohr, 1999, Sampling: Design and Analysis, Duxbury Press.


1. The estimating equations with probability weights are functions of weighted means of individual observations. Each of these means is a ratio of weighted sums.

2. In each of these sums, an observation is multiplied by its probability weight.

3. If the probability weights are integers, the equations are identical to those that would be set up for frequency-weighted observations with the same weights.

4. If all the probability weights are multiplied by the same constant, the estimating equations and, hence, the estimators do not change. This is because the equations are based on means, and the constant cancels out of numerator and denominator.

5. If each original probability weight is rounded to k decimal places (e.g. k = 1, the nearest tenth) and then multiplied by 10^k, the result is an integer, which can be used as a frequency weight. Thus it is possible to use frequency weights which are equivalent to the original probability weights to any degree of accuracy desired. (Thanks to Austin Nichols for pointing this out in a previous Statalist post.)

6. If k is sufficiently large, estimates based on these rounded, multiplied probability weights will be equal to the estimates based on the original weights, again to any desired degree of accuracy.
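
A minimal sketch of facts 4 to 6 for the simplest estimator, a weighted mean. The notation is assumed for illustration and is not taken from the cited books: y_i is an observation, w_i its probability weight, c any positive constant, round_k means rounding to k decimal places, and f_i is the resulting integer weight.

```latex
\bar{y}_w = \frac{\sum_i w_i y_i}{\sum_i w_i},
\qquad
\frac{\sum_i (c\,w_i)\, y_i}{\sum_i c\,w_i} = \bar{y}_w
\quad \text{for any } c > 0 .

f_i = 10^k \,\operatorname{round}_k(w_i),
\qquad
\left| \frac{f_i}{10^k} - w_i \right| \le \tfrac{1}{2}\,10^{-k},

\bar{y}_f = \frac{\sum_i f_i y_i}{\sum_i f_i}
          = \frac{\sum_i (f_i/10^k)\, y_i}{\sum_i f_i/10^k}
          \;\longrightarrow\; \bar{y}_w
\quad \text{as } k \to \infty .
```

For example, with weights 1.26 and 2.5 and observations 10 and 20, the probability-weighted mean is 62.6/3.76 = 16.649. With k = 1 the integer weights are 13 and 25, giving 630/38 = 16.579. With k = 2 they are 126 and 250, giving 6260/376 = 16.649, which reproduces the original estimate exactly because these weights have only two decimal places.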

Warning:
Computer algorithms based on frequency weights assume that the sample size is equal to the sum of the weights. If you use the converted probability weights, this sum will be about 10^k times the population size. Therefore standard errors and confidence intervals based on these weights will be invalid (the standard errors will be far too small). Also, too high a value of k might cause underflow problems if a program computes standard errors.
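
To see why the standard errors fail, here is a sketch for a mean, assuming the program applies the textbook frequency-weighted formula (an assumption for illustration; particular commands may differ in detail). Here w_i are the original probability weights, f_i (approximately 10^k w_i) are the converted integer weights, \bar{y}_w and \bar{y}_f are the weighted means under each set of weights, and n_f is the sample size the program believes it has.

```latex
n_f = \sum_i f_i \approx 10^k \sum_i w_i ,
\qquad
s_f^2 = \frac{\sum_i f_i \,(y_i - \bar{y}_f)^2}{n_f - 1}
      \approx \frac{\sum_i w_i \,(y_i - \bar{y}_w)^2}{\sum_i w_i},

\widehat{\mathrm{se}}(\bar{y}_f) = \sqrt{\frac{s_f^2}{n_f}}
      \approx 10^{-k/2}\,
      \sqrt{\frac{\sum_i w_i \,(y_i - \bar{y}_w)^2}{\left(\sum_i w_i\right)^2}} .
```

Under this formula the reported standard error shrinks by a factor of about sqrt(10) for each extra decimal place kept, and it never involves the actual number of sampled observations, so it cannot measure sampling variability.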

-Steve

On Jan 18, 2009, at 1:49 PM, Steven Samuels wrote:


Carissa,

I think that the legitimacy is "obvious" from inspection of the formulas for weighted data. Still, here's a demonstration that -lroc- with frequency weights produces the same area under the ROC curve as a properly probability-weighted estimate. I computed probability-weighted versions of the ROC curve and AUC with Roger Newson's programs -somersd- and -senspec-, available at SSC. -somersd- computes the AUC (he calls it the "c" statistic), and -senspec- produces sensitivities and specificities for all cut points. Both take pweights, and -somersd- will take a cluster variable, so that you can compute a proper CI for the area under the curve. I had to add a zero-zero point to Roger's results before plotting. If you want to completely satisfy your committee, just use the probability-weighted versions. Be sure to zap gremlins before trying this code.

-Steve

**************************CODE BEGINS**************************
sysuse auto,clear
****************************************************
* Frequency weighted analysis
****************************************************
logistic foreign mpg [fw=rep78]
predict phat0
lroc [fw=rep78]

****************************************************
* Probability weights
****************************************************
svyset _n [pweight=rep78]
quietly svy: logistic foreign mpg
predict phat


somersd foreign phat [pweight=rep78], tr(c)
matrix b = e(b)
local auc = b[1,1]
di   "Area under the Curve: " %6.5f `auc'


****************************************************
*  Graph ROC Curve with probability weights
****************************************************
senspec foreign phat [pweight=rep78], sensitivity(sens) specificity(spec)

** Add zero-zero to graph
tempfile t1
save `t1'
clear
input spec sens
1 0
end
append using `t1'
gen ispec=1-spec

twoway (scatter sens ispec, sort(sens ispec) connect(L) mlab(mpg)) (line sens sens)
***************************CODE ENDS***************************

On Jan 17, 2009, at 5:23 PM, Carissa Moffat Miller wrote:


Steve,

I was able to create the ROC curves using your advice about converting the pweights to fweights. However, a dissertation committee member has now asked me to justify (provide documentation for) the legitimacy of doing such a conversion. Is the conversion just a way to put the pweight in a format that will be accepted by the ROC command, artificially calling it an "fweight"?

I was not able to find this specific issue addressed in the reference below, and I have not been able to find another reference. Do you have any suggested citations?

Carissa

From: [email protected]
Subject: Re: st: svy and pweight postestimation tools
Date: Sun, 23 Nov 2008 12:13:01 -0500
To: [email protected]

Carissa, consider ROC curves (the classification tables are not very
useful in my experience). ROC curves show the trade-off between
sensitivity and specificity. You would usually want population
estimates of these probabilities, so ignoring the weights wouldn't be
wise.

My previous post describes how you can compute residuals. These are
inherently unweighted, because observations with the same covariate
pattern will have the same predicted value, and so have only two
values of residuals (for events and non-events). If you are
comparing mean residuals, you might choose to weight them. See Korn
& Graubard, Analysis of Health Surveys, Wiley, 1999, pp 105-115.

-Steve

On Nov 23, 2008, at 10:40 AM, Carissa Moffat Miller wrote:



Steve and Joao,

Thank you for your suggestions and the information. I had found the
goodness of fit measure do file from your discussions (svylogitgof)
and thought there might be something similar for the estat clas or
residuals for svy.

All I was trying to say in my note is that the strata and PSUs
account for so little difference in the outcome that if it were
possible to run residuals or classification tables using just
pweights, I wanted to keep that option open. Such as:

xi: logistic aepart i.agecat i.Incomequ i.HIGHEDUC female [pweight=FAWT]

But it appears that I will have the same issues. Thank you so much
for your responses and help.

Carissa

2008/11/22 Steven Samuels:

Carissa:

-help logistic postestimation- will show you which commands are
available after -svy: logistic-. The -estat clas- command is not one
of them in Stata 9 or 10. -predict- with a -residuals- option is
valid in Stata 10.1 but not in Stata 9. You _can_ compute your own
weighted survey -linktest- of fit.

predict hat, xb
gen hat2 = hat*hat
svy: logistic aepart hat hat2 // link test is the significance of hat2

You can also construct ROC curves. Use -logistic- with fweights, the
survey weights rounded to the nearest integer. See the thread at:
http://www.stata.com/statalist/archive/2007-08/msg00739.html#_jmp0_ .

-Steve


On Nov 21, 2008, at 11:45 AM, Carissa Moffat Miller wrote:


StataList:

I am conducting logistic regression for a complex survey design using
Stata version 9. I have found in your past discussions and the user
manuals that many postestimation tests are not appropriate with svy
commands. I have not found discussion on classification tables and
residuals and have been unable to get the following commands to work
either with an svy command or by just using the pweights in Stata.

I have been able to get these to work in another software program
using the weights, but I'm concerned it isn't appropriately applied.
Can someone tell me: 1) if these tests are appropriate with complex
survey data or just pweights, and 2) if so, what are the commands or
where would I find them? or 3) if not appropriate, a reference I
might cite?

(Note: The strata and PSUs, when analyzed separately, provide design
effects almost equal to 1, so the effects in my model are almost
entirely from the weighting. So, I could get results - except for
standard errors - using just the weights.)

Cheers, Carissa


Syntax and error messages:

svyset APSU [pweight=FAWT], strata(ASTRATUM)
xi: svy: logistic aepart i.agecat i.Incomequ i.HIGHEDUC employed female urban

estat clas

{ERROR}: invalid subcommand clas

predict r, residuals
summarize r, detail

{ERROR}: option residuals not allowed



*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

--
----------------------------------------
Joao Ricardo Lima, D.Sc.
Professor
UFPB-CCA-DCFS
Fone: +553138923914
Skype: joao_ricardo_lima
----------------------------------------

Steven Samuels
845-246-0774
18 Cantine's Island
Saugerties, NY 12477
EFax: 208-498-7441



