Re: st: Predicted probabilities in graph

 From Steven Samuels To statalist@hsphsun2.harvard.edu Subject Re: st: Predicted probabilities in graph Date Thu, 4 Sep 2008 19:53:31 -0400

On Aug 28, 2008, at 8:07 AM, Richard Ohrvall wrote:

I am trying to illustrate predicted probabilities for a successful
outcome in a logistic regression, e.g. when one independent variable
goes from min to max when all other variables are kept at their mean.

This can be done by using the user-written program -prgen- by J. Scott
Long and J. Freese and the way that they explain in their book
Regression Models for Categorical Depedent Variables Using Stata
(2006), Stata Press. However, according to the book, the program can
not be used when data comes from a complex sample, i.e. when prefix
svy is used.

Since I mainly use survey data with complex sample design, this is a
problem for me if I want to present the confidence intervals in the
graph.

Here is a do file for Richard. It differs from the previous solution I submitted in two main ways:
1. It adds the confidence interval endpoints that Richard requested
2. It adjusts predictors to their survey-estimated means, not to their raw sample means.

The second feature is the primary contribution. Stata's -adjust- can do all that the previous version did, and more. I simply was not familiar with -adjust- and didn't recognize that it met most of Richard's needs--except the need to adjust to survey-estimated means. In fact, wiith only a few covariates, Richard could have added those means by hand.

-Steve

/**********************START CODE************************************/
/* ZAP TEST GREMLINS BEFORE RUNNING */
/*******************************E************************************/

local pname pprob01 //SET PROGRAM NAME FOR USE IN LOG & GRAPH
capture log close
set more off
log using `pname', replace text

/ ************************************************************************ ***
This do file creates predicted probabilities from survey logistic regression which vary in one predictor at a time, holding others at their estimated population means. Also computed are upper and lower confidence interval endpoints for the prediction. An optional graphics section plots predictions and CI endpoints for each variable against that variable.

The code takes advantage of Stata's -adjust- command to compute adjusted logit values and their standard errors.
************************************************************************ ****/

/* -svyset- the auto dat */
sysuse auto,clear
svyset _n [pweight=rep78]

/ ************************************************************************ ***
Define a predictor list for the right-hand-side of the svy: logit equation
************************************************************************ ****/
local rhs mpg weight trunk

/ ************************************************************************ ***
Compute estimated means of each predictor to use in the adjustment. These are the variable names prefixed by m_ ************************************************************************ ****/

svy: mean `rhs'
foreach v of varlist `rhs' {
local m_`v' = _b[`v']
}

/ ************************************************************************ ***
For each varying predictor, build up parts of the -adjust- command.
If mpg is the predictor, for example, the adjust command requires:

adjust weight = 2941.106382978724 trunk = 13.73191489361702
The constants are the means from -svy: mean- for variables OTHER than mpg.
************************************************************************ ****/

foreach v of varlist `rhs' {
local less : list rhs -v
local listx " "
foreach w of varlist `less' {
local lw "`w' = `m_`w'' " // e.g. "weight = 2941.106382978724"
local listx `listx' `lw' // Augment the list
}
local list_`v' `listx'
// di "`v'"
// di "`list_`v''"
}
/ ************************************************************************ ***
Now the survey logit command itself:
************************************************************************ ****/
svy: logit foreign `rhs'

/ ************************************************************************ ***
Create a constant to multiply the standard error of the prediction
************************************************************************ ****/

local mult = invttail(e(df_r), (1- .95)/2)

/ ************************************************************************ ***
For each predictor variable v (e.g. mpg), use -adjust- to compute:
xb_v : the logit of the predicted value
se_v : the se of the logit

pred_v : predicted value for each observation's value of v, holding other
predictors at their mean
llim_v : lower 95% confidence interval endpoint
ulim_v : upper 95% confidence interval endpoint
xb_v and se_v are not needed for the graphs and may be dropped.
************************************************************************ ****/

foreach v of varlist `rhs' {
local listl : list rhs - v
quietly adjust `listl' , g(xb_`v' se_`v') se
// di "`v'" " " "`listl'" " " xb_`v' " " se_`v'
gen pred_`v' = 1/(1+exp(-xb_`v'))
gen llim_`v' = xb_`v' - `mult'*se_`v'
gen ulim_`v' = xb_`v' + `mult'*se_`v'
replace llim_`v' = 1/(1+exp(-llim_`v'))
replace ulim_`v' = 1/(1+exp(-ulim_`v'))
label var pred_`v' "Predicted"
label var llim_`v' "Lower CI"
label var ulim_`v' "Upper CI"
}
sum xb_* se_*
drop xb_* se_* //optionally drop

/ ************************************************************************ ***
Save the data if you want to graph later.
************************************************************************ ****/

// save (choose a new name)

/ ************************************************************************ ***
Graph Module
Graph the prediction and confidence interval endpoints for each v against v, and save the graph.
************************************************************************ ****/

foreach v of varlist `rhs' {
twoway scatter pred_`v' llim_`v' ulim_`v' `v', msymbol(o o o) title ("Plot of Prediction against `:variable label `v''" "Other Predictors at their Mean") note("Program: `pname'. Data: ") saving(graph_`v', replace)
}

log close
set more on

/************************END CODE ***********************************/

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/