Statalist



Re: st: predicted probabilities


From   Maarten buis <maartenbuis@yahoo.co.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: predicted probabilities
Date   Sun, 16 Nov 2008 09:10:40 +0000 (GMT)

--- Mona Mowafi <mmowafi@hsph.harvard.edu> wrote:
> I am seeking to attain predicted probabilities of my outcome (BMI
> cats - normal, overweight, obese) for four main independent
> variables.  I am not sure how to do it, but here is what I have
> tried:
> 
> svyset [pweight=femaleweight], strata(order) psu(place)
> 
> xi: svymlogit BMICAT i.AGECAT4 i.ED2 i.WB_pov i.ASSET_INDEX
> i.PCAwealthindex i.FATHERED i.GENHEALTH_PAST, basecategory(2) nolog
> svymlogit, rrr
> 
> predict p1 p2 p3
> sort ED2
> by ED2: sum p1
> by ED2: sum p2
> by ED2: sum p3
> 
> Here are my main questions:
> 
> 1) Does this syntax, does p1 refer to my reference outcome = normal
> weight; p2= overweight, p3 = obese?  I want to make sure that I am
> interpreting what p1, p2, and p3 is properly.

You can see which category each variable refers to by looking at the
labels that -predict- has attached to them. You can see those by typing
-desc p*- (which will describe all variables whose name starts with p;
if there are too many of those, type -desc p1 p2 p3-).
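
As a minimal sketch of that check on a three-category outcome (the
dataset sysdsn1 and its variables are only an assumption for
illustration, they are not your data):

*---------------------- begin example ---------------------
webuse sysdsn1, clear
mlogit insure age male nonwhite, baseoutcome(2) nolog

// one new variable per outcome category
predict p1 p2 p3

// the variable labels show which category each variable refers to
desc p1 p2 p3

// the value labels and numeric codes of the outcome, for comparison
tab insure
tab insure, nolabel
*--------------------- end example -------------------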
 
> 2) If I sort and sum by p1, p2, and p3 - is this giving me the mean
> predicted probability of each of my three outcomes for all
> individuals in each of those three sub-categories (of education, for
> example, as seen above)?  That is what I'm trying to do.

Yes, but there is a subtle issue here: the differences between the
educational categories may be due to the effect of education but can
also be due to differences between the educational categories in the
distribution of the other explanatory variables. For instance, the lower
educational categories will consist of individuals from a lower social
background, and these tend to have a higher BMI. You can keep the other
variables constant by first replacing the other variables by some
number, e.g. the mean, then predicting, and then making the tables.

Both methods are illustrated below. I used -table- in this example as
it creates more compact tables, but -by ...: sum ...- will work too.
Another alternative would be -tabstat-.

*---------------------- begin example ---------------------
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)
tab health
svy: mlogit health rural black orace sex age

// create predictions without keeping other variables constant
predict pr*, pr

// the labels show which variable belongs to which category
desc pr*

// comparing the average predicted probabilities with the observed
// percentages
sum pr* 
tab health

table race , c(m pr1 m pr2 m pr3 m pr4 m pr5)


// creating predictions while keeping other variables constant
// predicted probabilities of urban women of average age
preserve
sum age if e(sample), meanonly
replace age = r(mean)
replace sex = 2 // the model uses sex (1 = male, 2 = female), not female
replace rural = 0

predict pra*, pr
table race , c(m pra1 m pra2 m pra3 m pra4 m pra5)

restore 
*--------------------- end example -------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
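
The two alternatives to -table- mentioned above could look roughly like
this. It is only a sketch: it repeats the model from the example so that
it runs on its own, and the format() option is my addition for
readability.

*---------------------- begin example ---------------------
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svy: mlogit health rural black orace sex age
predict pr*, pr

// mean predicted probabilities by race with -tabstat-
tabstat pr1 pr2 pr3 pr4 pr5, by(race) statistics(mean) format(%5.3f)

// the same means with -by ...: sum ...-
bysort race: sum pr1 pr2 pr3 pr4 pr5
*--------------------- end example -------------------

Both commands report unweighted means over the observations for which
-predict- could compute a probability, just like -table- does.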

Hope this helps,
Maarten

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------


      


