# Re: st: predicted probabilities

From: "Joao Ricardo F. Lima"; To: statalist@hsphsun2.harvard.edu; Subject: Re: st: predicted probabilities; Date: Mon, 17 Nov 2008 08:14:06 -0300

```
Dear Mona, Maarten and Statalisters,

Reading Maarten's answer, I would like to ask whether this procedure is correct:

******
" // creating predictions while keeping other variables constant
// predicted probabilities of urban women of average age
preserve
sum age if e(sample), meanonly
replace age = r(mean)
replace female = 1
replace rural = 0

predict pra*, pr
table race , c(m pra1 m pra2 m pra3 m pra4 m pra5)

restore"
***************
I ask because the value of r(mean) (the unweighted sample mean) differs from
the value reported by svy: mean (the weighted population estimate):

webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)
. svy: mean age
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      31       Number of obs    =      10337
Number of PSUs   =      62       Population size  =  117023659
Design df        =         31

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         age |   42.23732   .3034412      41.61844    42.85619
--------------------------------------------------------------

. sum age if e(sample)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |     10337     47.5637    17.21678         20         74

If I am using svy: mlogit, shouldn't the mean to be used be the population
mean?

Thanks a lot,

Best Regards,

Joao Lima

2008/11/16 Maarten buis <maartenbuis@yahoo.co.uk>:
> --- Mona Mowafi <mmowafi@hsph.harvard.edu> wrote:
>> I am seeking to attain predicted probabilities of my outcome (BMI
>> cats - normal, overweight, obese) for four main independent
>> variables.  I am not sure how to do it, but here is what I have
>> tried:
>>
>> svyset [pweight=femaleweight], strata(order) psu(place)
>>
>> xi: svymlogit BMICAT i.AGECAT4 i.ED2 i.WB_pov i.ASSET_INDEX
>> i.PCAwealthindex i.FATHERED i.GENHEALTH_PAST, basecategory(2) nolog
>> svymlogit, rrr
>>
>> predict p1 p2 p3
>> sort ED2
>> by ED2: sum p1
>> by ED2: sum p2
>> by ED2: sum p3
>>
>> Here are my main questions:
>>
>> 1) With this syntax, does p1 refer to my reference outcome = normal
>> weight; p2 = overweight, p3 = obese?  I want to make sure that I am
>> interpreting what p1, p2, and p3 are properly.
>
> You can see what category the variables refer to by looking at the
> labels that -predict- has attached to them. You can see those by typing
> -desc p*- (which will describe all variables whose names start with p;
> if there are too many of those, type -desc p1 p2 p3-).
>
>> 2) If I sort and sum by p1, p2, and p3 - is this giving me the mean
>> predicted probability of each of my three outcomes for all
>> individuals in each of those three sub-categories (of education, for
>> example, as seen above)?  That is what I'm trying to do.
>
> Yes, but there is a subtle issue here: the differences between the
> educational categories may be due to the effect of education but can
> also be due to differences between the educational categories in the
> distribution of the other explanatory variables. For instance the lower
> educational categories will consist of individuals from a lower social
> background, and these tend to have a higher BMI. You
> can keep the other variables constant by first replacing the other
> variables by some number, e.g. the mean, and then predict, and then
> make the tables.
>
> Both methods are illustrated below (I used -table- in this example as
> it creates more compact tables, but -by ...: sum ...- will work too;
> another alternative would be -tabstat-).
>
> *---------------------- begin example ---------------------
> webuse nhanes2f, clear
> svyset psuid [pweight=finalwgt], strata(stratid)
> tab health
> svy: mlogit health rural black orace female age
>
> // create predictions without keeping other variables constant
> predict pr*, pr
>
> // the labels show which variable belongs to which category
> desc pr*
>
> // comparing the average predicted probabilities with the observed
> // percentages
> sum pr*
> tab health
>
> table race , c(m pr1 m pr2 m pr3 m pr4 m pr5)
>
>
> // creating predictions while keeping other variables constant
> // predicted probabilities of urban women of average age
> preserve
> sum age if e(sample), meanonly
> replace age = r(mean)
> replace female = 1
> replace rural = 0
>
> predict pra*, pr
> table race , c(m pra1 m pra2 m pra3 m pra4 m pra5)
>
> restore
> *--------------------- end example -------------------
> (For more on how to use examples I sent to the Statalist, see
> http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
>
> Hope this helps,
> Maarten
>
> -----------------------------------------
> Maarten L. Buis
> Department of Social Research Methodology
> Vrije Universiteit Amsterdam
> Boelelaan 1081
> 1081 HV Amsterdam
> The Netherlands
>
> Buitenveldertselaan 3 (Metropolitan), room N515
>
> +31 20 5986715
>
> http://home.fsw.vu.nl/m.buis/
> -----------------------------------------
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

--
-------------------------------
Joao Ricardo Lima
Professor
UFPB-CCA-DCFS
+553138923914
-------------------------------

```
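
One possible way to address the question in the message above, offered as a minimal sketch rather than as the list's answer: -summarize- accepts aweights, and with aweights its mean is the same weighted average, sum(w*x)/sum(w), that -svy: mean- reports as the point estimate. The population mean of age can therefore be plugged in without running a second estimation command. That matters because -svy: mean- is itself an estimation command and replaces the stored -mlogit- results, including e(sample), that -predict- relies on. Two things here are assumptions not taken from the thread: the model uses female instead of sex so that fixing female = 1 actually changes the predictions, and the scalar name wmean_age is made up for illustration.

```stata
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)

// assumption: female (not sex) enters the model, so that
// -replace female = 1- below has an effect on the predictions
svy: mlogit health rural black orace female age

preserve

// weighted mean of age over the estimation sample; -summarize- is
// r-class, so the e() results from -svy: mlogit- stay in memory
summarize age [aweight=finalwgt] if e(sample), meanonly
scalar wmean_age = r(mean)      // hypothetical scalar name

// urban women at the (weighted) average age
replace age    = scalar(wmean_age)
replace female = 1
replace rural  = 0

predict pra*, pr
tabstat pra1-pra5, by(race) statistics(mean)

restore
```

Because age, female, and rural are held fixed, the predictions vary only with race, so the group means shown by -tabstat- are simply the predicted probabilities for each race category.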