Kaleb Michaud <[email protected]> asks about discrepancies he sees
between the output of -adjust- and -tabstat- after -logistic-.
Using the census data shipped with Stata, he runs
    . gen longer = length(state)
    . recode longer min/8=0 9/max=1
    . logistic longer medage
    . predict p1 if e(sample), pr
    . adjust, by(region) pr gen(a1)
    ----------------------------------------------------------
          Dependent variable: longer     Command: logistic
            Created variable: a1
         Variable left as is: medage
    ----------------------------------------------------------
    ----------------------
    Census    |
    region    |         pr
    ----------+-----------
           NE |    .433217
      N Cntrl |     .37812
        South |    .381077
         West |    .339917
    ----------------------
    Key:  pr  =  Probability
    . tabstat p1 a1 medage, by(region)
    Summary statistics: mean
      by categories of: region (Census region)
     region |        p1        a1    medage
    --------+------------------------------
         NE |  .4335186  .4335186  31.23333
    N Cntrl |  .3783487  .3783487    29.525
      South |  .3821012  .3821012  29.61875
       West |  .3418868  .3418868  28.28462
    --------+------------------------------
      Total |       .38       .38     29.54
    ---------------------------------------
He wonders whether there is a problem, since the table produced
by -adjust- shows values such as .433217 instead of .4335186 (for
the NE region, as an example).
I can explain what is happening, but first here is some
background.  For logistic regression we go from the linear
prediction (the -xb- option of -predict-) to probabilities using
the formula
        exp(x)/(1+exp(x))
where x is the linear prediction.
What you are seeing in the output of -tabstat- is
       average( exp(x)/(1+exp(x)) )
for each region.  What you are seeing in -adjust- is
        exp(average(x))/(1+exp(average(x)))
for each region. 
In other words, do you want to see the average of the
probabilities, or the probability at the average linear
prediction?  Use the output from -tabstat- for the former and the
output from -adjust- for the latter.
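To see why the two can differ, note that exp(x)/(1+exp(x)) is not
linear in x, so averaging before or after applying it gives
different answers.  Here is a small sketch using two made-up
linear predictions, 0 and 2 (these values are only for
illustration and have nothing to do with the census data):
    . * average of the two probabilities
    . display (exp(0)/(1+exp(0)) + exp(2)/(1+exp(2)))/2
    . * probability at the average linear prediction, (0+2)/2 = 1
    . display exp(1)/(1+exp(1))
The first comes out to about .690 and the second to about .731.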
Just to make it more concrete, let's examine how the numbers for
the NE region (.4335186 shown by -tabstat- and .433217 shown by
-adjust-) can be produced.  Here is output from Stata 8 (notice
my use of the -sysuse- command, which was added in Stata 8).
    . sysuse census, clear
    . gen longer = length(state)
    . recode longer min/8=0 9/max=1
    . quietly logistic longer medage
    . predict double p1 , pr
    . predict double xb, xb
    . summarize p1 if region==1
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
              p1 |         9    .4335186    .0333675    .374191   .4652383
The above is the mean of the probabilities and agrees with what
-tabstat- shows.
    . summarize xb if region==1
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
              xb |         9   -.2687384    .1370751  -.5142789  -.1392714
    . di r(mean)
    -.26873836
The -.26873836 is the mean of the linear predictions.  The corresponding
probability for the mean prediction is
    . di exp(r(mean))/(1+exp(r(mean)))
    .43321685
This matches, after rounding, the .433217 that -adjust- is producing.
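If you want the same comparison for every region at once,
something along these lines should work after the commands above.
This is only a sketch: mxb and pavg are variable names made up for
the illustration, and it relies on xb and p1 already existing.
    . * mean of the linear prediction within each region
    . bysort region: egen double mxb = mean(xb)
    . * probability at that regional mean
    . gen double pavg = exp(mxb)/(1+exp(mxb))
    . tabstat p1 pavg, by(region)
The p1 column should again show the average of the probabilities,
while pavg should line up with the -adjust- table (for NE, the
.43321685 computed above).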
I hope this clarifies the situation for you.
Ken Higbee    [email protected]
StataCorp     1-800-STATAPC