# Re: RE: st: graph mean and sd over/by category

 From vwiggins@stata.com (Vince Wiggins, StataCorp) To statalist@hsphsun2.harvard.edu Subject Re: RE: st: graph mean and sd over/by category Date Tue, 11 Feb 2003 11:48:01 -0600

```Lee Sieswerda <Lee.Sieswerda@tbdhu.com> took my suggestion on one of George
Hoffman's questions in a completely surprising direction, at least to me.  I [...]
Even so, Lee found an interesting way to twist my answer toward the 1st
question.

Taking great liberty with Lee's response, he basically suggests using -ci- to
get the CIs for two different categories and graphing those along with the
original data using -twoway-.  (Lee actually used -egen- but notes that the
results are the same as -ci-.)

Lee then picked up on my suggestion to use -predictnl- to get CIs for
INDIVIDUAL observations after a -regress-ion and then cleverly used an
indicator variable as the regressor so that those CIs would be the same for
all observations in a group.  He then compared the results of -ci- to those
from -predictnl- and found that they were different.

Using the auto data, Lee gets the following CIs using -predictnl- after
-regress-.

. regress weight foreign
. predictnl yhat=predict(), ci(lb ub)

. bysort foreign: sum ub lb

____________________________________________________________________________
-> foreign = Domestic

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          ub |        52    3491.338           0   3491.338   3491.338
          lb |        52    3142.893           0   3142.893   3142.893

____________________________________________________________________________
-> foreign = Foreign

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          ub |        22     2583.76           0    2583.76    2583.76
          lb |        22    2048.058           0   2048.058   2048.058

Lee notes that these are different from -ci-,

. ci weight, by(foreign)

____________________________________________________________________________
-> foreign = Domestic
    Variable |        Obs        Mean    Std. Err.       [95% Conf.  Interval]
-------------+--------------------------------------------------------------
      weight |         52    3317.115     96.4296        3123.525   3510.706

____________________________________________________________________________
-> foreign = Foreign
    Variable |        Obs        Mean    Std. Err.       [95% Conf.  Interval]
-------------+--------------------------------------------------------------
      weight |         22    2315.909    92.31665        2123.926   2507.892

Let me get the same results as -predictnl- more directly by using foreign and
domestic indicator variables in -regress-.

. gen domestic = ! foreign
. regress weight for domestic, noconstant
[...]
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |   2315.909   134.3649    17.24   0.000     2048.058    2583.761
    domestic |   3317.115   87.39676    37.95   0.000     3142.893    3491.338
------------------------------------------------------------------------------

We see that the 95% CIs from -regress- match those from -predictnl- after
-regress-, as they should.  Now, however, it is easier to see why the CIs are
different.  -ci- with the -by()- option assumed independent samples for
domestic and foreign, one with 52 observations and one with 22 observations,
and it also assumed that two variances were to be estimated, one for domestic
and the other for foreign.  -regress-, on the other hand, assumed a single
variance was to be estimated and that variance had 72 degrees of freedom
(74 observations less 2 estimated coefficients).  In the parlance of
regression, the -ci- estimates of variance allowed for heteroskedasticity
across the domestic and foreign groups, while -regress- did not.  Basically,
we make different assumptions when using -regress- than when using
-ci, by()-.

-- Vince
vwiggins@stata.com

```
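The contrast described in the post can be checked with a short sketch. It assumes the auto data loaded by -sysuse auto- is the same dataset used in the thread, and the variable name `domestic` simply follows the post. The first fit pools one residual variance across both groups (72 degrees of freedom). The second fits a constant-only regression within each group, so each group gets its own variance and n-1 degrees of freedom, and the interval on `_cons` should match the by-group -ci- results quoted in the post.

```stata
* Assumption: the shipped auto data matches the data used in the thread
sysuse auto, clear

* Indicator for domestic cars, as in the post
generate byte domestic = !foreign

* Pooled fit: a single residual variance with 74 - 2 = 72 df.
* The coefficient CIs should match the -predictnl- bounds.
regress weight foreign domestic, noconstant

* Separate fits: constant-only regression within each group.
* Each group has its own variance and n-1 df, so the CI on _cons
* should match the by-group -ci- intervals.
by foreign, sort: regress weight
```

Comparing the standard errors is the quickest check: the pooled fit should report 87.39676 for domestic and 134.3649 for foreign, while the separate fits should report 96.4296 and 92.31665, the values shown in the -ci- output.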