Re: RE: st: graph mean and sd over/by category


From   vwiggins@stata.com (Vince Wiggins, StataCorp)
To   statalist@hsphsun2.harvard.edu
Subject   Re: RE: st: graph mean and sd over/by category
Date   Tue, 11 Feb 2003 11:48:01 -0600

Lee Sieswerda <Lee.Sieswerda@tbdhu.com> took my suggestion to one of George
Hoffman's questions in a completely surprising direction, at least to me.  I
was answering George's 2nd question, whereas Lee was answering George's 1st.
Even so, Lee found an interesting way to twist my answer toward the 1st
question.

Taking great liberty with Lee's response, he basically suggests using -ci- to
get the CIs for two different categories and graphing those along with the
original data using -twoway-.  (Lee actually used -egen- but notes that the
results are the same as -ci-.)
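
For anyone who wants to try that approach, here is a minimal sketch of the
-egen- route on the auto data.  It is only an illustration under my own
assumptions, not Lee's actual code: the variable names (mean_w, sd_w, n_w,
half, lb_ci, ub_ci) and the graph styling are invented for the example.

```stata
* Sketch only: mean_w, sd_w, n_w, half, lb_ci, ub_ci are made-up names
sysuse auto, clear

* group mean, sd, and count of weight within each level of foreign
egen double mean_w = mean(weight),  by(foreign)
egen double sd_w   = sd(weight),    by(foreign)
egen n_w           = count(weight), by(foreign)

* 95% CI for each group mean, using that group's own sd and n - 1 df
gen double half  = invttail(n_w - 1, 0.025) * sd_w / sqrt(n_w)
gen double lb_ci = mean_w - half
gen double ub_ci = mean_w + half

* raw data, capped CI spikes, and group means on one graph
twoway (scatter weight foreign, msymbol(oh))   ///
       (rcap lb_ci ub_ci foreign)              ///
       (scatter mean_w foreign, msymbol(D)),   ///
       xlabel(0 "Domestic" 1 "Foreign") legend(off)
```

Because each group's limits are built from that group's own standard
deviation and count, lb_ci and ub_ci should agree with what -ci- reports by
group.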

Lee then picked up on my suggestion to use -predictnl- to get CIs for
INDIVIDUAL observations after a -regress-ion and then cleverly used an
indicator variable as the regressor so that those CIs would be the same for
all observations in a group.  He then compared the results of -ci- to those
from -predictnl- and found that they were different.

Using the auto data, Lee gets the following CIs using -predictnl- after
-regress-.

. regress weight foreign
. predictnl yhat=predict(), ci(lb ub)

. bysort foreign: sum ub lb

____________________________________________________________________________
-> foreign = Domestic

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          ub |        52    3491.338           0   3491.338   3491.338
          lb |        52    3142.893           0   3142.893   3142.893

____________________________________________________________________________
-> foreign = Foreign

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          ub |        22     2583.76           0    2583.76    2583.76
          lb |        22    2048.058           0   2048.058   2048.058


Lee notes that these are different from -ci-,

. ci weight, by(foreign)

____________________________________________________________________________
-> foreign = Domestic
    Variable |        Obs        Mean    Std. Err.       [95% Conf.  Interval]
-------------+--------------------------------------------------------------
      weight |         52    3317.115     96.4296        3123.525   3510.706

____________________________________________________________________________
-> foreign = Foreign
    Variable |        Obs        Mean    Std. Err.       [95% Conf.  Interval]
-------------+--------------------------------------------------------------
      weight |         22    2315.909    92.31665        2123.926   2507.892


Let me get the same results as -predictnl- more directly by using foreign and
domestic indicator variables in -regress-.

. gen domestic = ! foreign
. regress weight for domestic, noconstant
[...]
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |   2315.909   134.3649    17.24   0.000     2048.058    2583.761
    domestic |   3317.115   87.39676    37.95   0.000     3142.893    3491.338
------------------------------------------------------------------------------

We see that the 95% CIs from -regress- match those from -predictnl- after
-regress-, as they should.  Now, however, it is easier to see why the CIs are
different.  -ci- with the -by()- option assumed independent samples for
domestic and foreign, one with 52 observations and one with 22 observations,
and it also assumed that two variances were to be estimated, one for domestic
and the other for foreign.  -regress-, on the other hand, assumed a single
variance was to be estimated and that variance had 72 degrees of freedom (74
observations less the 2 estimated coefficients).  In
the parlance of regression, the -ci- estimates of variance allowed for
heteroskedasticity across the domestic and foreign groups, while -regress- did
not.  Basically, we make different assumptions when using -regress- than when
using -ci, by()-.
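
The difference can also be checked by hand.  The sketch below is an
illustration only, with scalar names (n_dom, v_pool, t72, and so on) invented
for the example.  It builds both sets of standard errors from the group
variances: one using each group's own variance, the other using a single
pooled variance with 72 degrees of freedom.

```stata
* Sketch only: all scalar names below are made up for this illustration
sysuse auto, clear

quietly summarize weight if foreign == 0
scalar n_dom = r(N)
scalar m_dom = r(mean)
scalar v_dom = r(Var)

quietly summarize weight if foreign == 1
scalar n_for = r(N)
scalar m_for = r(mean)
scalar v_for = r(Var)

* -ci, by()- style: each group has its own variance and n - 1 df
scalar se_dom_ci = sqrt(v_dom / n_dom)
scalar se_for_ci = sqrt(v_for / n_for)

* -regress- style: one pooled variance with n_dom + n_for - 2 df
scalar df_pool   = n_dom + n_for - 2
scalar v_pool    = ((n_dom - 1)*v_dom + (n_for - 1)*v_for) / df_pool
scalar se_dom_rg = sqrt(v_pool / n_dom)
scalar se_for_rg = sqrt(v_pool / n_for)
scalar t72       = invttail(df_pool, 0.025)

display "ci-style SEs:      " se_dom_ci "  " se_for_ci
display "regress-style SEs: " se_dom_rg "  " se_for_rg
display "regress-style CI, domestic: " m_dom - t72*se_dom_rg "  " m_dom + t72*se_dom_rg
display "regress-style CI, foreign:  " m_for - t72*se_for_rg "  " m_for + t72*se_for_rg
```

The ci-style standard errors should match the -ci- table, and the
regress-style standard errors and limits should match the -regress- table,
up to rounding.  The pooled variance shrinks the standard error for the
domestic group, which has the larger variance, and inflates it for the
foreign group.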


-- Vince
   vwiggins@stata.com
