  RE: st: graph mean and sd over/by category

From: Lee Sieswerda
To: "'statalist@hsphsun2.harvard.edu'"
Subject: RE: st: graph mean and sd over/by category
Date: Mon, 10 Feb 2003 18:51:46 -0500

```
I was writing a reply to George Hoffman's question when Vince Wiggins's reply
popped into my mailbox. I have a question for Vince. First, here is what I
was going to suggest for George.

bysort foreign: egen mean = mean(weight)
bysort foreign: egen sd = sd(weight)
bysort foreign: gen ub = mean + invttail(_N-1,.025)*(sqrt((sd^2)/_N))
bysort foreign: gen lb = mean - invttail(_N-1,.025)*(sqrt((sd^2)/_N))
twoway (rcap lb ub foreign) (scatter mean foreign)

This gives the same results as -ci-. Specifically, it gives a 95% CI with
the t critical value based on _N-1 degrees of freedom within each stratum (of
foreign in this case). It is -ci-'s results that I gather Nick Cox is using to
produce his new -ciplot- (pardon me, Nick, if I'm misrepresenting you).

Intriguing. Vince, if I'm understanding your post correctly, I could obtain
a 95% CI for the mean of weight by foreign like so:

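* drop the hand-computed bounds first; -predictnl- will not overwrite existing variables
drop lb ub
regress weight foreign
predictnl yhat=predict(), ci(lb ub)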

When I do so, I get the following upper and lower bounds:

. bysort foreign: sum ub lb

_______________________________________________________________________________
-> foreign = Domestic

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          ub |        52    3491.338           0   3491.338   3491.338
          lb |        52    3142.893           0   3142.893   3142.893

_______________________________________________________________________________
-> foreign = Foreign

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          ub |        22     2583.76           0    2583.76    2583.76
          lb |        22    2048.058           0   2048.058   2048.058

These differ from what -ci- produces:

. ci weight, by(foreign)

_______________________________________________________________________________
-> foreign = Domestic

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      weight |         52    3317.115     96.4296        3123.525    3510.706

_______________________________________________________________________________
-> foreign = Foreign

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
      weight |         22    2315.909    92.31665        2123.926    2507.892

Now, I gather that the difference is a result of this message that I get
from -predictnl-:

. predictnl yhat=predict(), ci(lb ub)
note: Confidence intervals calculated using t(72) critical values.

So, here is the dumb question. For what George is looking for (and many
others I'm sure), should a person be using t critical values based on the
total sample (_N-2=72), or based on the sample within strata (_N-1=21 and
_N-1=51)?

Thanks (and sorry for the long posting),

Lee

Lee Sieswerda, Epidemiologist
Thunder Bay District Health Unit
999 Balmoral Street
Thunder Bay, Ontario
Tel: +1 (807) 625-5957
Fax: +1 (807) 623-2369
Lee.Sieswerda@tbdhu.com
www.tbdhu.com

> -----Original Message-----
> From:	vwiggins@stata.com [SMTP:vwiggins@stata.com]
> Sent:	Monday, February 10, 2003 5:55 PM
> To:	statalist@hsphsun2.harvard.edu
> Subject:	Re: st: graph mean and sd over/by category
>
> Among other things, George Hoffman <ghoffman@mcw.edu> asks,
>
> > [...] fitted curves under scatter plots look beautiful - can the
> > regression coefficients from fplotci or qplotci be captured somehow,
> > as poor-man's curve fit?
>
> I think George is referring to the -fpfitci- and -qfitci- plot types of
> -graph twoway-.  If so, he can readily perform the regressions that
> produced the graphs.
>
> -qfitci- just performs a quadratic regression.  If we use the auto data,
> -sysuse auto-, the lines for the graph command,
>
>       . twoway qfitci mpg weight
>
> are the predictions of the quadratic fit,
>
>       . gen weight2 = weight^2
>       . regress mpg weight weight2
>
> The coefficients can be seen in the output of -regress-, or manipulated in
> the usual way through the saved results.
>
> If George wants to add the predictions and their CIs to his dataset, he
> can type,
>
>       . predictnl mpg_hat2 = predict() , ci(ci_low ci_high)
>
>
> This is a very simple application of -predictnl- (Bobby Gutierrez
> <rgutierrez@stata.com> said more in a prior post), but it lets us get both
> the predictions and their CIs with one command.
>
> We could then get a graph similar to our earlier -twoway qfitci-, by
> typing,
>
>       . twoway rarea ci_low ci_high weight, sort || line mpg_hat2 weight, sort
>
> which we will immediately think looks ugly and decide to relabel the CI in
> the legend, option -legend(label())-, and change the fill color of the CI to
> be the standard for our scheme, option -p(ci)-.
>
>       .  twoway rarea ci_low ci_high weight , sort p(ci) ||
>                 line mpg_hat2 weight , sort legend(label(1 "CI"))
>
>
> The -fpfitci- plot type just uses -fracpoly- as the engine to produce the
> fits, much like -regress- is used for the quadratic fit.  For our example,
> the corresponding -fracpoly- estimation command is,
>
>       . fracpoly regress mpg weight
>
> and we can repeat the rest of the story, or just use -fracplot-, to plot
> the fit and CI.
>
>
> -- Vince
>    vwiggins@stata.com
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
```
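
The two sets of bounds compared in the post can be rebuilt by hand, which helps show where they part ways. The sketch below is a rough illustration rather than anything from the original thread: it assumes the shipped auto data, and the variable names (m, s, n, lb_ci, ub_ci, lb_pool, ub_pool) are made up for the example. It suggests that the -predictnl- interval differs from -ci- in two respects, not one: the degrees of freedom for the t critical value (72 versus 51 and 21), and the standard deviation itself, since the regression uses a single pooled root MSE for both groups where -ci- uses each group's own SD.

```stata
* Sketch only: auto data assumed, variable names are hypothetical
sysuse auto, clear

* Group mean, group SD, and group size
bysort foreign: egen m = mean(weight)
bysort foreign: egen s = sd(weight)
bysort foreign: gen  n = _N

* (1) Within-stratum SD with t(n-1): should reproduce -ci weight, by(foreign)-
gen lb_ci = m - invttail(n-1, .025)*s/sqrt(n)
gen ub_ci = m + invttail(n-1, .025)*s/sqrt(n)

* (2) Pooled root MSE with t(N-2): should reproduce the -predictnl- bounds,
*     because the fitted value for each group is the group mean and its
*     standard error is e(rmse)/sqrt(n)
regress weight foreign
gen lb_pool = m - invttail(e(df_r), .025)*e(rmse)/sqrt(n)
gen ub_pool = m + invttail(e(df_r), .025)*e(rmse)/sqrt(n)

bysort foreign: summarize lb_ci ub_ci lb_pool ub_pool
```

Working back from the standard errors reported by -ci- in the post, the within-group SDs are roughly 695 (Domestic) and 433 (Foreign), while the -predictnl- bounds imply a pooled root MSE of roughly 630. That gap in the SDs accounts for most of the difference between the intervals; the change in critical value from t(51) or t(21) to t(72) is comparatively small.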