
Re: st: Scatter with regression line and confidence interval densities


From   [email protected] (Vince Wiggins, StataCorp)
To   [email protected]
Subject   Re: st: Scatter with regression line and confidence interval densities
Date   Fri, 16 Jul 2004 09:06:15 -0500

Scott Merryman <[email protected]> wrote:

> In the June 2004 issue of the American Economic Review, the back
> cover has an ad from Stata emphasizing the graphics of Stata 8.  One
> of the graphs shows a scatter plot with a regression line and
> confidence interval densities.  It looks something like the graph on
> page 2 of
>
> http://www.asft.ttu.edu/ansc5403/lecture25.pdf
>
> How does one include the confidence densities in a regression line graph?

This graph superimposes vertical density line plots for the distribution of
the disturbances on a regression line.  Such graphs are sometimes seen in
textbooks when trying to provide intuition for linear regression.  For data
analysis, the confidence intervals shown by -twoway lfitci y x- are easier to
read, but the graph from the ad has its own appeal.  Here is the code used to
produce that graph:

---------------------------------- BEGIN --- regline_ci.do --- CUT HERE -------
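clear
sysuse auto
keep if foreign
* sort so that -line- draws the fit from left to right and so that
* observations 3, 17, and 21 are ordered by weight
sort weight

* quadratic regression of mpg on weight
gen weight2 = weight^2
regress mpg weight weight2
* fitted values and the standard errors of the fitted values
predict fit
predict se , stdp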

#delimit ;
twoway sc mpg weight , pstyle(p3) ms(o)					||
       fn weight[3]  - 1000 * normden(x, `=fit[3]' , `=se[3]') , 
		range(`=fit[3] -5' `=fit[3] +5') horiz pstyle(p1)	||
       fn `=fit[3]' , range(`=weight[3]' `=weight[3]-1000*normden(0, se[3])')
       		      pstyle(p1)					||
       fn weight[17] - 1000 * normden(x, `=fit[17]', `=se[17]') , 
       		range(`=fit[17]-5' `=fit[17]+5') horiz pstyle(p1)	||
       fn `=fit[17]', range(`=weight[17]' `=weight[17]-1000*normden(0, se[17])')
       		      pstyle(p1)					||
       fn weight[21] - 1000 * normden(x, `=fit[21]' , `=se[21]') , 
       		range(`=fit[21] -7' `=fit[21] +7') horiz pstyle(p1)	||
       fn `=fit[21]', range(`=weight[21]' `=weight[21]-1000*normden(0, se[21])')
       		      pstyle(p1)					||
       line fit weight
	, clwidth(*2) legend(off) ytitle(Miles per gallon) xtitle(Weight)
	  title("Scatter with Regression Line and Confidence Interval Densities"
	  , size(4.8) margin(t=0 b=1.5) span)
;
#delimit cr
----------------------------------   END --- regline_ci.do --- CUT HERE -------

The graph is cute in that the CI densities are not notional, but rather the
actual CIs from our regression of -mpg- on -weight- and -weight- squared.  We
pulled the standard errors of the fitted values, obtained with
-predict se , stdp-, at observations 3, 17, and 21 and supplied them to the
-fn- (or -function-) plots, using the -normden()- function to draw the CI
densities (we cheated ever so slightly and did not use a t-distribution).

Note that we scale the result of -normden()- by 1000 so that it looks about
right on the scale of the weight axis, which runs from 1,500 to 3,500.  We
need to do this because the X-axis is measured in units of weight, not
density, so an unscaled density would be too small to see.  Our choice of 1000
as the scaling is arbitrary, so only the relative heights of the CI densities
on this graph can be compared.  We also took some care to get an appropriate
range in the -mpg- dimension for each of our CI densities.
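
The t-distribution mentioned above could be sketched roughly as follows for
the density at observation 3.  This is only a sketch under assumptions: the
data and the -fit- and -se- variables from regline_ci.do are still in memory,
-regress- was the last estimation command (so that e(df_r) holds the residual
degrees of freedom), and the -tden()- density function is available in your
Stata.  The file name regline_t.do and the local macro names are made up for
illustration.  If T follows a t-distribution, then fit + se*T has density
tden(df, (x - fit)/se)/se, which takes the place of normden(x, fit, se).

---------------------------------- BEGIN --- regline_t.do --- CUT HERE -------
* assumes regline_ci.do has just been run: data, fit, se, and e() in memory
local df = e(df_r)
local w  = weight[3]
local f  = fit[3]
local s  = se[3]

#delimit ;
twoway sc mpg weight , pstyle(p3) ms(o)					||
       fn `w' - 1000 * tden(`df', (x - `f') / `s') / `s' ,
		range(`=`f' - 5' `=`f' + 5') horiz pstyle(p1)		||
       line fit weight
	, clwidth(*2) legend(off) ytitle(Miles per gallon) xtitle(Weight)
;
#delimit cr
----------------------------------   END --- regline_t.do --- CUT HERE -------

The t density has a lower peak and heavier tails than the normal density with
the same scale, so the drop line for this density would need its range
computed from tden(`df', 0)/`s' rather than from normden(0, se[3]).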

The other three -fn- plots just draw the drop lines from the top of the CI
densities to the regression line.
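
The three density/drop-line pairs differ only in the observation number, so
they could also be built in a loop.  The following is a sketch, not the code
used for the ad: the file name regline_loop.do and the local macro names are
made up for illustration, it assumes the variables -fit- and -se- from
regline_ci.do are in memory, and it uses a range of +/- 5 for every density,
whereas the code above uses +/- 7 at observation 21.

---------------------------------- BEGIN --- regline_loop.do --- CUT HERE -----
* assumes regline_ci.do has just been run: sorted data, fit, and se in memory
local plots
foreach i of numlist 3 17 21 {
	local w    = weight[`i']
	local f    = fit[`i']
	local s    = se[`i']
	* x position of the peak of the scaled density
	local peak = `w' - 1000 * normden(0, `s')
	* the density, drawn sideways from the regression line
	local plots `plots' (function `w' - 1000 * normden(x, `f', `s'), ///
		range(`=`f' - 5' `=`f' + 5') horizontal pstyle(p1))
	* the drop line from the regression line to the peak of the density
	local plots `plots' (function `f', range(`w' `peak') pstyle(p1))
}

twoway (scatter mpg weight, pstyle(p3) ms(o)) `plots'		///
       (line fit weight, clwidth(*2)),				///
       legend(off) ytitle(Miles per gallon) xtitle(Weight)
----------------------------------   END --- regline_loop.do --- CUT HERE -----

Changing the -numlist- is then enough to move the densities to other
observations, although the +/- 5 range may need widening where the standard
error is large.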


-- Vince
   [email protected]

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


