RE: st: Scatter with regression line and confidence interval densities


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: Scatter with regression line and confidence interval densities
Date   Fri, 16 Jul 2004 16:39:13 +0100

Completely orthogonal to Vince's very nice code 
is a comment on the much analysed relationship
between mpg and weight. A comment made in many places 
is that in many ways the reciprocal scale, gpm = 
1 / mpg, is a more natural scale for analysis, 
on elementary physical grounds. This is usually 
followed by a transformation to -gpm- and a linear
regression. A comment made less often is that -glm- with 
a reciprocal link offers another way to do it. 

So, 

sysuse auto
gen weight2 = weight^2
reg mpg weight weight2
predict p_quad
glm mpg weight , link(power -1)
predict p_glmrec
scatter mpg wei || mspline p_quad wei, bands(200) || ///
	mspline p_glmrec wei, bands(200) 

-- and fortunately, or fortuitously, or both, 
you can see that the two predictions are 
essentially identical. The -glm- prediction
would, however, extrapolate rather better
and it is easier to entertain different error
families. 
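
For comparison, here is a minimal sketch of the two other routes mentioned above: the usual regression on the transformed -gpm- scale, and the reciprocal-link -glm- with a different error family. The variable names (gpm, p_gpm, p_gpm_mpg, p_gamma) and the choice of the gamma family are illustrative assumptions only, not part of the original comparison.

```stata
sysuse auto, clear

* Route 1: transform to gallons per mile, fit a straight line,
* then back-transform the prediction to the mpg scale
gen gpm = 1 / mpg
regress gpm weight
predict p_gpm
gen p_gpm_mpg = 1 / p_gpm

* Route 2: keep mpg as the response, use a reciprocal link, and try
* a gamma error family (an illustrative choice, not a recommendation)
glm mpg weight, family(gamma) link(power -1)
predict p_gamma

* Overlay both sets of predictions on the data
scatter mpg weight || line p_gpm_mpg weight, sort || ///
	line p_gamma weight, sort
```

Both routes model 1/mpg as linear in weight; they differ in the scale on which the errors are assumed to behave well.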

Just a thought, as Marcello might say. 

Nick 
[email protected] 

Vince Wiggins wrote:
 
> Scott Merryman <[email protected]> wrote, 
> 
> > In the June 2004 issue of the American Economic Review, the back
> > cover has an ad from Stata emphasizing the graphics of Stata 8.  One
> > of the graphs shows a scatter plot with a regression line and
> > confidence interval densities.  It looks something like the graph on
> > page 2 of
> >
> > http://www.asft.ttu.edu/ansc5403/lecture25.pdf
> >
> > How does one include the confidence densities in a regression line
> > graph?
> 
> This graph superimposes vertical density line plots for the
> distribution of the disturbances on a regression line.  Such graphs
> are sometimes seen in textbooks when trying to provide intuition for
> linear regression.  For data analysis, the confidence intervals shown
> by -twoway lfitci y x- are easier to read, but the graph from the ad
> has its own appeal.  Here is the code used to produce that graph,
> 
> ------------------ BEGIN --- regline_ci.do --- CUT HERE -------
> clear
> sysuse auto
> keep if foreign
> sort weight
> 
> gen weight2 = weight^2
> regress mpg weight weight2
> predict fit
> predict se , stdp
> 
> #delimit ;
> twoway sc mpg weight , pstyle(p3) ms(o)
>     ||
>     fn weight[3] - 1000 * normden(x, `=fit[3]' , `=se[3]') ,
>         range(`=fit[3] -5' `=fit[3] +5') horiz pstyle(p1)
>     ||
>     fn `=fit[3]' ,
>         range(`=weight[3]' `=weight[3]-1000*normden(0, se[3])')
>         pstyle(p1)
>     ||
>     fn weight[17] - 1000 * normden(x, `=fit[17]', `=se[17]') ,
>         range(`=fit[17]-5' `=fit[17]+5') horiz pstyle(p1)
>     ||
>     fn `=fit[17]',
>         range(`=weight[17]' `=weight[17]-1000*normden(0, se[17])')
>         pstyle(p1)
>     ||
>     fn weight[21] - 1000 * normden(x, `=fit[21]' , `=se[21]') ,
>         range(`=fit[21] -7' `=fit[21] +7') horiz pstyle(p1)
>     ||
>     fn `=fit[21]',
>         range(`=weight[21]' `=weight[21]-1000*normden(0, se[21])')
>         pstyle(p1)
>     ||
>     line fit weight
>     , clwidth(*2) legend(off) ytitle(Miles per gallon) xtitle(Weight)
>       title("Scatter with Regression Line and Confidence Interval Densities"
>       , size(4.8) margin(t=0 b=1.5) span)
> ;
> #delimit cr
> ------------------   END --- regline_ci.do --- CUT HERE -------
> 
> The graph is cute in that the CI densities are not notional, but
> rather the actual CIs from our regression of -mpg- on -weight- and
> -weight- squared.  We have pulled the SE estimates from the
> regression fit, SEs obtained with -predict se , stdp-, at
> observations 3, 17, and 21 and supplied those to the -fn- (or
> -function-) plots using the -normden()- function to get our CI lines
> (we cheated ever so slightly and did not use a t-distribution).  Note
> that we scale the result of -normden()- by 1000 so that it looks
> about right on the scale of the weight axis -- a scale that runs from
> 1,500 to 3,500.  We need to do this because the X-axis is not scaled
> as a density.  Our choice of 1000 as the scaling is arbitrary -- we
> can only compare the relative heights of the CI densities on this
> graph.  We also took some care to get an appropriate range in the
> -mpg- dimension for each of our CI densities.
>
> The other three -fn- plots just draw the drop lines from the top of
> the CI densities to the regression line.
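
The three pairs of -fn- plots described above follow one pattern, so they could also be built in a loop. What follows is only a sketch under stated assumptions, not the code behind the ad: the local macro names (w, f, s, lo, hi, pk, plots) and the graph names are made up for illustration, and each density is drawn over plus or minus 4 standard errors instead of the hand-picked ranges of 5 and 7. It keeps -normden()- as in the quoted code; newer Stata releases document the same function as -normalden()-. A second graph with -twoway qfitci- is added for comparison, since the fit is quadratic.

```stata
sysuse auto, clear
keep if foreign
sort weight

gen weight2 = weight^2
regress mpg weight weight2
predict fit
predict se, stdp

* Build one density curve and one drop line per chosen observation
local plots
foreach i in 3 17 21 {
	local w  = weight[`i']
	local f  = fit[`i']
	local s  = se[`i']
	* assumption: draw each density over +/- 4 standard errors
	local lo = `f' - 4*`s'
	local hi = `f' + 4*`s'
	* weight-axis position of the density peak (scaling of 1000 as above)
	local pk = `w' - 1000*normden(0, `s')
	local plots `plots' ///
		|| function `w' - 1000*normden(x, `f', `s'), ///
			range(`lo' `hi') horizontal pstyle(p1) ///
		|| function `f', range(`pk' `w') pstyle(p1)
}

twoway scatter mpg weight, pstyle(p3) ms(o) `plots' ///
	|| line fit weight, clwidth(*2) legend(off) ///
	ytitle(Miles per gallon) xtitle(Weight) name(densities, replace)

* For comparison: the conventional confidence band for the quadratic fit
twoway qfitci mpg weight || scatter mpg weight, name(ci, replace)
```

Changing the list in -foreach- moves the densities to other observations; the scaling of 1000 remains arbitrary, exactly as noted above.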
