
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: New and revised packages on SSC: -rhetplot-, -rvfplot2-, -rvpplot2-
Date: Thu, 3 Apr 2003 13:21:23 +0100

Thanks to Kit Baum, various new and revised packages have been placed on SSC. All are products of an ongoing project aimed at producing a suite of model diagnostic graphical routines to complement official Stata's -regdiag-.

Slides from some talks at users group meetings reflecting different aspects of this project are at

http://www.stata.com/support/meeting/8uk/diag.pdf
http://www.stata.com/support/meeting/8uk/diag.html
(London and Maastricht, May 2002; but graphics all Stata 7)

http://fmwww.bc.edu/repec/nasug2003/CoxNASUG2003.pdf
(Boston, March 2003; graphics Stata 8)

To install or replace these packages, use -ssc-.

revised: rvfplot2
=================

-rvfplot2- is offered as a generalisation of official Stata's -rvfplot- for residual vs fitted plots after fitting regression-type commands. -rvfplot2- has been rewritten for Stata 8. The previous version, which was written for Stata 7, remains in the package as -rvfplot27-.

revised: rvpplot2
=================

A similar story: -rvpplot2- is offered as a generalisation of official Stata's -rvpplot- for residual vs predictor plots after fitting regression-type commands. -rvpplot2- has been rewritten for Stata 8. The previous version, which was written for Stata 7, remains in the package as -rvpplot27-.

new: rhetplot
=============

-rhetplot- (think "residual heteroscedasticity plot") is offered as a fairly general graphical tool for checking for heteroscedasticity of errors. Stata 8 is required.

Here is the rhetoric behind -rhetplot-, more discursively than in the help file.

Homoscedastic errors are commonly assumed in many model fits; checking residuals for heteroscedasticity is thus advisable. Graphically this can be done in various ways, including -rvfplot- (or -rvfplot2-, above) or -rdplot- (from SSC). -rhetplot- is another way to do it.

The generic idea is this: divide the data into subsets, calculate the standard deviation of residuals in each subset and plot the standard deviations to see if they are similar or different (and if different, if there is some collective pattern).

Sometimes the division into subsets is naturally determined. Suppose you

. webuse systolic
. anova systolic drug disease

Here the question is whether errors have similar variability in cells defined by combinations of -drug- and -disease-, and this is the way to do it:

. rhetplot, by(drug disease)

The graph shown has sd of residuals on the y axis and the results of

-egen <tempvar> = group(drug disease), label-

on the x axis. In this case, to get the benefit of the labelling, you need to add

. rhetplot, by(drug disease) xlabel(1/12, valuelabel)

Other times any division into subsets is at least a little arbitrary. The handles provided in -rhetplot- are those provided in -egen, cut()-, namely its -at()- and -group()- options, although I typically use only the latter. Suppose I

. sysuse auto
. regress turn length

I might want to slice -length- into quantile-based groups. To see some detail, but not too much, use the magical number seven, plus or minus two (http://psychclassics.yorku.ca/Miller/):

. rhetplot length, group(7)

Strictly, what appears on the x axis in this case are the means of groups of -length-.

You don't need to specify a variable (and if you have more than one covariate, there may not be an obvious choice in any case). Given

. rhetplot, group(7)

the slices are of the fitted values, producing in essence the same plot in this case. So this choice is like taking a residual vs fitted plot, slicing it vertically and plotting the sd of residuals in each slice against the mean of the fitted values in each slice.

The graph shown in each case is a call to -lowess-, so unless the number of groups is very small, you get -lowess-'s idea of a smooth.

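The slicing just described can also be sketched by hand with official commands, which may help in seeing what the plot is made of. This is only a rough sketch under stated assumptions, not the code inside -rhetplot-: the variable names (fit, res, slice, sdres, meanfit, onepoint) are made up for the illustration, and the package may well differ in detail.

. sysuse auto, clear
. regress turn length
. * fitted values and residuals from the model just fitted
. predict double fit, xb
. predict double res, residuals
. * seven quantile-based slices of the fitted values
. egen slice = cut(fit), group(7)
. * sd of residuals and mean of fitted values within each slice
. egen sdres = sd(res), by(slice)
. egen meanfit = mean(fit), by(slice)
. * keep one observation per slice, then smooth sd against mean
. egen onepoint = tag(slice)
. lowess sdres meanfit if onepoint

If the seven sds scatter around a horizontal line, there is little sign of heteroscedasticity; a trend with the mean of the fitted values is the kind of collective pattern to look for.
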
The smooth can be helpful informally for getting an idea of the structure of variability. Suppose you

. insheet using http://www.kgs.ku.edu/Mathgeo/Books/Stat/ASCII/OCS.TXT, clear

That's some data which I'll explain by

. label data "petroleum reservoirs, Outer Continental Shelf, Texas and Louisiana"
. label var mmboe "ultimate production, million barrels oil equivalent"
. label var area "area of closure, acres"

A simple-minded

. regress mmboe area

followed by

. rhetplot area, g(7)

shows a clear tendency to heteroscedasticity, and eyeballing suggests that sd of residuals is approximately proportional to mean of fitted. This is made clearer by superimposing a line through the origin

. rhetplot area, g(7) plot(function y = (72/19000) * x, range(0 19000))

As is well known, sd / mean = constant points straight towards an analysis on the logarithmic scale. Of course, a little thought or experience with similar size data might have suggested that in the first place. Anyway, note that here the -plot()- option comes free by courtesy of -lowess-.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
