
st: RE: Re: How Do I Plot a Serset?


From   "German Rodriguez" <grodri@princeton.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Re: How Do I Plot a Serset?
Date   Fri, 5 Sep 2003 14:23:38 -0400

Thanks very much to Roger Newson and Nick Cox for their replies on
plotting sersets. The short answer is that you can't, but there are
workarounds.

The problem restated: I have evaluated a smooth function f(x) at
selected x's and want to plot it over a finer grid of x's using spline
interpolation, the way c(s) used to do it. The function is too
complicated to use graph twoway function.
 
Roger said:
 
> I think that what German really wants is a variable containing 
> a spline to plot on a graph. 

Well, yes; but not quite. I had thought of going that route, but I
really, really, want the spline on a serset, not on my dataset. Let me
explain why.

If you are trying to fit a smooth regression of y on x then B-splines
are the way to go. You can plot the data and superimpose the spline
using version 8's wonderful facilities for overlaying plots.  This will
work fine as long as the x's are closely spaced. If you have gaps,
however, the plot will not look smooth because consecutive points will
be joined using straight lines, not curves. Of course you can always
generate extra x's to fill the gaps and then predict to get the
corresponding y's. Or use c(s).
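
The fill-the-gaps-and-predict route might look roughly like the sketch
below. It is only an illustration: a quadratic fit stands in for the
B-spline so that no user-written commands are needed, and the variable
names x2 and yhat and the grid of 100 extra points are assumptions
chosen for the example.

```stata
* Sketch only: a quadratic fit stands in for the B-spline (an assumption
* made so the example needs no user-written commands).
clear
set obs 5
gen x = _n
gen y = (x-2)^2
gen x2 = x^2
quietly regress y x x2

* Grow the dataset by 100 rows and fill them with a finer grid on [1, 5].
set obs 105
replace x = 1 + 4*(_n - 6)/99 in 6/105
replace x2 = x^2 in 6/105
predict yhat

* Sort so that line joins the fitted points in order of x.
sort x
twoway (scatter y x) (line yhat x)
```

Note that the extra rows end up in the main dataset, which is the side
effect the serset idea was meant to avoid.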
 
My application involved a Box-Cox transformation. I have a dataset with
20 observations. I computed the Box-Cox log likelihood for values
-2(0.5)1 of the transformation parameter. This used to be enough to
obtain a smooth plot using c(s). The B-spline route would require me to
fit a spline to the seven points, grow the dataset perhaps to 100
observations, generate a finer grid, and then predict. This seems quite
a bit of work to replace c(s). (Not to mention the fact that the plot in
question used to be an option of the boxcox command in version 6, but
that's another story.)
 
Nick said:

> . -c(s)- has not been removed, but has a new identity
> as -twoway mspline-.

This is interesting. I had dismissed mspline because it first computes
cross-medians as a smoothing device and then computes an interpolating
cubic spline. But one can fool mspline into doing just the c(s) part by
defining exactly as many bands as one has points. In my application I
can use twoway mspline logL lambda, bands(7). It is imperative to get
the number of bands exactly right so the medians coincide with the
evaluation points, otherwise the function would be distorted (which
makes this workaround brittle). If you don't believe me, try changing
the number of bands to 3 in the example below. (This example can also be
done with twoway function y=(x-2)^2, but serves to illustrate the
point.)

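* five evaluation points of a known smooth function
clear
set obs 5
gen x = _n
gen y = (x-2)^2
* bands() must equal the number of points (5) to get pure interpolation
twoway (scatter y x) (mspline y x, bands(5))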
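
For the Box-Cox application described earlier, the same trick could be
sketched as follows. This is not the original analysis: the auto data
and the regression of mpg on weight are stand-ins chosen for
illustration, and the log likelihood is the usual concentrated one with
constants dropped, -n/2*ln(RSS/n) + (lambda-1)*sum(ln y). Only the grid
-2(0.5)1 and the bands(7) call come from the text above.

```stata
* Sketch: Box-Cox profile log likelihood at the seven values -2(0.5)1,
* then an interpolating spline through them via mspline.
* The data and the model (mpg on weight) are illustrative assumptions.
sysuse auto, clear
gen double logy = ln(mpg)
quietly summarize logy
local sumlog = r(sum)
local n = r(N)

tempname results
tempfile profile
tempvar z
postfile `results' lambda logL using `profile'
forvalues i = 0/6 {
    local lam = -2 + 0.5*`i'
    * Box-Cox transform of the response; lambda = 0 is the log
    if `lam' == 0 {
        quietly gen double `z' = ln(mpg)
    }
    else {
        quietly gen double `z' = (mpg^(`lam') - 1)/(`lam')
    }
    quietly regress `z' weight
    * concentrated log likelihood, additive constants dropped
    post `results' (`lam') (-`n'/2*ln(e(rss)/`n') + (`lam' - 1)*`sumlog')
    drop `z'
}
postclose `results'

* Seven points, seven bands: the cross-medians coincide with the points.
use `profile', clear
twoway (scatter logL lambda) (mspline logL lambda, bands(7))
```

The seven points are written to a temporary file rather than added to
the analysis dataset, but the plot still requires loading that file in
place of the data in memory.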

Nick goes on to say:

>I suspect what is behind this move is logic.
>-c(s)- is not a purely presentational detail like
>(in similar oldspeak) say -c(l)- or -c(J)-. The
>computation of cross-medians and the cubic spline
>interpolation bring in some data analysis. It really
>belongs in a room of its own, just as -lowess-

I beg to differ a bit. I think the computation of a log-likelihood,
cross-medians, lowess, or B-splines is the data analytic part, leading
to x-y pairs. If I am doing data analysis I like the flexibility of
choosing the smoothing device. Then you have the problem of joining
those points in a plot using smooth curvilinear (rather than straight
line) segments. This I view as purely presentational and is the problem
solved rather nicely by c(s). I also believe that the finer grid used
solely for plotting purposes belongs naturally on a (temporary) serset,
not on your main dataset.

Also from Roger:
 
> Vince Wiggins wrote:
>     ...[line breaks changed] ...
> There is nothing really hidden about sersets.  What you see in [P] is 
> what you get.  When [P] sersets says they are used by the "internals" 
> of graphics, it really means internals.  These internals are not
> anything currently of interest to users.

Which is a shame. There's a lot one could do by being able to plot a
user-defined serset.

But my problem is solved. Thanks again.

Cheers,
Germán Rodríguez

