Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From:    Nick Cox <njcoxstata@gmail.com>
To:      statalist@hsphsun2.harvard.edu
Subject: Re: st: mvrs: out-of-sample prediction/definition of the splines
Date:    Wed, 5 Dec 2012 17:47:02 +0000

This refers to -mvrs- by Patrick Royston.

SJ-7-1  st0120 . Multivar. modeling with cubic reg. splines: A prin. approach
        (help mvrs, uvrs, splinegen if installed)  P. Royston and W. Sauerbrei
        Q1/07   SJ 7(1):45--70
        discusses how to limit instability and provide sensible regression
        models when using spline functions in a multivariable setting

mvrs from http://www.homepages.ucl.ac.uk/~ucakjpr/stata
    mvrs. Package for univariate and multivariable regression spline modelling
    Programs by Patrick Royston.
    Distribution-Date: 20121205
    version: 2.0.0 (uvrs), 2.0.0 (mvrs), 1.2.2 (splinegen)
    Please direct queries to Patrick Royston (pr@ctu.mrc.ac.uk)

Please remember to explain _where_ user-written packages you refer to come from.

Patrick Royston is not a member of Statalist, but I forwarded this to him. He writes

% begin Patrick R

In fact, what -mvrs- does when -all- is specified is to create the spline basis variables correctly. It automatically orthogonalizes them so that each has mean 0 and variance 1 and the covariances between them are 0. With the -all- option, knots are determined only from the estimation sample (train==1 in your example) and are applied to all observations when the basis functions are calculated.

The orthogonalization is effectively just a linear transformation of the basis functions. It should therefore not affect the predicted values from a regression on the basis functions, including predictions for the out-of-sample part of the data. However, if you are sceptical, you can specify the -noorthog- option of -mvrs- (the default is -orthog-). This option was previously undocumented, but below are some details as an addition to the help file. Please try your example both ways, with and without orthogonalization.

-orthog- creates orthogonalized spline basis functions. After orthogonalization, all the basis functions are uncorrelated and have mean 0 and SD 1. The default is to create orthogonalized basis functions.
-noorthog- produces non-orthogonalized basis functions. They are typically highly correlated, which may cause numerical instability when fitting the model.

Updated help files mvrs.sthlp and splinegen.sthlp are now on my UCL webpage.

% end Patrick R

Patrick Miller <P-Miller@gmx.de>

> I want to use an mvrs model for out-of-sample prediction, but
> unfortunately I have some trouble with the option "all".
>
> I have divided my data into a training sample and a test sample using
> a binary variable train. To build a model (without option "all") I use:
>
> (1) mvrs regress y x1 x2 x3 if train==1, degree(3)
>
> Let x1 be continuous, with a spline transformation using two knots.
> Hence new variables x1_0, x1_1 and x1_2 are generated.
>
> If I use the same model but with option "all":
>
> (2) mvrs regress y x1 x2 x3 if train==1, all degree(3)
>
> x1_0, x1_1 and x1_2 are generated as well, but the stored values for
> the training sample differ from the ones generated by model (1).
>
> My interpretation is that the transformation for the test sample is
> not based only on information from the training sample; in fact, both
> training and test data are used for the transformation. In my opinion
> this is not a correct way of doing out-of-sample testing.
>
> Is there any way to generate x1_0, x1_1 and x1_2 for the test sample
> based only on information from the training sample?
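Patrick R's argument, that orthogonalization is only a linear (strictly, affine) transformation of basis functions whose knots come from the estimation sample, can be checked numerically outside Stata. The Python sketch below is illustrative only and does not reproduce what -mvrs- or -splinegen- actually compute: the simulated data, the restricted cubic spline basis (Harrell's form, with knots at the training-sample minimum, tertiles and maximum), the Cholesky-based orthogonalization and all function names (rcs_basis, fit_orthog, fit_ols) are assumptions made for the illustration. It fits the same regression on the training rows three ways (raw basis; basis orthogonalized with training-sample moments; basis orthogonalized with all-sample moments) and compares the predictions for the test rows.

```python
# Illustrative sketch only: NOT the algorithm used by -mvrs- or -splinegen-.
# Assumed: simulated data, Harrell-style restricted cubic splines, Cholesky orthogonalization.
import math
import random


def rcs_basis(x, t):
    """Restricted cubic spline row for knots t: x plus len(t) - 2 nonlinear
    terms, each divided by (t[-1] - t[0])**2 to keep column scales similar."""
    k = len(t)

    def cube(u):
        return max(u, 0.0) ** 3
    row = [x]
    for j in range(k - 2):
        term = (cube(x - t[j])
                - cube(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
                + cube(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
        row.append(term / (t[k - 1] - t[0]) ** 2)
    return row


def cholesky(a):
    """Lower-triangular L with L L' = a, for symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def fit_orthog(rows):
    """Return r -> L^{-1} (r - mean), with mean and L estimated from `rows` only.
    On `rows` the output columns have mean 0, SD 1 and zero covariances."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    L = cholesky(cov)

    def transform(r):
        z = []
        for i in range(p):  # forward substitution for L z = r - mean
            z.append((r[i] - mean[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
        return z
    return transform


def fit_ols(rows, y):
    """OLS with an intercept, solving the normal equations by Gaussian
    elimination with partial pivoting; returns a predictor for one basis row."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for m in range(c, p + 1):
                A[r][m] -= f * A[c][m]
    beta = [0.0] * p
    for c in reversed(range(p)):
        beta[c] = (A[c][p] - sum(A[c][m] * beta[m] for m in range(c + 1, p))) / A[c][c]
    return lambda r: beta[0] + sum(b * v for b, v in zip(beta[1:], r))


# Simulated data; the third field plays the role of the variable train.
rng = random.Random(2012)
data = []
for i in range(200):
    x = rng.uniform(0, 10)
    data.append((x, math.sin(x) + 0.1 * x + rng.gauss(0, 0.3), i % 2 == 0))

# Knots from the training sample only (hypothetical placement rule),
# then the same knots applied to every observation.
xs = sorted(x for x, _, train in data if train)
knots = [xs[round(q * (len(xs) - 1))] for q in (0.0, 1 / 3, 2 / 3, 1.0)]
basis = [rcs_basis(x, knots) for x, _, _ in data]
train_rows = [b for b, (_, _, t) in zip(basis, data) if t]
test_rows = [b for b, (_, _, t) in zip(basis, data) if not t]
y_train = [y for _, y, t in data if t]

to_z_train = fit_orthog(train_rows)  # moments from training rows only
to_z_all = fit_orthog(basis)         # moments from all rows

pred = {}
for name, f in [("raw", lambda r: r), ("orthog_train", to_z_train), ("orthog_all", to_z_all)]:
    model = fit_ols([f(r) for r in train_rows], y_train)  # fit on training rows only
    pred[name] = [model(f(r)) for r in test_rows]         # predict the test rows

for name in ("orthog_train", "orthog_all"):
    gap = max(abs(a - b) for a, b in zip(pred["raw"], pred[name]))
    print(f"{name}: max |prediction difference| vs raw basis = {gap:.1e}")
```

The two orthogonalized versions store different values for the same training rows (one possible explanation for the difference Patrick Miller saw between (1) and (2), though the thread does not confirm how -mvrs- estimates the transformation), yet all three fits should give the same test-sample predictions up to rounding error, because each transformed basis plus an intercept spans the same space as the raw basis plus an intercept. Conditioning is what differs: the raw, highly correlated columns are the case the -noorthog- warning is about.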
