Re: st: mvrs: out-of-sample prediction/definition of the splines

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: mvrs: out-of-sample prediction/definition of the splines
Date	Wed, 5 Dec 2012 17:47:02 +0000

This refers to -mvrs- by Patrick Royston.

SJ-7-1  st0120  . Multivar. modeling with cubic reg. splines: A prin. approach
         (help mvrs, uvrs, splinegen if installed)  P. Royston and W. Sauerbrei
         Q1/07   SJ 7(1):45--70
         discusses how to limit instability and provide sensible
         regression models when using spline functions in a
         multivariable setting

mvrs from http://www.homepages.ucl.ac.uk/~ucakjpr/stata
     mvrs. Package for univariate and multivariable regression spline modelling

    / Programs by Patrick Royston. / Distribution-Date: 20121205 / version:
    2.0.0 (uvrs), 2.0.0 (mvrs), 1.2.2 (splinegen) / Please direct queries to
    Patrick Royston ([email protected])

Please remember to explain _where_ user-written packages you refer to
 come from.

Patrick Royston is not a member of Statalist, but I forwarded this to him.

He writes

% begin Patrick R

In fact, what -mvrs- is doing is to create spline basis variables
 correctly when -all- is specified. It automatically orthogonalizes
 them so that the mean of each is 0 and the variance is 1 and the
 covariances between them are 0. With the -all- option, knots are
 determined only from the estimation sample (training==1 in your
 example) and are applied to all observations when basis functions are
 calculated. The orthogonalization is effectively just a linear
 transformation. This transformation should not affect the predicted
 values from regression analysis on the basis functions calculated in
 the out-of-sample part of the data.

However, if you are sceptical, you can include the -noorthog- option
 in -mvrs- (the default is -orthog-). This option was previously
 undocumented, but below are some details as an addition to the help
 file. Please try out your example both ways, with and without
 orthogonalization.

-orthog- creates orthogonalized spline basis functions. After
 orthogonalization, all the basis functions are uncorrelated and have
 mean 0 and SD 1. The default is to create orthogonalized basis
 functions.  -noorthog- produces non-orthogonalized basis functions.
 They are typically highly correlated, possibly resulting in numerical
 instability when fitting the model.

Updated help files mvrs.sthlp and splinegen.sthlp are now on my UCL webpage.

% end Patrick R
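
Patrick's point that orthogonalization is just a linear transformation, and so should leave predictions unchanged, can be checked numerically outside Stata. What follows is only a minimal sketch in Python, not -mvrs- itself. The data are simulated, the basis is a toy one (x plus a single truncated cubic term with its knot at the training-sample median), and the Gram-Schmidt-style orthogonalization is my assumption about what such a step looks like. The knot and the orthogonalization constants are computed from the training rows only and then applied to every row.

```python
import random
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients of y on a constant plus the given columns."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, rows):
    """Fitted values for the given rows (constant is beta[0])."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]


# Simulated data (hypothetical); the first 40 rows play the role of train==1.
rng = random.Random(2012)
x = [rng.uniform(0, 10) for _ in range(60)]
y = [1 + 0.5 * v + 0.02 * max(v - 5, 0) ** 3 + rng.gauss(0, 1) for v in x]
train = [i < 40 for i in range(60)]

x_tr = [v for v, t in zip(x, train) if t]
y_tr = [v for v, t in zip(y, train) if t]

# Knot taken from the training sample only, then used for every row.
knot = statistics.median(x_tr)
raw = [(v, max(v - knot, 0.0) ** 3) for v in x]
raw_tr = [r for r, t in zip(raw, train) if t]

# Orthogonalization constants, again from the training rows only.
m1 = statistics.fmean([r[0] for r in raw_tr])
s1 = statistics.pstdev([r[0] for r in raw_tr])
m2 = statistics.fmean([r[1] for r in raw_tr])
g = statistics.fmean([(r[0] - m1) / s1 * (r[1] - m2) for r in raw_tr])
s2 = statistics.pstdev([r[1] - m2 - g * (r[0] - m1) / s1 for r in raw_tr])


def orthog(r):
    """Map a raw basis row to columns with training mean 0, SD 1, covariance 0."""
    z1 = (r[0] - m1) / s1
    z2 = (r[1] - m2 - g * z1) / s2
    return (z1, z2)


orth = [orthog(r) for r in raw]
orth_tr = [z for z, t in zip(orth, train) if t]
raw_te = [r for r, t in zip(raw, train) if not t]
orth_te = [z for z, t in zip(orth, train) if not t]

# Fit on the training rows with each basis, then predict the held-out rows.
pred_raw = predict(ols(raw_tr, y_tr), raw_te)
pred_orth = predict(ols(orth_tr, y_tr), orth_te)
max_diff = max(abs(a - b) for a, b in zip(pred_raw, pred_orth))
print(f"largest out-of-sample difference: {max_diff:.2e}")
```

The two sets of out-of-sample predictions should agree up to rounding error, because the orthogonalized columns are an invertible linear recombination (plus a shift absorbed by the constant) of the raw ones. The analogous check in Stata would be to fit with and without -noorthog- and compare the predictions where train==0.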

Patrick Miller <[email protected]>

> I want to use a mvrs model for out-of-sample prediction but
 > unfortunately I have some trouble with the option "all".
 >
 > I have divided my data in training- and test-sample by using a binary
 > variable train. To build a model (without option "all") I use:
 >
 > (1) mvrs regress y x1 x2 x3 if train==1, degree(3)
 >
 > Let x1 be continuous and a spline transformation with two knots is
 > done. Hence new variables x1_0, x1_1 and x1_2 are generated.
 >
 > If I use the same model but with option "all":
 >
 > (2) mvrs regress y x1 x2 x3 if train==1, all degree(3)
 >
 > x1_0, x1_1 and x1_2 are generated as well, but the stored values for
 > the training-sample differ from the ones generated by model (1).
 >
 > My interpretation is that the transformation for the test-sample is
 > not only done by the information provided by the training-sample. In
 > fact for the transformation training- and test-data are used. In my
 > opinion this is not a correct way of out-of-sample testing.
 >
 > Is there any way to generate x1_0, x1_1 and x1_2 for the test-sample
 > only based on the information of the training-sample?
