
# Re: st: Using Spline Results from one Dataset to Estimate Values for a Second Dataset

From: Maarten Buis
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Using Spline Results from one Dataset to Estimate Values for a Second Dataset
Date: Wed, 14 Mar 2012 09:17:03 +0100

```
On Tue, Mar 13, 2012 at 8:54 PM, NEPPC RA wrote:
> I am using mkspline to derive a cost function for per capita
> expenditure in a certain subset of my data, let's call it subset "A".
> I then want to use the results from that regression to estimate values
> for a completely different subset. Usually I can simply regress on the
> variables in question and then specify predict XYZ. However, since
> mkspline creates a series of variables only for subset "A", I can't
> create predictions for the second subset. Any suggestions?

Here is an example of how I would do that. The trick is that if you
use the same knots you will get the same splines. So if you store the
knot locations used in the first subset and use those to create
splines in the second subset, then you will have the correct variables
for "predicting" in the second subset.

*------------------ begin example --------------------
// some data preparation
sysuse nlsw88, clear
gen byte marst = never_married + 2*married
label define marst 0 "divorced/widowed" ///
1 "never married"    ///
2 "married"
label value marst marst

// make a restricted cubic spline with 3 knots (2 variables:
// sp1 sp2); the knots are placed at percentiles of ttl_exp
// in the race == 1 subset
mkspline sp = ttl_exp if race == 1, cubic nknots(3)

// the knots are stored in the matrix r(knots)
matlist r(knots)

// store the knot locations for later use
tempname k1 k2 k3 mat
matrix `mat' = r(knots)
scalar `k1' = el(`mat',1,1)
scalar `k2' = el(`mat',1,2)
scalar `k3' = el(`mat',1,3)

// estimate model for white only
glm wage sp? union i.marst grade if race == 1, ///
link(log)

// create splines with knots of white for everybody
drop sp?
mkspline sp = ttl_exp, cubic ///
knots( `=`k1'' `=`k2'' `=`k3'')

// predict for everybody
predict wagehat, mu
*------------------- end example ---------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )

One tricky thing is the -knots()- option. I extracted the knot
positions from the matrix of knots returned by -mkspline- with the
-el()- function, which returns a single matrix element, and stored
them in scalars. I have given those scalars temporary names, so there
is no conflict with variable names (if this cryptic comment interests
you, then you can read more in the manual entry for -scalar-). So the
scalars are called `k1', etc., that is, including the -`- and the
-'-. The -knots()- option, however, wants numbers, not scalar names,
so -knots(`k1' `k2' `k3')- will result in an error message. You can
force Stata to evaluate an expression first, which for a scalar means
getting the number stored under that name, by surrounding the scalar
name with -`=- and -'-, so `=`k1'' is the number corresponding to the
first knot. This is why I specified the -knots()- option as
-knots( `=`k1'' `=`k2'' `=`k3'')-.

Hope this helps,
Maarten

Ps. You can use this trick with -regress- and -predict- (without the
-mu- option). I just like using -glm- with the -link(log)- option to
predict wage rather than using -regress- on log(wage) for reasons
discussed in:

Nicholas J. Cox, Jeff Warburton, Alona Armstrong, Victoria J. Holliday
(2007) "Fitting concentration and load rating curves with generalized
linear models" Earth Surface Processes and Landforms, 33(1):25--39.
<http://dx.doi.org/10.1002/esp.1523>

and:
<http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/>

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
```
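
The postscript's remark that the same trick works with -regress- can be sketched as follows. This is a minimal illustration and not part of the original message: the matrix name K, the local macro knots, and the variable wagehat_ols are hypothetical names chosen here, and the knot values are collected in a local macro instead of temporary scalars. The marital status variable is left out to keep the sketch short.

```stata
// Sketch only: K, knots and wagehat_ols are illustrative names
sysuse nlsw88, clear

// restricted cubic spline of ttl_exp, knots chosen from whites only
mkspline sp = ttl_exp if race == 1, cubic nknots(3)

// copy r(knots) before another r-class command overwrites it
matrix K = r(knots)

// collect the knot values as plain numbers in a local macro
local knots
forvalues j = 1/`=colsof(K)' {
    local knots `knots' `=K[1,`j']'
}
display "knots: `knots'"

// fit the model on the first subset
regress wage sp? union grade if race == 1

// rebuild the spline variables for everybody with the stored knots
drop sp?
mkspline sp = ttl_exp, cubic knots(`knots')

// linear prediction for all observations (no -mu- option here)
predict wagehat_ols, xb
```

Because the macro already holds numbers, -knots(`knots')- needs no further `=...' evaluation, and the loop adapts if -nknots()- is changed. Run the block in one go from a do-file, since local macros do not survive between separately executed selections.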