

From: Maarten Buis <maartenlbuis@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Using Spline Results from one Dataset to Estimate Values for a Second Dataset
Date: Wed, 14 Mar 2012 09:17:03 +0100

On Tue, Mar 13, 2012 at 8:54 PM, NEPPC RA wrote:
> I am using mkspline to derive a cost function for per capita
> expenditure in a certain subset of my data, let's call it subset "A".
> I then want to use the results from that regression to estimate values
> for a completely different subset. Usually I can simply regress on the
> variables in question and then specify predict XYZ. However, since
> mkspline creates a series of variables only for subset "A", I can't
> create predictions for the second subset. Any suggestions?

Here is an example of how I would do that. The trick is that if you
use the same knots you will get the same splines. So if you store the
knot locations used in the first subset and use those to create
splines in the second subset, then you will have the correct variables
for "predicting" in the second subset.

*------------------ begin example --------------------
// some data preparation
sysuse nlsw88, clear
gen byte marst = never_married + 2*married
label define marst 0 "divorced/widowed" ///
                   1 "never married"    ///
                   2 "married"
label value marst marst

// make a restricted cubic spline with 3 knots;
// by default the knots are placed at percentiles of the data
mkspline sp = ttl_exp if race == 1, cubic nknots(3)

// the knots are stored in the matrix r(knots)
matlist r(knots)

// store the knot locations for later use
tempname k1 k2 k3 mat
matrix `mat' = r(knots)
scalar `k1' = el(`mat',1,1)
scalar `k2' = el(`mat',1,2)
scalar `k3' = el(`mat',1,3)

// estimate model for white only
glm wage sp? union i.marst grade if race == 1, ///
    link(log) vce(robust) eform

// create splines with knots of white for everybody
drop sp?
mkspline sp = ttl_exp, cubic ///
    knots( `=`k1'' `=`k2'' `=`k3'')

// predict for everybody
predict wagehat, mu
*------------------- end example ---------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )

One tricky thing is the -knots()- option.
I extracted the knot positions from the matrix of knots returned by
-mkspline- and stored them in scalars, as that is what the -el()-
function returns. I have given those scalars temporary names, so there
is no conflict with variable names (if this cryptic comment interests
you, then you can read more in the manual entry for -scalar-). So the
scalars are called `k1', etc., that is, including the -`- and the -'-.
The -knots()- option, however, wants numbers, not scalar names, so
-knots(`k1' `k2' `k3')- will result in an error message. You can force
Stata to evaluate an expression first (for a scalar, to get the number
corresponding to the scalar name) by surrounding it with -`=- and -'-,
so `=`k1'' is the number corresponding to the first knot. This is why
I specified the -knots()- option as -knots( `=`k1'' `=`k2'' `=`k3'')-.

Hope this helps,
Maarten

Ps. You can use this trick with -regress- and -predict- (without the
-mu- option) as well. I just like using -glm- with the -link(log)-
option to predict wage rather than using -regress- on log(wage), for
reasons discussed in:

Nicholas J. Cox, Jeff Warburton, Alona Armstrong, Victoria J. Holliday
(2007) "Fitting concentration and load rating curves with generalized
linear models" Earth Surface Processes and Landforms, 33(1):25--39.
<http://dx.doi.org/10.1002/esp.1523>

and:
<http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/>

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
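
To make the point about -`=- and -'- concrete, here is a minimal sketch that displays what -knots()- would receive with and without the inline evaluation. It assumes the same nlsw88 data as the example; the temporary names kmat and k1 are only illustrative and not part of the original example.

```stata
// same data and spline as in the example
sysuse nlsw88, clear
mkspline sp = ttl_exp if race == 1, cubic nknots(3)

// kmat and k1 are illustrative temporary names
tempname kmat k1
matrix `kmat' = r(knots)
scalar `k1' = el(`kmat', 1, 1)

// the macro alone expands to the temporary scalar's name,
// which -knots()- cannot use because it wants numbers
display "`k1'"

// wrapping it in `= and ' makes Stata evaluate the scalar first,
// so this shows the location of the first knot
display "`=`k1''"
```

Run it as one do-file, since temporary names are dropped when the do-file that created them ends.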

**References**: **st: Using Spline Results from one Dataset to Estimate Values for a Second Dataset** *From:* NEPPC RA <neppcra@gmail.com>
