
# Re: st: generating splines in variable with missing data and multiple imputation

From: Maarten Buis
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: generating splines in variable with missing data and multiple imputation
Date: Tue, 26 Feb 2013 09:41:27 +0100

```The problem is not the standard error, but your effect: you have
imputed the values for pkyr assuming a linear effect, so the effect
you will get out of your final model will be biased towards a linear
effect. Passive imputation is a technique that sounds plausible but is
controversial. The alternative that is often proposed is to create
your splines (and interactions, and polynomials, and ...) before
imputing the data and treat them as just another variable to be
imputed. Some say that this actually performs better than passive
imputation (Graham 2009, von Hippel 2009), and has the additional
advantage of being easy to implement.

Hope this helps,
Maarten

John Graham (2009) "Missing Data Analysis: Making it Work in the Real
World", Annual Review of Psychology, 60:549-576.

Paul von Hippel (2009) "How to impute interactions, squares, and other
transformed variables", Sociological Methodology, 39:265-291.

On Mon, Feb 25, 2013 at 9:52 PM, Deppen, Steve
<steve.deppen@vanderbilt.edu> wrote:
> I'm using Stata v12 and I have a small (492) dataset with missing data.  One of the variables, pack-years, has a non-linear relationship to the outcome of cancer.  Pack-years is best modeled, given my limited degrees of freedom for other variables of interest, as a restricted cubic spline with 3 knots.  I'm missing data within pack-years.  I can run:
>
> mkspline pkyr = pack_years, cubic nknots(3)
>
> after I generate my 20 imputed datasets.  However, I believe that my confidence interval may be incorrect.  I know that in R, variance inflation due to imputing the nonlinear variable is maintained using aregImpute and subsequent fit.mult.impute.  I am afraid my standard errors are too small since I estimated the splines outside the imputation.  Is there a way to generate splines as a passive variable within the multiple imputation?
>
> Thank you,
>
>
> Stephen Deppen MA MS
> Department of Thoracic Surgery
> Institute for Medicine and Public Health
> Vanderbilt University Medical Center
> (ph) 615-343-6284
> (fax) 615 936-3007
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

--
---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------
```
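The alternative Maarten describes (create the spline terms first, then impute them like any other variable) might look roughly like the sketch below. Only `pack_years`, the `pkyr` stub, the 3 knots, and the 20 imputations come from the thread; the outcome `cancer`, the covariates `age` and `female`, the choice of `mi impute mvn`, the logistic model, and the seed are assumptions made for illustration, and the sketch assumes those other variables are fully observed.

```stata
* Sketch only. cancer, age and female are hypothetical variable names,
* assumed to be completely observed; substitute your own variables.

* 1. Build the restricted cubic spline terms BEFORE imputing.
*    With 3 knots mkspline creates two variables: pkyr1 (the linear
*    term) and pkyr2. Both are missing wherever pack_years is missing.
mkspline pkyr = pack_years, cubic nknots(3)

* 2. Declare the mi structure and register the spline terms as
*    ordinary variables to be imputed.
mi set wide
mi register imputed pkyr1 pkyr2
mi register regular cancer age female

* 3. Impute both spline terms jointly, with the outcome included in
*    the imputation model. rseed() only makes the run reproducible.
mi impute mvn pkyr1 pkyr2 = cancer age female, add(20) rseed(12345)

* 4. Fit the analysis model on the imputed spline terms and combine
*    the results with Rubin's rules.
mi estimate: logistic cancer pkyr1 pkyr2 age female
```

Note that under this approach the imputed `pkyr2` is not forced to be the exact spline transformation of the imputed `pkyr1`. That is the deliberate trade-off of treating the terms as "just another variable", and it is the point on which this approach and passive imputation differ.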