Re: st: generating splines in variable with missing data and multiple imputation

From   Maarten Buis
Subject   Re: st: generating splines in variable with missing data and multiple imputation
Date   Tue, 26 Feb 2013 09:41:27 +0100

The problem is not the standard error, but your effect: you have
imputed the values for pkyr assuming a linear effect, so the effect
you will get out of your final model will be biased towards a linear
effect. Passive imputation is a technique that sounds plausible but is
controversial. The alternative that is often proposed is to create
your splines (and interactions, and polynomials, and ...) before
imputing the data and treat them as just another variable to be
imputed. Some say that this actually performs better than passive
imputation (Graham 2009, von Hippel 2009), and has the additional
advantage of being easy to implement.
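
A minimal sketch of that "just another variable" approach might look like
the following. It assumes an outcome called cancer and a single covariate
called age, both fully observed; those names, the seed, and the choice of
-mi impute mvn- are illustrative assumptions, not taken from the original
question.

```stata
* Sketch only: cancer and age are assumed, fully observed variables.
* Build the spline terms from the observed pack_years before imputing;
* rows with missing pack_years get missing pkyr1 and pkyr2.
mkspline pkyr = pack_years, cubic nknots(3)

* Declare the data as mi data and register the spline terms for imputation.
mi set wide
mi register imputed pkyr1 pkyr2

* Impute both spline terms jointly, as if they were ordinary variables.
mi impute mvn pkyr1 pkyr2 = cancer age, add(20) rseed(12345)

* Fit the analysis model in each imputed dataset and combine the results.
mi estimate: logit cancer pkyr1 pkyr2 age
```

Note that the imputed pkyr2 is not forced to be the exact spline
transformation of the imputed pkyr1; that is the point of treating the
terms as separate variables, and the outcome belongs in the imputation
model so that the non-linear relationship is preserved.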

Hope this helps,

John Graham (2009) "Missing Data Analysis: Making it Work in the Real
World", Annual Review of Psychology, 60:549-576.

Paul von Hippel (2009) "How to impute interactions, squares, and other
transformed variables", Sociological Methodology, 39:265-291.

On Mon, Feb 25, 2013 at 9:52 PM, Deppen, Steve wrote:
> I'm using Stata v12 and I have a small dataset (492 observations) with missing data.  One of the variables, pack-years, has a non-linear relationship to the outcome of cancer.  Given my limited degrees of freedom for other variables of interest, pack-years is best modeled as a restricted cubic spline with 3 knots.  Some pack-years values are missing.  I can run:
> mkspline pkyr = pack_years, cubic nknots(3)
> after I generate my 20 imputed datasets.  However, I believe that my confidence interval may be incorrect.  I know that in R, variance inflation due to imputing the nonlinear variable is maintained using aregImpute and a subsequent fit.mult.impute.  I am afraid my standard errors are too small, since I estimated the splines outside the imputation.  Is there a way to generate splines as a passive variable within the multiple imputation?
> Thank you,
> Stephen Deppen MA MS
> Department of Thoracic Surgery
> Institute for Medicine and Public Health
> Vanderbilt University Medical Center
> (ph) 615-343-6284
> (fax) 615 936-3007

Maarten L. Buis
Reichpietschufer 50
10785 Berlin
