Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

st: MI and z-standardisation

From   "J.B. Kirkbride" <>
Subject   st: MI and z-standardisation
Date   14 Apr 2011 12:53:38 +0100

Dear Stata Users

I would appreciate your guidance on the following topic regarding multiple imputation (MI) and z-standardisation. I am currently learning MI using the excellent Stata help resources, but have an issue I cannot find much support for.

I have a small dataset of 54 subjects, 4 of whom have missing data on a variable which measures social capital in their neighbourhood; let's call this variable "sc". It is a continuous variable with an approximately normal distribution. I wish to use this variable as a predictor in the substantive analysis (eventually, a Cox regression), using MI to estimate the missing values. The best way to include it in such an analysis is as a z-standardised variable with a mean of 0 and sd of 1, to make the parameter estimates more interpretable.

I have followed the MI commands and can obtain MI estimates for sc. My question is as follows:

I am unclear how/when/if to perform z-transformation on the multiply imputed data. I have considered two options:

1. Prior to MI, generate "zsc" using the "egen zsc=std(sc)" command and then run the appropriate MI commands, including "mi impute" on "zsc" to obtain direct estimates of the missing zsc values under an MI scenario.

2. Estimate missing values of "sc" using "mi impute" and then transform the variable after imputation using the command "mi passive: egen zsc=std(sc)". (As an aside, I am assuming here that this is the correct way to specify "zsc", as it is a function of "sc"; your input would be welcome.)
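
For concreteness, the two options might be sketched roughly as below. This is only an illustration under stated assumptions, not the exact commands that were run: x1 and x2 are hypothetical complete covariates standing in for whatever the imputation model includes, and the mlong style, the 20 imputations and the seed are likewise assumed.

```stata
* Sketch only. x1 and x2 are hypothetical complete covariates;
* the mlong style, add(20) and rseed() values are assumptions.

* Option 1: standardise first, then impute the standardised variable
preserve
egen zsc = std(sc)                  // z-score based on the observed cases only
mi set mlong
mi register imputed zsc
mi impute regress zsc x1 x2, add(20) rseed(12345)
mi xeq 0 1 20: summarize zsc        // inspect m=0, m=1 and m=20
restore

* Option 2: impute sc, then derive zsc as a passive variable
preserve
mi set mlong
mi register imputed sc
mi impute regress sc x1 x2, add(20) rseed(12345)
mi passive: egen zsc = std(sc)      // egen is run separately within each m
mi xeq 0 1 20: summarize zsc
restore
```

The preserve/restore pairs are there only so that both alternatives can be tried from the same unimputed starting data in one do-file.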

Either way, when I check the summary distribution of zsc for the Mth imputation ("mi xeq 0 1 20: summ zsc"), I do not quite get back a zsc variable with a mean of 0 and sd of 1, obviously, as the imputed values are just that, though the summaries for each imputation are reasonably close to these values (i.e. mean~-.03, sd~.99).

So my questions are really:

A. Can I still use the zsc variable in my substantive analysis and make the assumption it still has a mean of 0 / sd of 1?

B. Is either method (1 vs 2) preferable?

C. Is there another, preferable, way of achieving z-standardisation before/after MI?

D. Should I be using z-standardisations at all with MI?

Many thanks in advance for your help with this matter.

Best wishes

James