I would appreciate your guidance on a question about multiple imputation (MI) and z-standardisation. I am currently learning MI using the excellent Stata help resources, but have run into an issue on which I can't find much support.
I have a small dataset of 54 subjects, 4 of whom have missing data on a variable measuring social capital in their neighbourhood; let's call this variable "sc". It is a continuous variable with an approximately normal distribution. I wish to use it as a predictor in the substantive analysis (eventually, a Cox regression), using MI to impute the missing values. I believe the best way to include it in such an analysis is as a z-standardised variable with a mean of 0 and an SD of 1, because this makes the parameter estimates more interpretable (the hazard ratio then refers to a one-SD difference in sc).
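For reference, here is a minimal sketch of what the z-standardisation does, written in Python rather than Stata purely to show the arithmetic. It assumes that "egen zsc=std(sc)" uses the mean and the (n-1) sample SD of the non-missing values and leaves missing values missing; the function name z_standardise and the example scores are hypothetical.

```python
import statistics
from typing import List, Optional


def z_standardise(values: List[Optional[float]]) -> List[Optional[float]]:
    """Return (x - mean) / sd for each value, leaving None (missing) as None.

    The mean and SD are computed from the non-missing values only, and the SD
    uses the n - 1 denominator (statistics.stdev), i.e. the usual sample SD.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [None if v is None else (v - mean) / sd for v in values]


# Hypothetical scores: three observed values and one missing value
sc = [2.0, 4.0, 6.0, None]
print(z_standardise(sc))  # prints [-1.0, 0.0, 1.0, None]
```

The missing value stays missing, so the mean of 0 and SD of 1 hold only for the observed values at this stage.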
I have followed the MI commands and can obtain imputed values for sc. My question is as follows:
I am unclear whether, when, and how to perform the z-transformation on the multiply imputed data. I have considered two options:
1. Before MI, generate "zsc" with the "egen zsc=std(sc)" command, then run the appropriate MI commands, including "mi impute" on "zsc", to impute the missing zsc values directly.
2. Impute the missing values of "sc" with "mi impute" and then transform the variable after imputation with "mi passive: egen zsc=std(sc)". (As an aside, I am assuming that "mi passive" is the correct way to create "zsc", since it is a function of "sc"; your input would be welcome.)
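To make the difference between the two options concrete, here is a minimal sketch in Python (not Stata) for a single completed dataset. All numbers are hypothetical: observed_sc and imputed_sc are made up for illustration. The sketch assumes that option 1 fixes the mean and SD at their observed-data values, with the imputed z-scores taken to be the hypothetical imputed sc values rescaled with those same constants. It also assumes that option 2 recomputes the mean and SD within each completed dataset.

```python
import statistics
from typing import List


def standardise(values: List[float], mean: float, sd: float) -> List[float]:
    """Apply the fixed linear map (x - mean) / sd to every value."""
    return [(v - mean) / sd for v in values]


# Hypothetical data: observed sc values and one imputation's draws for the missing ones
observed_sc = [3.1, 4.5, 5.0, 5.2, 6.3, 4.8]
imputed_sc = [4.0, 3.6]

# Option 1: mean and SD come from the observed values only; the imputed
# z-scores are assumed here to be the imputed sc values on that same scale
obs_mean = statistics.mean(observed_sc)
obs_sd = statistics.stdev(observed_sc)
option1 = standardise(observed_sc + imputed_sc, obs_mean, obs_sd)

# Option 2: complete sc first, then standardise using the mean and SD
# re-estimated within this completed dataset
completed = observed_sc + imputed_sc
option2 = standardise(completed, statistics.mean(completed), statistics.stdev(completed))

for label, z in (("option 1", option1), ("option 2", option2)):
    print(label, "mean:", round(statistics.mean(z), 3), "SD:", round(statistics.stdev(z), 3))
```

In this sketch only option 2 gives a mean of 0 and an SD of 1 in the completed data, by construction, but its scaling constants change from one imputation to the next. Option 1 keeps one fixed scale (the observed-data SD of sc) across all imputations.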
Either way, when I check the summary distribution of zsc in selected imputations ("mi xeq 0 1 20: summ zsc"), the completed zsc variable does not have exactly a mean of 0 and an SD of 1. This is to be expected, since the imputed values are imputations rather than observed data. The summaries for each imputation are, however, reasonably close to these values (mean ~ -.03, SD ~ .99).
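Under option 1, the departure from 0 and 1 can be written down directly. The following short derivation assumes 50 observed and 4 imputed values of zsc in imputation m. The observed z-scores $z_i$ ($i \in O$) come from "egen zsc=std(sc)", so they sum to 0 and their squares sum to 49 (SD 1 with an n-1 denominator). $\tilde z_j^{(m)}$ ($j \in M$) denote the imputed values.

```latex
\bar z^{(m)} = \frac{1}{54}\Big(\sum_{i \in O} z_i + \sum_{j \in M} \tilde z_j^{(m)}\Big)
             = \frac{1}{54}\sum_{j \in M} \tilde z_j^{(m)}

\big(s^{(m)}\big)^2 = \frac{1}{53}\Big(\sum_{i \in O} z_i^2 + \sum_{j \in M} \big(\tilde z_j^{(m)}\big)^2 - 54\,\big(\bar z^{(m)}\big)^2\Big)
                    = \frac{1}{53}\Big(49 + \sum_{j \in M} \big(\tilde z_j^{(m)}\big)^2 - 54\,\big(\bar z^{(m)}\big)^2\Big)
```

So a completed-data mean of about -.03 corresponds to the four imputed values summing to roughly 54 × (-.03) ≈ -1.6. The SD equals exactly 1 only if the squared imputed values sum to exactly 4 + 54(z̄^(m))².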
So my questions are really:
A. Can I still use the zsc variable in my substantive analysis and assume that it has a mean of 0 and an SD of 1?
B. Is either method (1 vs 2) preferable?
C. Is there another, preferable way of achieving z-standardisation?
D. Should I be using z-standardisation at all with MI?
Many thanks in advance for your help with this matter.