Re: st: Re: Multiple Imputation

From   Maarten Buis
Subject   Re: st: Re: Multiple Imputation
Date   Mon, 15 Dec 2008 23:16:28 +0000 (GMT)

> The problem that I now have is that I am using a user-written program
> called levpet that generates values of a variable (Total Factor
> Productivity, TFP) for a firm using a non-linear algorithm. This
> variable is created using data on employment, net value added and
> capital. I have a large number of missing values on employment. If I
> generate values of the missing observations for employment in my
> original data set using ice, I get n imputed data files (where n is
> the number of imputations). When I load them into memory, I cannot get
> levpet to work over all data sets (at least I do not know how to get
> levpet to generate imputed values of TFP over several data sets) to
> generate the TFP measure. Therefore, I am using values of employment
> imputed from salary and wage data. Given the limitation that I face,
> what steps can I take to ensure that impute does a reasonable job?

You should not use -impute-, and you don't need to. In all likelihood
you can just use -mim- with the -cat(fit)- option. To install -mim-
type -ssc install mim-.
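
As a minimal sketch of what that looks like, the lines below use the
nlsw88 data with -regress- as a stand-in. Whether -levpet- can simply be
substituted for -regress- after the colon is an assumption that you
would have to check on your own data.

*------------------------- begin example -----------------------------
sysuse nlsw88, clear
* make some values of wage missing, more often at low grades
replace wage = . if uniform() < invlogit(5 - .5*grade)

* create 5 imputed data sets, stacked and indexed by _mj
ice wage grade age union, clear m(5)

* fit the model in each imputed data set and combine the results
mim, cat(fit): regress wage grade age union
*--------------------------- end example --------------------------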

If -mim- doesn't work then your first conclusion should be that you
typed something wrong, and should try harder to make -mim- work. If it
really is not possible then you can do this manually, as -levpet- 
allows you to use -if-, so you can estimate the parameter of interest
in each imputed sample by selecting on the variable _mj: the first
sample is _mj==1, the second _mj == 2, etc. After that you can combine
the results using the equations discussed here:

The results of -levpet- seem to be stored in e(b) and e(V) just like
all other regular Stata estimation commands, so the example below using
-regress- can straightforwardly be generalized to -levpet-.
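
For reference, the combination rules that the example applies (usually
called Rubin's rules) can be written out as follows. Here m is the
number of imputations, Q_j is the estimate from imputed sample j, and
U_j is its squared standard error; the other symbols match the names
used in the Mata part of the example.

\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{j=1}^{m} Q_j \\
\bar{U} &= \frac{1}{m}\sum_{j=1}^{m} U_j \\
B &= \frac{1}{m-1}\sum_{j=1}^{m} \left(Q_j - \bar{Q}\right)^2 \\
T &= \bar{U} + \left(1 + \frac{1}{m}\right) B \\
df &= (m-1)\left(1 + \frac{m\,\bar{U}}{(m+1)\,B}\right)^2
\end{align*}

The test statistic is \bar{Q}/\sqrt{T}, referred to a t distribution
with df degrees of freedom. With m = 5 these expressions give the
constants 1.2, 4, 5 and 6 that appear in the code.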

*------------------------- begin example -----------------------------
sysuse nlsw88, clear
replace wage = . if uniform() < invlogit(5 - .5*grade)

ice wage grade age union, clear m(5)

reg wage grade age union if _mj == 1
matrix b = e(b)'
matrix v = e(V)
matrix V = vecdiag(v)'
reg wage grade age union if _mj == 2
matrix b = b, e(b)'
matrix v = e(V)
matrix V = V, vecdiag(v)'
reg wage grade age union if _mj == 3
matrix b = b, e(b)'
matrix v = e(V)
matrix V = V, vecdiag(v)'
reg wage grade age union if _mj == 4
matrix b = b, e(b)'
matrix v = e(V)
matrix V = V, vecdiag(v)'
reg wage grade age union if _mj == 5
matrix b = b, e(b)'
matrix v = e(V)
matrix V = V, vecdiag(v)'

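mata
// after transposing: rows = imputations, columns = coefficients
b = st_matrix("b")'
V = st_matrix("V")'
Qbar = mean(b)'                // combined point estimates
Ubar = mean(V)'                // within-imputation variance
B = diagonal(variance(b))      // between-imputation variance
T = Ubar :+ 1.2:*B             // total variance, 1.2 = 1 + 1/5
se = sqrt(T)
df= 4:* (1 :+ (5:*Ubar):/(6:*B)) :* (1 :+ (5:*Ubar):/(6:*B))
t = Qbar:/se
p = 2*ttail(df, abs(t))
ci = Qbar :- invttail(df,0.025):*se, Qbar :+ invttail(df,0.025):*se
result = Qbar, se, t, df, p, ci
st_matrix("result", result)
end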

matrix rownames result = grade age union _cons
matrix colnames result = coef std_err t df p lb ub
matrix list result
*--------------------------- end example --------------------------
(For more on how to use examples I sent to the Statalist, see )
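
The five near-identical blocks in the example can also be written as a
loop, which makes it easier to change the number of imputations or to
swap in another estimation command. This is only a sketch: it assumes
that the imputed data created by -ice- are still in memory, and the
local macro m is a name chosen here for illustration.

*------------------------- begin example -----------------------------
local m = 5
* start from empty matrices so that nullmat() works on the first pass
capture matrix drop b
capture matrix drop V

forvalues j = 1/`m' {
    quietly reg wage grade age union if _mj == `j'
    * one column of coefficients and one of variances per imputation
    matrix b = nullmat(b), e(b)'
    matrix V = nullmat(V), vecdiag(e(V))'
}

mata
m = cols(st_matrix("b"))       // number of imputations
b = st_matrix("b")'
V = st_matrix("V")'
Qbar = mean(b)'
Ubar = mean(V)'
B = diagonal(variance(b))
T = Ubar :+ (1 + 1/m):*B
se = sqrt(T)
df = (m - 1) :* (1 :+ (m:*Ubar):/((m + 1):*B)):^2
t = Qbar:/se
p = 2*ttail(df, abs(t))
ci = Qbar :- invttail(df,0.025):*se, Qbar :+ invttail(df,0.025):*se
result = Qbar, se, t, df, p, ci
st_matrix("result", result)
end

matrix rownames result = grade age union _cons
matrix colnames result = coef std_err t df p lb ub
matrix list result
*--------------------------- end example --------------------------

To use another estimation command, only the line inside the loop and
the row names at the end should need to change, provided that the
command stores its results in e(b) and e(V).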

> Would reporting the correlations between salaries and employment for
> non-missing observations help? Is there any method for setting a
> bound on the prediction error that is unaccounted for in the impute
> command?

You can simulate (my solution to almost everything), but that is a
waste of time, as you can and should use -ice- instead.

Hope this helps,

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715
