Re: st: Filling data using Ipolate when data used are constrained

From   Maarten buis <>
Subject   Re: st: Filling data using Ipolate when data used are constrained
Date   Mon, 31 Dec 2007 13:15:06 +0000 (GMT)

--- "Pavlos C. Symeou" <> wrote:
> I have data for a number of countries over a series of years. I want
> to  fill the missing values of my variables using the -ipolate-
> method; yet,  I want the -ipolate- command to be applied separately
> for each country  in a manner that the data involved in the
> estimation are bounded at the  bottom by the earliest value of each
> variable. An illustration of my  data follows.

Let's try to do this for the variable mobile_users. First use -ipolate-
separately for each country (this is done with the -by()- option in
-ipolate-), starting from the first non-missing observation. This is
done by first creating the variable notmis, which is 1 if mobile_users is
observed and 0 if mobile_users is missing. Then sum_notmis contains the
running sum of this variable, so it is 0 if the current and all preceding
observations are missing, 1 from the first observed observation until
the next observed one, etc. Then we want to use -ipolate- on only those
observations whose value on sum_notmis > 0:

sort country year
gen notmis = !missing(mobile_users)
by country: gen sum_notmis = sum(notmis)
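ipolate mobile_users year if sum_notmis > 0, gen(mobile_ip) by(country)

Note that -ipolate- requires the -generate()- option, so the
interpolated series is stored in a new variable (here called mobile_ip)
and mobile_users itself is left unchanged.

Next we want to replace the remaining missing values with the value of
the first observed observation. The first observed observation is the
one where mobile_users is observed and sum_notmis equals 1 (later
missing observations can also have sum_notmis equal to 1, but for those
first_notmis is simply missing). The variable tot_first contains a
constant (within country) with that first value:

gen first_notmis = mobile_users if sum_notmis == 1
by country: egen tot_first = total(first_notmis)
replace mobile_ip = tot_first if mobile_ip >= .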
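
To see what these steps do, here is a small made-up example (the
countries "AA" and "BB" and all numbers are invented for illustration,
and mobile_ip is just a name I chose for the filled-in variable). It is
only a sketch: it stores the interpolated values through the
-generate()- option of -ipolate- and uses the -by- prefix instead of
the -by()- option:

clear
input str2 country year mobile_users
"AA" 2000 .
"AA" 2001 10
"AA" 2002 .
"AA" 2003 30
"BB" 2000 .
"BB" 2001 .
"BB" 2002 5
"BB" 2003 7
end

sort country year
* 1 if observed, 0 if missing
gen notmis = !missing(mobile_users)
* running count of observed values within country
by country: gen sum_notmis = sum(notmis)
* linear interpolation from the first observed value onwards
by country: ipolate mobile_users year if sum_notmis > 0, gen(mobile_ip)

* carry the first observed value back to the leading missing years
gen first_notmis = mobile_users if sum_notmis == 1
by country: egen tot_first = total(first_notmis)
replace mobile_ip = tot_first if missing(mobile_ip)

list country year mobile_users mobile_ip, sepby(country)

In this example mobile_ip should be 10, 10, 20, 30 for AA and 5, 5, 5, 7
for BB: the gap in 2002 for AA is interpolated linearly, and the years
before the first observation get the first observed value.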

Now that I have told you how you can do that, I am going to tell you that
you shouldn't. The reason for that is the same as the reason why you should
not use -impute-, which I explained in the second part of this post: . The
alternative suggested in that post, -ice-, is also the alternative I
recommend here, though you probably want to do multiple imputation here
(instead of single imputation, as I recommended in the post I referred
to). If you want to ensure that the counts remain positive, you can
model the log of the counts (implying exponential growth, which seems
plausible to me).
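
A minimal sketch of the log transformation, assuming mobile_users is
strictly positive wherever it is observed (ln() returns missing for 0).
The names ln_mobile and mobile_imp are only for illustration, and the
imputation step itself is left as a comment, because the exact -ice-
call depends on your model and on the version you have installed:

* work on the log scale
gen ln_mobile = ln(mobile_users)
* impute the missing values of ln_mobile here, e.g. with -ice-
* transform back to the original scale
gen mobile_imp = exp(ln_mobile)

Because exp() is always positive, the back-transformed values can never
be negative, whatever values are imputed on the log scale.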

Anyhow, you might want to do some reading before getting into this:

Royston, P. 2004. Multiple imputation of missing values. Stata Journal
4(3): 227-241.

Royston, P. 2005. Multiple imputation of missing values: update. Stata
Journal 5(2): 188-201.

Royston, P. 2005. Multiple imputation of missing values: Update of ice.
Stata Journal 5(4): 527-536.

Hope this helps and happy new year,

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715
