
Re: st: Filling data using Ipolate when data used are constrained

From   "Pavlos C. Symeou" <[email protected]>
To   [email protected]
Subject   Re: st: Filling data using Ipolate when data used are constrained
Date   01 Jan 2008 13:31:31 +0000

Dear Maarten,

Yes, it definitely helps. Best wishes for a happy new year.



On Dec 31 2007, Maarten Buis wrote:

>--- "Pavlos C. Symeou" <[email protected]> wrote:
>> I have data for a number of countries over a series of years. I want
>> to fill the missing values of my variables using the -ipolate-
>> method; yet, I want the -ipolate- command to be applied separately
>> for each country in such a way that the data involved in the
>> estimation are bounded at the bottom by the earliest value of each
>> variable. An illustration of my data follows.
>Let's try to do this for the variable mobile_users. First use -ipolate-
>separately for each country (this is done with the -by()- option of
>-ipolate-), starting from the first non-missing observation. To do so,
>first create the variable notmis, which is 1 if mobile_users is
>observed and 0 if mobile_users is missing. Then sum_notmis contains the
>running sum of this variable within country, so it is 0 if the current
>and all preceding observations are missing, 1 from the first observed
>observation up to (but not including) the second observed one, etc.
>Then use -ipolate- on only those observations whose value on
>sum_notmis > 0. Note that -ipolate- requires the -generate()- option;
>here the interpolated series is stored in the new variable mobile_ipo:
>sort country year
>gen notmis = !missing(mobile_users)
>by country: gen sum_notmis = sum(notmis)
>ipolate mobile_users year if sum_notmis > 0, by(country) generate(mobile_ipo)
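>Next we want to replace the remaining missing values with the value of
>the first observed observation. That observation has sum_notmis == 1
>and notmis == 1 (the missing observations that follow it also have
>sum_notmis == 1, so the notmis condition picks out exactly that one).
>The variable first_val is a constant within country holding that first
>value; -max()- is used rather than -total()- so that a country with no
>observed values at all stays missing instead of being filled with 0:
>gen first_notmis = mobile_users if sum_notmis == 1 & notmis
>by country: egen first_val = max(first_notmis)
>replace mobile_ipo = first_val if missing(mobile_ipo)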
>Now that I have told you how to do that, I am going to tell you that you
>shouldn't. The reason is the same as the reason why you should
>not use -impute-, which I explained in the second part of an earlier
>post. The alternative suggested in that post, -ice-, is also the
>alternative I recommend here, though you probably want multiple
>imputation here (instead of the single imputation I recommended in the
>post I referred to). If you want to ensure that the counts remain
>positive, you can model the log of the counts (implying exponential
>growth, which seems plausible to me).
>Anyhow, you might want to do some reading before getting into this:
>Royston, P. 2004. Multiple imputation of missing values. Stata Journal
>4(3): 227-241.
>Royston, P. 2005. Multiple imputation of missing values: update. Stata
>Journal 5(2): 188-201.
>Royston, P. 2005. Multiple imputation of missing values: Update of ice.
>Stata Journal 5(4): 527-536. 
>Hope this helps and happy new year,
>Maarten L. Buis
>Department of Social Research Methodology
>Vrije Universiteit Amsterdam
>Boelelaan 1081
>1081 HV Amsterdam
>The Netherlands
>visiting address:
>Buitenveldertselaan 3 (Metropolitan), room Z434
>+31 20 5986715
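A worked run of the interpolate-then-fill procedure on a small toy dataset may make the role of sum_notmis easier to see. The country codes and values below are made up for illustration only. This sketch uses the by prefix (by country: ipolate ...), which current Stata documents for -ipolate-, instead of the by() option, and it takes the first observed value with egen max() so a country with no observed values would stay missing. In country BB the trailing 2004 gap is also filled with the first observed value, because the last step fills every remaining missing value; adding & sum_notmis == 0 to that replace would restrict it to leading gaps.

```stata
* Hypothetical toy data (made-up values): AA has a leading and an
* interior gap; BB has a leading, an interior and a trailing gap
clear
input str2 country year mobile_users
"AA" 2000  .
"AA" 2001 10
"AA" 2002  .
"AA" 2003 30
"BB" 2000  .
"BB" 2001  4
"BB" 2002  .
"BB" 2003  8
"BB" 2004  .
end

sort country year
* notmis = 1 if mobile_users is observed, 0 if missing
gen notmis = !missing(mobile_users)
* running count of observed values within country; 0 marks leading gaps
by country: gen sum_notmis = sum(notmis)
* linear interpolation within country, skipping the leading gap
by country: ipolate mobile_users year if sum_notmis > 0, generate(mobile_ipo)
* first observed value per country (notmis keeps only that one row)
gen first_notmis = mobile_users if sum_notmis == 1 & notmis
by country: egen first_val = max(first_notmis)
* fill whatever is still missing with the first observed value
replace mobile_ipo = first_val if missing(mobile_ipo)
list country year mobile_users sum_notmis mobile_ipo, sepby(country)
* expected mobile_ipo: AA 10 10 20 30; BB 4 4 6 8 4
```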
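The suggestion to work with the log of the counts is made for the imputation model, but the same idea can be carried over to plain interpolation (an adaptation, not something the post itself proposes). Interpolating ln(mobile_users) linearly and exponentiating the result gives geometric interpolation, that is, a constant growth rate between observed years, and the filled values are always positive. The data below are again made up. Note that ln() returns missing for zero counts, so zeros would be treated as gaps.

```stata
* Hypothetical toy data: one country with a gap between 10 and 40
clear
input str2 country year mobile_users
"AA" 2001 10
"AA" 2002  .
"AA" 2003 40
end

sort country year
* ln() is missing for missing or non-positive counts
gen double ln_users = ln(mobile_users)
* linear interpolation on the log scale, separately by country
by country: ipolate ln_users year, generate(ln_ipo)
* back-transform: constant growth rate between observed years
gen double mobile_geo = exp(ln_ipo)
list
* 2002 is filled with about 20 (geometric mean of 10 and 40), not 25
```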

Pavlos C. Symeou
PhD Candidate in Management Studies
Judge Business School
University of Cambridge

Clare Hall College
Herschel Road, Cambridge
Tel: +447920045575
E-mail: [email protected]
