# st: RE: Re: Filling missing values. with averages

 From "Nick Cox" <[email protected]> To <[email protected]> Subject st: RE: Re: Filling missing values. with averages Date Tue, 13 May 2003 09:24:22 +0100

```Moon Joong
> >
> > I'm trying to find a smart way to fill missing values
> with averages
> between
> > nearest values. The data looks like the following:
> >
> > id   year   rates
> > 1   1960  .35
> > 1   1961   missing
> > 1   1962  .45
> > 1   1963  .47
> > 1   1964   missing
> > 1   1965  .46
> > 2   1945  .2
> > 2   1946   missing
> > 2   1947  .35
> > 2   1948   missing
> > 3   1949   missing
> > 3   1950   .56
> > .....
> >
> > What I want to do is, take the example of id 1 of year
> 1961 (which is
> > missing), to fill the average value between .35 and .45 (that is
> > (.35+.45)/2) into the missinge observation of year 1961.
> Since I have a
> lot
> > of observations to fill in, using brute force by hands
> takes forever and
> > would be inaccurate. I think it's very straightfoward,
> which I don't know
> > since I'm a beginner.

In addition to other suggestions, you could use -ipolate-.
However, you should only do this separately for each -id-.
Nevertheless you can cycle over the -id-s with -foreach-.

The following is a sketch. Also check out the -epolate-
option.

levels id, local(levels)
gen rates2 = rates

qui foreach l of local levels {
ipolate rates year if id == `l', gen(work)
replace rates2 = work if missing(rates2) & !missing(work)
drop work
}

This will fill gaps of more than one.

That's linear interpolation. Other interpolation
methods might also be tried. But linear is

Nick
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```