# st: RE: RE: filling in missing panel data as a trend line

 From "Jason Yackee" To Subject st: RE: RE: filling in missing panel data as a trend line Date Sat, 9 Sep 2006 10:45:54 -0700

```Nick,

Thank you for the suggestion.  I don't think -ipolate- quite works for
what I have in mind, but maybe I am wrong.  Here is a hypothetical
picture of the data.  "Education" is simply the average total years of
education of a country's population.

Country	Year	Education
Mex.		1970	3.4
Mex.		1971	.
Mex.		1972	.
Mex.		1973	.
Mex.		1974	.
Mex.		1975	4.2
Mex.		1976	.
Mex.		1977	.
Mex.		1978	.
Mex.		1979	.
Mex.		1980	4.7
Nic.		1970	1.5
Nic.		1971	.
Nic.		1972	.
~~~		~~~	~~~
Nic.		1980	3.2

Perhaps a better way of describing what I want to do is to fill in the
years between survey dates with a sort of moving average, so that the
differences between the measured years are evenly split between the
(in-between) missing years.  So for Mexico, the difference between
measured year 1975 and measured year 1970 is 4.2 - 3.4 = 0.8.  To
linearly fill in the missing values, I would make 1971 = [3.4 +
(0.8*1)/5], 1972 = [3.4 + (0.8*2)/5], and so on.

I could obviously do this by hand, but for 140 countries and 30 years
this would take some time.  So I take it that I would have to write some
code to automate the process?  Since I am new to code-writing, any ideas
would be very much appreciated.

Jason Yackee
Stata 9.2 Intercooled

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Saturday, September 09, 2006 3:57 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: filling in missing panel data as a trend line

This sounds like linear interpolation: see -ipolate-.
Panel data should be interpolated separately -by <panelid>:-.

But note that if "education" means something like "years
of education", then that case too is discussed in the
FAQ you cite in its last section, at least for people
who stay in the system and progress a year at a time.
People who repeat a year or take years out of the system
are naturally a complication.

Nick
n.j.cox@durham.ac.uk

Jason Yackee, PhD Candidate; J.D.

> For my panel data set I have a variable ("education") that has only been
> collected every five years.  My data set is otherwise annual; I would
> like to fill in the missing data for "education" on the basis of a
> regression/trend line between each five-year observation, rather than
> using the "cascade" method detailed in this faq:
> http://www.stata.com/support/faqs/data/missing.html.  I don't see a way
> to do what I want to do using -impute-.  Would someone be able to
> suggest an appropriate approach?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
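
The arithmetic described in the message (1971 = 3.4 + 0.8*1/5, 1972 = 3.4 + 0.8*2/5, and so on) is linear interpolation between adjacent survey years, which is what -ipolate- computes when run separately for each panel, as Nick suggests. Below is a minimal sketch using only the Mexico rows from the hypothetical listing. The variable names `country`, `year`, `education`, and `educ_lin` are assumptions for illustration, not names from an actual dataset.

```stata
* Sketch only: variable names are assumed from the hypothetical
* listing in the post; the data are the Mexico rows shown there.
clear
input str4 country int year float education
"Mex." 1970 3.4
"Mex." 1971 .
"Mex." 1972 .
"Mex." 1973 .
"Mex." 1974 .
"Mex." 1975 4.2
"Mex." 1976 .
"Mex." 1977 .
"Mex." 1978 .
"Mex." 1979 .
"Mex." 1980 4.7
end

* Interpolate education on year separately within each country,
* so values from one country never feed into another.
bysort country (year): ipolate education year, generate(educ_lin)

* Compare the original and filled-in series side by side.
list country year education educ_lin, sepby(country)
```

If this behaves as expected, `educ_lin` for 1971 and 1972 should match the hand calculation in the message (3.4 + 0.8/5 = 3.56 and 3.4 + 1.6/5 = 3.72), and the same single command would cover all 140 countries. Note that -ipolate- does not fill years before the first or after the last observed value in a panel unless its `epolate` option is added.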