Statalist The Stata Listserver



st: RE: RE: filling in missing panel data as a trend line


From   "Jason Yackee" <jyackee@law.usc.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: filling in missing panel data as a trend line
Date   Sat, 9 Sep 2006 10:45:54 -0700

Nick,

Thank you for the suggestion.  I don't think -ipolate- quite works for
what I have in mind, but maybe I am wrong.  Here is a hypothetical
picture of the data.  "Education" is simply the average total years of
education of a country's population.  

Country   Year   Education
Mex.      1970   3.4
Mex.      1971   .
Mex.      1972   .
Mex.      1973   .
Mex.      1974   .
Mex.      1975   4.2
Mex.      1976   .
Mex.      1977   .
Mex.      1978   .
Mex.      1979   .
Mex.      1980   4.7
Nic.      1970   1.5
Nic.      1971   .
Nic.      1972   .
~~~       ~~~    ~~~
Nic.      1980   3.2

Perhaps a better way of describing what I want to do is to fill in the
years between survey dates along a straight line, so that the
difference between two measured years is split evenly among the
(in-between) missing years.  So for Mexico, the difference between
measured year 1975 and measured year 1970 is 4.2 - 3.4 = 0.8.  To
fill in the missing values linearly, I would make 1971 = [3.4 +
(0.8*1)/5], 1972 = [3.4 + (0.8*2)/5], and so on.
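A minimal sketch of the arithmetic just described, written in Python purely for illustration (it is not Stata code and makes no claim about how -ipolate- works internally). Each gap between two measured years is filled with equal steps of (later value - earlier value) / (number of years between them). The function name fill_linear is hypothetical, and the sketch assumes one list entry per consecutive year, with None marking a missing value.

```python
def fill_linear(values):
    """Fill interior gaps (None) in a list of yearly values by linear interpolation.

    Assumes one entry per consecutive year. Gaps before the first or after
    the last observed value are left as None (no extrapolation).
    """
    filled = list(values)
    observed = [i for i, v in enumerate(values) if v is not None]
    # Split each difference between consecutive survey years into equal steps.
    for left, right in zip(observed, observed[1:]):
        step = (values[right] - values[left]) / (right - left)
        for k in range(1, right - left):
            filled[left + k] = values[left] + step * k
    return filled


# Mexico, 1970-1980, taken from the example table above.
mexico = [3.4, None, None, None, None, 4.2, None, None, None, None, 4.7]
print([round(v, 2) for v in fill_linear(mexico)])
# [3.4, 3.56, 3.72, 3.88, 4.04, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7]
```

Years before the first survey or after the last one are left empty, because there is no second measured value to draw the line toward.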

I could obviously do this by hand, but for 140 countries and 30 years
this would take some time.  So I take it that I would have to write some
code to automate the process?  Since I am new to writing code, any ideas
would be very much appreciated.
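Automating this for 140 countries means applying the same rule within each country separately, which is the point of the -by <panelid>:- advice in the quoted reply below; otherwise one country's last value would be joined to the next country's first. In Stata that would presumably be something along the lines of -bysort country: ipolate education year, generate(educ_i)-, where country and educ_i are assumed variable names (check -help ipolate-). The Python sketch below only illustrates the grouping logic: the input rows are assumed to be (country, year, value) tuples with None for missing values, and interpolate_panels is a hypothetical name, not an existing function.

```python
from collections import defaultdict


def interpolate_panels(rows):
    """Linearly fill interior gaps within each country separately.

    rows: iterable of (country, year, value) tuples, value None if missing.
    Returns {(country, year): value}. Years before a country's first or
    after its last observed value stay None (no extrapolation).
    """
    by_country = defaultdict(list)
    for country, year, value in rows:
        by_country[country].append((year, value))

    result = {}
    for country, series in by_country.items():
        series.sort(key=lambda pair: pair[0])  # order by year within country
        observed = [(y, v) for y, v in series if v is not None]
        for year, value in series:
            result[(country, year)] = value
        # Draw a straight line between each pair of consecutive observations.
        for (y0, v0), (y1, v1) in zip(observed, observed[1:]):
            slope = (v1 - v0) / (y1 - y0)
            for year, value in series:
                if y0 < year < y1 and value is None:
                    result[(country, year)] = v0 + slope * (year - y0)
    return result


# Short excerpt of the example table above.
rows = [("Mex.", 1970, 3.4), ("Mex.", 1971, None), ("Mex.", 1972, None),
        ("Mex.", 1973, None), ("Mex.", 1974, None), ("Mex.", 1975, 4.2),
        ("Nic.", 1970, 1.5), ("Nic.", 1971, None)]
filled = interpolate_panels(rows)
print(round(filled[("Mex.", 1973)], 2))  # 3.88
print(filled[("Nic.", 1971)])            # None: no later Nicaraguan value in this excerpt
```

Because the slope is divided by the actual gap in years, the sketch also copes with surveys that are not exactly five years apart.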

Jason Yackee
Stata 9.2 Intercooled

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Saturday, September 09, 2006 3:57 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: filling in missing panel data as a trend line

This sounds like linear interpolation: see -ipolate-. 
Panel data should be interpolated separately -by <panelid>:-. 

But note that if "education" means something like "years 
of education", then that case too is discussed in the 
FAQ you cite in its last section, at least for people 
who stay in the system and progress a year at a time. 
People who repeat a year or take years out of the system
are naturally a complication. 

Nick 
n.j.cox@durham.ac.uk 

Jason Yackee, PhD Candidate; J.D.

> For my panel data set I have a variable ("education") that has only been
> collected every five years.  My data set is otherwise annual; I would
> like to fill in the missing data for "education" on the basis of a
> regression/trend line between each five-year observation, rather than
> using the "cascade" method detailed in this faq:
> http://www.stata.com/support/faqs/data/missing.html.  I don't see a way
> to do what I want to do using -impute-.  Would someone be able to
> suggest an appropriate approach?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP