[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
Re: st: Filling missing values. with averages
Radu Ban's suggestion will work well if you know that the first and last
year for each id is never missing. But suppose your data included:
id year rates
1 1965 .5
2 1945 missing
2 1946 .2
Presumably you don't want the missing rate to be set to .35.
This is easily fixed though:
by id: replace rates = (rates[_n-1]+rates[_n+1])/2 if rates == .
The by splits your data set into separate parts for each id. Thus
rates[_n-1] is missing for the first year of each id rather than reaching
back to the last year of the previous id. In our example, the missing
value will remain missing (which is better than .35). If your goal is to
eliminate all missing values you'll still have to come up with an
alternative way of imputing missing first and last years. The obvious
thing, and on the same order of accuracy as what you're doing, would be to
set it equal to the second/second to last year. So after running the above
enough times to fix all internal gaps, try:
by id: replace rates=rates if _n==1
by id: replace rates=rates[_N-1] if _n==_N
Of course there are all sorts of more complex and hopefully more accurate
things you could do instead.
Social Science Computing Cooperative
University of Wisconsin
At 06:44 PM 5/12/2003 -0400, you wrote:
one not so clever way to do would be to replace the missing obs with the
average of the values before and after.
replace rates = (rates[_n-1]+rates[_n+1])/2 if rates == .
then repeat to take care of consecutive missing values
replace rates = (rates[_n-1]+rates[_n+2])/2 if rates == .
i think you get my point, and you can repeat this by changing the values
to be averaged, until you fill in the entire variable.
i hope this helps,
On Mon, 12 May 2003, Moon Joong wrote:
> Dear Stata Lovers,
> I'm trying to find a smart way to fill missing values with averages between
> nearest values. The data looks like the following:
> id year rates
> 1 1960 .35
> 1 1961 missing
> 1 1962 .45
> 1 1963 .47
> 1 1964 missing
> 1 1965 .46
> 2 1945 .2
> 2 1946 missing
> 2 1947 .35
> 2 1948 missing
> 3 1949 missing
> 3 1950 .56
> What I want to do is, take the example of id 1 of year 1961 (which is
> missing), to fill the average value between .35 and .45 (that is
> (.35+.45)/2) into the missinge observation of year 1961. Since I have a lot
> of observations to fill in, using brute force by hands takes forever and
> would be inaccurate. I think it's very straightfoward, which I don't know
> since I'm a beginner.
> Thanks in advance.
> Add photos to your e-mail with MSN 8. Get 2 months FREE*.
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
* For searches and help try:
* For searches and help try: