[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
Russell Dimond <[email protected]> |

To |
[email protected] |

Subject |
Re: st: Filling missing values. with averages |

Date |
Tue, 13 May 2003 10:59:19 -0500 |

Radu Ban's suggestion will work well if you know that the first and last year for each id is never missing. But suppose your data included:

id year rates

1 1965 .5

2 1945 missing

2 1946 .2

Presumably you don't want the missing rate to be set to .35.

This is easily fixed though:

by id: replace rates = (rates[_n-1]+rates[_n+1])/2 if rates == .

The by splits your data set into separate parts for each id. Thus rates[_n-1] is missing for the first year of each id rather than reaching back to the last year of the previous id. In our example, the missing value will remain missing (which is better than .35). If your goal is to eliminate all missing values you'll still have to come up with an alternative way of imputing missing first and last years. The obvious thing, and on the same order of accuracy as what you're doing, would be to set it equal to the second/second to last year. So after running the above enough times to fix all internal gaps, try:

by id: replace rates=rates[2] if _n==1

by id: replace rates=rates[_N-1] if _n==_N

Of course there are all sorts of more complex and hopefully more accurate things you could do instead.

Russell Dimond

Research Support

Social Science Computing Cooperative

University of Wisconsin

At 06:44 PM 5/12/2003 -0400, you wrote:

one not so clever way to do would be to replace the missing obs with the average of the values before and after. replace rates = (rates[_n-1]+rates[_n+1])/2 if rates == . then repeat to take care of consecutive missing values replace rates = (rates[_n-1]+rates[_n+2])/2 if rates == . i think you get my point, and you can repeat this by changing the values to be averaged, until you fill in the entire variable. i hope this helps, radu ban On Mon, 12 May 2003, Moon Joong wrote: > Dear Stata Lovers, > > I'm trying to find a smart way to fill missing values with averages between > nearest values. The data looks like the following: > > id year rates > 1 1960 .35 > 1 1961 missing > 1 1962 .45 > 1 1963 .47 > 1 1964 missing > 1 1965 .46 > 2 1945 .2 > 2 1946 missing > 2 1947 .35 > 2 1948 missing > 3 1949 missing > 3 1950 .56 > ..... > > What I want to do is, take the example of id 1 of year 1961 (which is > missing), to fill the average value between .35 and .45 (that is > (.35+.45)/2) into the missinge observation of year 1961. Since I have a lot > of observations to fill in, using brute force by hands takes forever and > would be inaccurate. I think it's very straightfoward, which I don't know > since I'm a beginner. > > Thanks in advance. > > MoonJoong > > _________________________________________________________________ > Add photos to your e-mail with MSN 8. Get 2 months FREE*. > http://join.msn.com/?page=features/featuredemail > > * > * For searches and help try: > * http://www.stata.com/support/faqs/res/findit.html > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ > > * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

* * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: Filling missing values. with averages***From:*"Moon Joong" <[email protected]>

**Re: st: Filling missing values. with averages***From:*Radu Ban <[email protected]>

- Prev by Date:
**st: R-square in Nested Logit** - Next by Date:
**st: Possible trouble with -xi-** - Previous by thread:
**Re: st: Filling missing values. with averages** - Next by thread:
**st: Re: Filling missing values. with averages** - Index(es):

© Copyright 1996–2024 StataCorp LLC | Terms of use | Privacy | Contact us | What's new | Site index |