[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Filling missing values. with averages

From	Russell Dimond <[email protected]>
To	[email protected]
Subject	Re: st: Filling missing values. with averages
Date	Tue, 13 May 2003 10:59:19 -0500

Radu Ban's suggestion will work well if you know that the first and last year for each id is never missing. But suppose your data included:

id year rates
1 1965 .5
2 1945 missing
2 1946 .2

Presumably you don't want the missing rate to be set to .35.

This is easily fixed though:

by id: replace rates = (rates[_n-1]+rates[_n+1])/2 if rates == .

The by splits your data set into separate parts for each id. Thus rates[_n-1] is missing for the first year of each id rather than reaching back to the last year of the previous id. In our example, the missing value will remain missing (which is better than .35). If your goal is to eliminate all missing values you'll still have to come up with an alternative way of imputing missing first and last years. The obvious thing, and on the same order of accuracy as what you're doing, would be to set it equal to the second/second to last year. So after running the above enough times to fix all internal gaps, try:

by id: replace rates=rates[2] if _n==1
by id: replace rates=rates[_N-1] if _n==_N

Of course there are all sorts of more complex and hopefully more accurate things you could do instead.

Russell Dimond
Research Support
Social Science Computing Cooperative
University of Wisconsin

At 06:44 PM 5/12/2003 -0400, you wrote:

one not so clever way to do would be to replace the missing obs with the
average of the values before and after.


replace rates = (rates[_n-1]+rates[_n+1])/2 if rates == .

then repeat to take care of consecutive missing values

replace rates = (rates[_n-1]+rates[_n+2])/2 if rates == .



i think you get my point, and you can repeat this by changing the values
to be averaged, until you fill in the entire variable.

i hope this helps,

radu ban

On Mon, 12 May 2003, Moon Joong wrote:

> Dear Stata Lovers,
>
> I'm trying to find a smart way to fill missing values with averages between
> nearest values. The data looks like the following:
>
> id   year   rates
> 1   1960  .35
> 1   1961   missing
> 1   1962  .45
> 1   1963  .47
> 1   1964   missing
> 1   1965  .46
> 2   1945  .2
> 2   1946   missing
> 2   1947  .35
> 2   1948   missing
> 3   1949   missing
> 3   1950   .56
> .....
>
> What I want to do is, take the example of id 1 of year 1961 (which is
> missing), to fill the average value between .35 and .45 (that is
> (.35+.45)/2) into the missinge observation of year 1961. Since I have a lot
> of observations to fill in, using brute force by hands takes forever and
> would be inaccurate. I think it's very straightfoward, which I don't know
> since I'm a beginner.
>
> Thanks in advance.
>
> MoonJoong
>
> _________________________________________________________________
> Add photos to your e-mail with MSN 8. Get 2 months FREE*.
> http://join.msn.com/?page=features/featuredemail
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
>



*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: Filling missing values. with averages
  - From: "Moon Joong" <[email protected]>
- Re: st: Filling missing values. with averages
  - From: Radu Ban <[email protected]>

Prev by Date: st: R-square in Nested Logit
Next by Date: st: Possible trouble with -xi-
Previous by thread: Re: st: Filling missing values. with averages
Next by thread: st: Re: Filling missing values. with averages
Index(es):
- Date
- Thread