# Re: st: replacing with mean

 From Steven Samuels To statalist@hsphsun2.harvard.edu Subject Re: st: replacing with mean Date Thu, 2 Dec 2010 09:40:30 -0500

Fabio Zona: Why do you want to impute the missing data? For all the purposes that I can think of, mean replacement is an approach to avoid. While it reproduces means, it distorts most other properties of the observed and unknown complete data, including standard deviations, correlations, and regression estimates. Consider one of Stata's other imputation programs: -mi- in Stata 11; -mim- or -ice- from SSC.
Steve
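
For readers who want to try the -mi- route mentioned above, a minimal sketch follows. It assumes the thread's variables `industry` (string) and `revenues` are in memory. The imputation model (revenues predicted by industry alone), the new variable name `ind`, the number of imputations, and the seed are illustrative assumptions, not recommendations from the thread.

```stata
* Sketch only: the imputation model below is an assumption for illustration.
* Factor-variable notation needs a numeric variable, so encode the string.
encode industry, generate(ind)

* Declare the mi storage style and the variable to be imputed.
mi set mlong
mi register imputed revenues

* Create 20 imputed data sets; rseed() makes the draws reproducible.
mi impute regress revenues i.ind, add(20) rseed(2010)

* Fit a model on each imputed data set and combine the results.
mi estimate: regress revenues i.ind
```

In real work the imputation model would normally include other predictors of revenues, not just industry.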

On Dec 2, 2010, at 3:42 AM, Nick Cox wrote:

No; it is not necessary as you could calculate the means in Mata. But Michael's suggestion will typically be easier to work with.
-compress- usually gives extra memory painlessly.

Nick
n.j.cox@durham.ac.uk
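
A minimal sketch of one way to keep the extra variable from lingering. This is not the Mata route Nick mentions: the mean still occupies memory briefly, but it is held in a temporary variable that is dropped right after use. The variable names follow the thread; the use of `tempvar` is an assumption about what would suit the poster.

```stata
* Hold the industry mean in a temporary variable, then drop it.
tempvar m
bysort industry: egen double `m' = mean(revenues)
replace revenues = `m' if missing(revenues)
drop `m'

* Reduce each variable to the smallest storage type that loses no information.
compress
```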

Fabio Zona

One more thing: is it necessary to generate a new variable for the mean? This consumes memory in Stata.

Michael N. Mitchell

Will this do the trick?

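* industry-level mean of the nonmissing revenues, stored on every row
egen missrev = mean(revenues), by(industry)
* fill in only the missing values
replace revenues = missrev if missing(revenues)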
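
To see what these two commands do, here is a small self-contained sketch that types in the example data from the original question and applies the same replacement. The variable name `indmean` is an arbitrary choice for illustration.

```stata
clear
input str1 industry firm revenues
"A" 1 100
"A" 2 150
"A" 3 .
"A" 4 120
"B" 5 80
"B" 6 130
"B" 7 .
end

* Mean of the nonmissing revenues within each industry.
bysort industry: egen double indmean = mean(revenues)

* Firm 3 gets (100 + 150 + 120)/3; firm 7 gets (80 + 130)/2 = 105.
replace revenues = indmean if missing(revenues)
list, sepby(industry)
```

If "hundreds of times" means many variables, the same step could be wrapped in a loop. The variable names `assets` and `employees` below are hypothetical placeholders.

```stata
foreach v of varlist revenues assets employees {
    tempvar m
    bysort industry: egen double `m' = mean(`v')
    replace `v' = `m' if missing(`v')
    drop `m'
}
```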

On 2010-12-01 10.31 PM, Fabio Zona wrote:

I have a set of industries, with a different number of firms in each industry; for each firm I have a value, say Revenues:
Industry         Firm         Revenues
A                  1            100
A                  2            150
A                  3          missing1
A                  4            120
B                  5             80
B                  6            130
B                  7          missing2
..

I need to replace each missing value of Revenues with the mean of Revenues within the same industry. For example, missing1 for firm 3 needs to be replaced with the mean of 100, 150 and 120, that is, the mean of the revenues of firms 1, 2 and 4, which belong to the same industry as firm 3.
I need to do this hundreds of times.
How can I do it easily?
