
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: replacing with mean

From   Fabio Zona <>
Subject   Re: st: replacing with mean
Date   Thu, 2 Dec 2010 16:35:29 +0100 (CET)


My request for "replace missing" was related to a more complex procedure I am trying to define. I think I had better describe the whole problem and ask you for a more general, and maybe more effective, solution.

Here is the problem.

Say I have 200 industries. I have a list of companies, List "A", of say 5000 companies distributed among these industries. For each of these companies I have 20 variables.

Industry   List "A"   Var1   Var2  Var3
A             1        
A             2
A             3

I have a separate list, List "B", of 700 companies, which is a subsample of List "A".

For each company "i" in List "B" (my sample for regression), I need to attach the median value of its industry, EXCLUDING the focal company "i" itself, and I have to do this for all 20 variables!

It's extremely challenging. How would you deal with this?
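
One way to approach the leave-one-out median described above is a plain loop over the List "B" observations. This is only a minimal sketch, not a tuned solution. It assumes all List "A" companies are in memory, that a hypothetical indicator variable inB equals 1 for List "B" companies, that the industry identifier is called industry, and that the 20 variables are named var1 through var20 and sit next to each other in the dataset. None of these names come from the post.

```stata
* Assumed (hypothetical) names: industry, inB, var1-var20
foreach v of varlist var1-var20 {
    generate double med_`v' = .
    forvalues i = 1/`=_N' {
        if inB[`i'] == 1 {
            * median of `v' in the same industry, leaving out observation i
            quietly summarize `v' if industry == industry[`i'] & _n != `i', detail
            quietly replace med_`v' = r(p50) in `i'
        }
    }
}
```

With about 700 focal companies and 20 variables this runs -summarize- roughly 14,000 times, which is slow but should be manageable. If an industry has no other company with a non-missing value, the result is left missing.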


----- Original message -----
From: "Steven Samuels" <>
Sent: Thursday, 2 December 2010 15:40:30 GMT +01:00 Amsterdam/Berlin/Bern/Rome/Stockholm/Vienna
Subject: Re: st: replacing with mean

Fabio Zona: Why do you want to impute the missing data?  For all the  
purposes that I can think of, mean replacement is an approach to  
avoid.  While it reproduces means, it distorts most other properties  
of the observed and unknown complete data, including standard  
deviations, correlations, and regression estimates.  Consider one of  
Stata's other imputation programs: -mi- in Stata 11;  -mim- or -ice-  
from SSC.
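
For readers unfamiliar with -mi-, a minimal sketch of what a multiple-imputation workflow might look like in Stata 11 follows. The imputation model, the number of imputations, the seed, and the outcome variable y are all illustrative assumptions, not recommendations from this thread; the sketch also assumes industry is a numeric variable.

```stata
* Illustrative only: y is a hypothetical outcome variable
mi set mlong
mi register imputed revenues
* impute revenues from industry dummies; 20 imputations chosen arbitrarily
mi impute regress revenues i.industry, add(20) rseed(12345)
* fit the analysis model on each imputed dataset and combine the results
mi estimate: regress y revenues
```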


Steven J. Samuels
18 Cantine's Island
Saugerties NY 12477
Voice: 845-246-0774
Fax:    206-202-4783

On Dec 2, 2010, at 3:42 AM, Nick Cox wrote:

No; it is not necessary as you could calculate the means in Mata. But  
Michael's suggestion will typically be easier to work with.

-compress- usually gives extra memory painlessly.


Fabio Zona

One more thing... is it necessary to generate a new variable for the
mean? This consumes memory in Stata.

Michael N. Mitchell

   Will this do the trick?

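* industry-level mean of the non-missing revenues, stored on every row
egen missrev = mean(revenues), by(industry)
* fill in only the missing values
replace revenues = missrev if missing(revenues)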
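
Since the original question mentions repeating this many times, the same two commands could be wrapped in a loop over the variables concerned. This is a sketch under assumptions: revenues comes from the thread, while assets and employees are hypothetical variable names used only for illustration. Dropping the temporary variable in each pass also avoids keeping an extra mean variable in memory.

```stata
* assets and employees are hypothetical names; replace with your own list
foreach v of varlist revenues assets employees {
    tempvar m
    egen double `m' = mean(`v'), by(industry)
    replace `v' = `m' if missing(`v')
    drop `m'
}
```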

On 2010-12-01 10.31 PM, Fabio Zona wrote:

> I have a set of industries, with a different number of firms in each
> industry; for each firm I have a value, say Revenues.
> Industry         Firm         Revenues
> A                  1            100
> A                  2            150
> A                  3          missing1
> A                  4            120
> B                  5             80
> B                  6            130
> B                  7          missing2
> ..
> I need to replace the missing value of Revenues with the mean of the  
> Revenues within the same industries (For example, missing1 for firm  
> 3, needs to be replaced with the mean of the values 100, 150, 120,  
> that is, with the mean of the revenues of other firms 1, 2 and 4  
> which belong to the same industry to which firm 3 belongs).
> I need to do this hundreds of times.
> How can I do it easily?

*   For searches and help try:

