Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: replacing with mean

From: Fabio Zona
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: replacing with mean
Date: Thu, 2 Dec 2010 16:35:29 +0100 (CET)

```
Hi,

My request for "replace missing" was part of a more complex procedure I am trying to define. I think I had better describe the whole problem and ask you for a more general, and maybe more effective, solution.

Here is the problem.

Say I have 200 industries. I have a list of companies, List "A", of, say, 5000 companies distributed among these industries. For each of these companies I have 20 variables.

Industry   Company (List "A")   Var1   Var2   Var3
A          1
A          2
A          3
..
B
..

I have a separate list, List "B", of 700 companies, which is a subsample of List "A".

For each company "i" in List "B" (my regression sample), I need the median value of its industry, computed EXCLUDING the focal company "i" itself, and I have to do this for all 20 variables!

It's extremely challenging. How would you deal with this?

F

----- Original message -----
From: "Steven Samuels" <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Sent: Thursday, 2 December 2010 15:40:30 GMT +01:00 Amsterdam/Berlin/Bern/Rome/Stockholm/Vienna
Subject: Re: st: replacing with mean

Fabio Zona: Why do you want to impute the missing data?  For all the
purposes that I can think of, mean replacement is an approach to
avoid.  While it reproduces means, it distorts most other properties
of the observed and unknown complete data, including standard
deviations, correlations, and regression estimates.  Consider one of
Stata's other imputation programs: -mi- in Stata 11;  -mim- or -ice-
from SSC.

Steve

Steven J. Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783

On Dec 2, 2010, at 3:42 AM, Nick Cox wrote:

No; it is not necessary as you could calculate the means in Mata. But
Michael's suggestion will typically be easier to work with.

-compress- usually gives extra memory painlessly.

Nick
n.j.cox@durham.ac.uk

Fabio Zona

One more thing: is it necessary to generate a new variable for
the mean? This consumes memory in Stata.

Michael N. Mitchell

Will this do the trick?

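* industry mean of revenues, computed over the non-missing values
egen missrev = mean(revenues), by(industry)
* fill each missing revenue with its industry mean
replace revenues = missrev if missing(revenues)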

On 2010-12-01 10.31 PM, Fabio Zona wrote:

> I have a set of industries, with a different number of firms in each
> industry; for each firm I have a value, say Revenues
>
> Industry         Firm         Revenues
> A                  1            100
> A                  2            150
> A                  3          missing1
> A                  4            120
> B                  5             80
> B                  6            130
> B                  7          missing2
> ..
>
> I need to replace each missing value of Revenues with the mean of
> Revenues within the same industry. (For example, missing1 for firm
> 3 needs to be replaced with the mean of the values 100, 150, 120,
> that is, with the mean of the revenues of the other firms 1, 2 and 4,
> which belong to the same industry as firm 3.)
> I need to do this hundreds of times.
> How can I do it easily?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
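The thread as archived here does not show a reply to the leave-one-out median question, so the following is only a minimal sketch of one possible approach, not a method proposed by any of the posters. It assumes a hypothetical 0/1 variable `inB` flagging List "B" firms, takes the median over all other firms in the same industry (that is, over List "A"), and reuses the revenue figures from the earlier message as toy data. The `med_` prefix and the local macro `vars` are likewise illustrative names.

```stata
* Toy data: revenue figures from the thread; inB is a hypothetical 0/1 flag
* marking the firms that belong to List "B".
clear
input str1 industry firm inB revenues
A 1 1 100
A 2 0 150
A 3 1 .
A 4 0 120
B 5 1 80
B 6 0 130
B 7 0 .
end

* Hypothetical variable list; with the real data it would name all 20 variables.
local vars revenues

foreach v of local vars {
    quietly generate double med_`v' = .
    forvalues i = 1/`=_N' {
        if inB[`i'] == 1 {
            * median of `v' over the same industry, leaving out observation i
            quietly summarize `v' if industry == industry[`i'] & _n != `i', detail
            quietly replace med_`v' = r(p50) in `i'
        }
    }
}

list industry firm inB revenues med_revenues, sepby(industry)
```

The loop calls -summarize- once per List "B" firm and variable, which is simple rather than fast; at roughly 700 firms and 20 variables that should still be manageable. Missing values of the focal variable are ignored by -summarize-, so a firm whose own value is missing still receives the median of the other firms in its industry.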