



Re: st: replacing with mean


From   Steven Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: replacing with mean
Date   Thu, 2 Dec 2010 09:40:30 -0500

Fabio Zona: Why do you want to impute the missing data? For all the purposes that I can think of, mean replacement is an approach to avoid. While it reproduces means, it distorts most other properties of the observed and unknown complete data, including standard deviations, correlations, and regression estimates. Consider one of Stata's other imputation programs: -mi- in Stata 11; -mim- or -ice- from SSC.
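A minimal sketch of the point about standard deviations, assuming the toy values from Fabio's table further down (with . for the missing entries). Mean replacement leaves each industry mean unchanged, but the overall standard deviation shrinks because every filled-in value sits exactly at its group mean and so adds no spread.

```stata
* Toy data from the original question; . marks the missing revenues
clear
input str1 industry firm revenues
A 1 100
A 2 150
A 3 .
A 4 120
B 5 80
B 6 130
B 7 .
end

* SD using only the observed values
quietly summarize revenues
scalar sd_obs = r(sd)

* Within-industry mean replacement (egen's mean() skips missing values)
egen missrev = mean(revenues), by(industry)
replace revenues = missrev if missing(revenues)

* SD after mean replacement: smaller, since the imputed values add no spread
quietly summarize revenues
scalar sd_imp = r(sd)
display "SD, observed only: " %9.3f scalar(sd_obs)
display "SD, after mean replacement: " %9.3f scalar(sd_imp)
```

On these seven firms the SD drops from about 27.0 to about 22.7, which is the kind of distortion that carries over into correlations and regression standard errors.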

Steve

Steven J. Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783


On Dec 2, 2010, at 3:42 AM, Nick Cox wrote:

No; it is not necessary as you could calculate the means in Mata. But Michael's suggestion will typically be easier to work with.

-compress- usually gives extra memory painlessly.
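If the concern is a helper variable left lying around, one option is to let Stata name it with -tempvar-, drop it as soon as it has been used, and then -compress-. The helper still occupies memory briefly while it exists. The sketch below assumes several revenue variables and loops over them, which is one possible reading of "hundreds of times"; the names rev2008-rev2010 and their values are hypothetical, for illustration only.

```stata
* Hypothetical data: several revenue variables with scattered missings
clear
input str1 industry firm rev2008 rev2009 rev2010
A 1 100 110 .
A 2 150 .   140
A 3 .   130 150
B 4 80  90  .
B 5 .   100 120
end

foreach v of varlist rev2008-rev2010 {
    tempvar m                              // temporary name for the industry mean
    egen `m' = mean(`v'), by(industry)     // mean over non-missing firms only
    replace `v' = `m' if missing(`v')      // fill gaps with that industry's mean
    drop `m'                               // free the helper right away
}

* Store each variable in the smallest type that holds its values
compress
```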

Nick
n.j.cox@durham.ac.uk

Fabio Zona

One more thing: is it necessary to generate a new variable holding the mean? This consumes memory in Stata.

Michael N. Mitchell

  Will this do the trick?

egen missrev = mean(revenues), by(industry)
replace revenues = missrev if missing(revenues)
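Run on the example data from the original question (with . for the missing entries), these two lines fill firm 3 with (100 + 150 + 120)/3 ≈ 123.33 and firm 7 with (80 + 130)/2 = 105. This works because -egen-'s mean() ignores missing values, so each industry mean uses only the observed firms. The sketch adds a final -drop missrev-, which is not part of Michael's suggestion, so that no helper variable is kept.

```stata
* Example data from the original question; . marks missing revenues
clear
input str1 industry firm revenues
A 1 100
A 2 150
A 3 .
A 4 120
B 5 80
B 6 130
B 7 .
end

egen missrev = mean(revenues), by(industry)    // industry mean of observed values
replace revenues = missrev if missing(revenues) // fill only the gaps
drop missrev                                    // helper no longer needed
list, sepby(industry) noobs
```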

On 2010-12-01 10.31 PM, Fabio Zona wrote:

I have a set of industries, with a different number of firms in each industry; for each firm I have a value, say Revenues:

Industry   Firm   Revenues
A          1      100
A          2      150
A          3      missing1
A          4      120
B          5      80
B          6      130
B          7      missing2
..

I need to replace each missing value of Revenues with the mean of Revenues within the same industry. For example, missing1 for firm 3 needs to be replaced with the mean of the values 100, 150, and 120, that is, the mean revenue of firms 1, 2, and 4, which belong to the same industry as firm 3.
I need to do this hundreds of times.
How can I do it easily?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP