Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Fabio Zona <fabio.zona@unibocconi.it>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: replacing with mean
Date: Thu, 2 Dec 2010 16:35:29 +0100 (CET)

Hi,

My request for "replace missing" was part of a more complex procedure I am trying to define. I think I had better describe the whole problem and ask you for a more general, and maybe more effective, solution.

Here is the problem. Say I have 200 industries. I have a list of companies, List "A", of say 5000 companies distributed among these industries. For each of these companies I have 20 variables.

Companies Industry List"A" Var1 Var2 Var3
A 1
A 2
A 3
..
B ..

I have a separate list, List "B", of 700 companies, which is a subsample of List "A". For each company "i" in List "B" (my sample for regression), I need the median value of its industry EXCLUDING the focal company "i", and I have to do this for all 20 variables!

It's extremely challenging. How would you deal with this?

F

----- Original message -----
From: "Steven Samuels" <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Sent: Thursday, 2 December 2010 15:40:30 GMT +01:00 Amsterdam/Berlin/Bern/Rome/Stockholm/Vienna
Subject: Re: st: replacing with mean

Fabio Zona:

Why do you want to impute the missing data? For all the purposes that I can think of, mean replacement is an approach to avoid. While it reproduces means, it distorts most other properties of the observed and unknown complete data, including standard deviations, correlations, and regression estimates. Consider one of Stata's other imputation programs: -mi- in Stata 11; -mim- or -ice- from SSC.

Steve

Steven J. Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477 USA
Voice: 845-246-0774
Fax: 206-202-4783

On Dec 2, 2010, at 3:42 AM, Nick Cox wrote:

No; it is not necessary as you could calculate the means in Mata. But Michael's suggestion will typically be easier to work with. -compress- usually gives extra memory painlessly.

Nick
n.j.cox@durham.ac.uk

Fabio Zona

...one more thing.... is it necessary to generate a new variable of the mean? This consumes memory in Stata.

Michael N. Mitchell

Will this do the trick?

    egen missrev = mean(revenues), by(industry)
    replace revenues = missrev if missing(revenues)

On 2010-12-01 10.31 PM, Fabio Zona wrote:

> I have a set of industries, with a different number of firms in each
> industry; for each firm I have a value, say Revenues
>
> Industry Firm Revenues
> A 1 100
> A 2 150
> A 3 missing1
> A 4 120
> B 5 80
> B 6 130
> B 7 missing2
> ..
>
> I need to replace the missing value of Revenues with the mean of the
> Revenues within the same industry (for example, missing1 for firm
> 3 needs to be replaced with the mean of the values 100, 150, 120,
> that is, with the mean of the revenues of the other firms 1, 2 and 4,
> which belong to the same industry as firm 3).
> I need to do this hundreds of times.
> How can I do it easily?

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
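For the leave-one-out industry median asked about at the top of this message, one possible approach is a plain loop over the List "B" firms. This is only a sketch under assumptions, not a tested solution for the actual data: it supposes that all List "A" firms sit in one dataset, and the variable names industry, inB (1 if the firm is also in List "B") and var1-var20 are hypothetical placeholders.

```stata
* Sketch only. Hypothetical names, not from the original post:
*   industry    industry identifier
*   inB         1 if the firm is also in List "B", 0 otherwise
*   var1-var20  the 20 firm-level variables (assumed stored contiguously)

unab vars : var1-var20

* one result variable per input variable
foreach v of local vars {
    quietly generate double med_`v' = .
}

forvalues i = 1/`=_N' {
    if inB[`i'] == 1 {
        foreach v of local vars {
            * median of `v' over the OTHER firms in firm i's industry;
            * _n != `i' drops the focal firm, and -summarize- ignores
            * missing values of `v'
            quietly summarize `v' if industry == industry[`i'] & _n != `i', detail
            * r(p50) is missing if no other firm has a nonmissing value
            quietly replace med_`v' = r(p50) in `i'
        }
    }
}
```

With about 700 focal firms and 20 variables this runs -summarize- roughly 14,000 times on 5,000 observations, which should be tolerable. Afterwards, something like -keep if inB == 1- would leave the regression sample with the med_* variables attached. The medians are computed over all List "A" firms in the industry, which is how the request above reads; if only List "B" peers should count, the -if- condition would need an extra term.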

**Follow-Ups**:
- **RE: st: replacing with mean** *From:* Nick Cox <n.j.cox@durham.ac.uk>
- **Re: st: replacing with mean** *From:* "Dimitriy V. Masterov" <dvmaster@gmail.com>

**References**:
- **Re: st: replacing with mean** *From:* Steven Samuels <sjsamuels@gmail.com>
