Rembert De Blander <rembert.deblander@uclouvain.be>

If you want us to reproduce this result, you can find a handful of obs
where it can be seen, then start your example with -input-, e.g.:

clear
set type double
input pid x
1 5
1 .
2 5
2 7
end
replace x=_pi if mi(x)
loc x "x"
loc id "pid"
egen double i`x' = mean(`x'), by(pid)
generate double I`x' = 0
qui levelsof `id', local(idlst)
foreach lvl of local idlst {
  qui summarize `x' if (`id' == `lvl'), meanonly
  qui replace I`x' = r(mean) if (`id' == `lvl')
}
su
li

On Thu, Sep 25, 2008 at 10:26 PM, Rembert De Blander
<rembert.deblander@uclouvain.be> wrote:
> The problem can be stated as follows:
>
> Consider the panel data setting where the command
>
> <<tsset pid time>>
>
> was issued. Under these circumstances, the command:
>
> <<by pid: egen double i`x' = mean(`x')>>
>
> should be exactly identical to:
>
> <<
> generate double I`x' = 0
> qui levelsof `id', local(idlst)
> foreach lvl of local idlst {
>   qui summarize `x' if (`id' == `lvl'), meanonly
>   qui replace I`x' = r(mean) if (`id' == `lvl')
> }
> >>
>
> Now, the problem, as far as I experienced it, can appear when `x' is a
> float variable. Worse, the discrepancy between both command sequences
> seems to involve a "random" component, since it differs from run to run.
> The latter sequence of commands always produces identical results, but the
> 'egen' commands output varies. Of course these fluctuations are of the
> order of machine precision. Nevertheless they are worrying, since they
> constitute 'unexpected' and certainly undocumented behaviour, which can
> lead to substantial differences, especially in iterated procedures.
>
> The problem does not occur for any `x', but I have a dataset & sequence of
> commands that produce the described behaviour.
>
> Since I am not allowed to post attachments, please mail me for more info:
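
One way to put a number on the discrepancy, rather than eyeballing -su- and -li-, is to difference the two results directly. The following is only a sketch under stated assumptions: it reuses the four toy observations from the -input- example, stores x as float because that is the case the original report describes, and the variable names ix, Ix, absdiff and reldiff are illustrative. It is not known whether these few observations will reproduce the reported behaviour.

```stata
* Sketch only: toy data from the -input- example, with x stored as float
* (an assumption, since the report says the problem shows up for floats).
clear
set type float
input pid x
1 5
1 .
2 5
2 7
end
replace x = _pi if mi(x)

* Group means via egen, held in double precision
egen double ix = mean(x), by(pid)

* Group means via an explicit loop over the levels of pid
generate double Ix = .
quietly levelsof pid, local(idlst)
foreach lvl of local idlst {
    quietly summarize x if pid == `lvl', meanonly
    quietly replace Ix = r(mean) if pid == `lvl'
}

* Absolute and relative differences between the two results
generate double absdiff = abs(ix - Ix)
generate double reldiff = reldif(ix, Ix)
summarize absdiff reldiff

* Number of observations where the two results are not bit-identical
count if ix != Ix
```

Running this several times on the dataset that actually shows the problem, and comparing the -summarize- and -count- output between runs, would show whether the egen result really changes from run to run and by how much.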

