Statalist



Re: st: Is there a kind of stochasticity in the execution of xthtaylor?


From   "David M. Drukker, StataCorp" <ddrukker@stata.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Is there a kind of stochasticity in the execution of xthtaylor?
Date   Thu, 17 Jul 2008 10:11:33 -0500 (CDT)

Hewan Belay <hewan_belay@yahoo.com> asked why -xthtaylor- drops a different
variable when it is run at different times, and speculated that this was due
to a problem with -xthtaylor-.

There is no problem with the results produced by -xthtaylor-; which variable
is dropped is arbitrary.

Changes in the sort order of the data are responsible for this difference.
Still, it is surprising when the dropped variables change from run to run.

Computers compute in finite precision. Among other things, finite-precision
arithmetic means that changing the order in which a group of numbers is summed
can cause small differences in the computed sum. These small differences can,
in turn, alter which variable is dropped.
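
As a minimal illustration (written in Python rather than Stata; any program
using IEEE 754 double precision behaves the same way), summing the same three
numbers in two different orders gives two different results:

```python
def sum_in_order(values):
    """Add values left to right as a simple running total."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same three numbers, added in two different orders.
a = sum_in_order([1e16, 1.0, -1e16])  # 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost
b = sum_in_order([1e16, -1e16, 1.0])  # the large terms cancel first, so the 1.0 survives

print(a, b)  # 0.0 1.0
```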

Dropping one of a series of perfectly collinear variables is a classic
knife-edge computation. In -xthtaylor- the decision is especially complicated
because it is made on the basis of transformed variables, not the original
variables. In addition, these transforms are computed after sorting the data
by the panel-id variable, which does not uniquely identify the observations,
so the order of the observations within each panel can differ from one sort
to the next. I suspect that minor differences in the computed transforms are
triggering a difference in the knife-edge decision of which variable is
dropped.
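
To see why such a decision is knife-edge, consider a hypothetical rule for
choosing which collinear variable to drop: a pivoted Gram-Schmidt that keeps
the column with the largest remaining norm. This is NOT the algorithm that
-xthtaylor- uses; it is only a Python sketch, with made-up data, showing how a
change far below any sensible tolerance can flip the outcome. With a constant
and two dummies that sum to the constant, the two dummies tie exactly, and a
1e-12 change to a single observation decides which one is dropped.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dropped_columns(columns, tol=1e-8):
    """Hypothetical collinearity rule (pivoted Gram-Schmidt).

    Repeatedly keep the remaining column whose residual, after removing the
    already-kept columns, has the largest norm. Columns whose residual norm
    falls below tol are reported as dropped. Exact ties go to the column
    listed first, because max() returns the first maximal element.
    """
    residuals = {name: [float(x) for x in col] for name, col in columns.items()}
    kept = []
    while residuals:
        # Squared norm is monotone in the norm, so it picks the same column.
        name = max(residuals, key=lambda k: dot(residuals[k], residuals[k]))
        r = residuals.pop(name)
        norm = math.sqrt(dot(r, r))
        if norm < tol:
            break  # everything left is (numerically) collinear with the kept columns
        kept.append(name)
        q = [x / norm for x in r]
        # Remove the component along q from every remaining residual.
        for k in list(residuals):
            c = dot(residuals[k], q)
            residuals[k] = [a - c * b for a, b in zip(residuals[k], q)]
    return sorted(set(columns) - set(kept))

cons = [1, 1, 1, 1]
d1 = [1, 1, 0, 0]
d2 = [0, 0, 1, 1]  # d1 + d2 == cons: perfectly collinear

exact = dropped_columns({"cons": cons, "d1": d1, "d2": d2})
print(exact)  # ['d2']

# Nudge one observation of d2 by 1e-12, far below tol.
d2_nudged = [0, 0, 1, 1 + 1e-12]
nudged = dropped_columns({"cons": cons, "d1": d1, "d2": d2_nudged})
print(nudged)  # ['d1']
```

In both cases exactly one dummy is dropped and the fitted model spans the same
space; only the identity of the dropped variable changes, which is why the
choice is arbitrary rather than wrong.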

We will change -xthtaylor- so that the sorts on the panel-id variable depend
deterministically on the original sort order of the data. This will remove
the variation from run to run, unless some other command re-sorts the data
in between runs.
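
The change described above amounts to breaking ties in the panel-id sort with
the original observation order. A minimal Python sketch of the idea, with
hypothetical data: sorting on the panel id alone lets the incoming order
decide the order within each panel (simulated here by trying every incoming
order, since Python's sort keeps ties as they arrive), so a within-panel sum
can change; sorting on the pair (panel id, original observation number) makes
it reproducible. Within Stata itself, the -stable- option of -sort- keeps tied
observations in their existing order.

```python
from itertools import permutations

# Hypothetical data: (original observation number, panel id, x).
# Panel 1 holds values whose floating-point sum depends on the order of addition.
data = [
    (1, 1, 1e16), (2, 1, 1.0), (3, 1, -1e16),
    (4, 2, 0.5), (5, 2, 0.25),
]

def panel_sums(rows):
    """Running sum of x within each panel, in the order the rows appear."""
    sums = {}
    for _, panel, x in rows:
        sums[panel] = sums.get(panel, 0.0) + x
    return sums

def sort_and_sum(rows, deterministic):
    if deterministic:
        # Break ties on the panel id with the original observation number.
        ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    else:
        # Sort on the panel id only; tied rows keep whatever order they arrived in.
        ordered = sorted(rows, key=lambda r: r[1])
    return panel_sums(ordered)

# Try every possible incoming order of the rows and collect panel 1's sum.
loose = {sort_and_sum(p, False)[1] for p in permutations(data)}
strict = {sort_and_sum(p, True)[1] for p in permutations(data)}

print(sorted(loose))  # [0.0, 1.0]
print(strict)         # {0.0}
```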

If Hewan could privately send me the data and a do-file that reproduces the
posted example, then I can ensure that the fix addresses the problem at hand.

In the meantime, Hewan should simply exclude one of the time dummies to
ensure that the same variables are used across runs and samples.
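
Why excluding one time dummy helps: when the model also contains a constant
(as this sketch assumes), a full set of time dummies sums to that constant, so
one variable is always redundant and something must be dropped. Leaving one
period out as the base category removes the collinearity up front, leaving
nothing for the knife-edge decision to decide. A small Python check with
hypothetical data (two panels, each observed in three periods) uses exact
rational arithmetic via fractions.Fraction, so the rank computation itself is
not subject to roundoff:

```python
from fractions import Fraction

def rank(columns):
    """Rank of a list of equal-length columns, by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in zip(*columns)]
    r = 0
    for c in range(len(columns)):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # column c is a combination of earlier columns
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical data: two panels, each observed in periods 1, 2, 3.
period = [1, 2, 3, 1, 2, 3]
cons = [1] * len(period)
dummy = {t: [1 if p == t else 0 for p in period] for t in (1, 2, 3)}

all_dummies = [cons, dummy[1], dummy[2], dummy[3]]
base_omitted = [cons, dummy[2], dummy[3]]  # period 1 is the base category

print(len(all_dummies), rank(all_dummies))    # 4 3 (one column is redundant)
print(len(base_omitted), rank(base_omitted))  # 3 3 (full column rank)
```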

David
<ddrukker@stata.com>




