



Re: st: xtlogit: panel data transformation's recast to double makes model incomputable


From   "JVerkuilen (Gmail)" <jvverkuilen@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: xtlogit: panel data transformation's recast to double makes model incomputable
Date   Tue, 2 Apr 2013 15:30:46 -0400

On Tue, Apr 2, 2013 at 3:14 PM, Tom <tommedema@gmail.com> wrote:
> Hi Jay,
>
> Per request these are the results of the "offending IVs" alone:
>

<snip>

So it seems like it works on its own.


>
> This is real price data, I also verified it several times. There
> appear to be no mistakes in the data.

Yeah, I'm sure they're real; that's not the issue. The issue is that
the variable is so heavily skewed. That's what price data are like, of
course.


>
> Do you have an explanation why close_g100 would fail whereas close_g30
> does not? If you look at the summary statistics you'll see that the
> close_g30 variable and close_g5 etc. are actually much more skewed and
> have higher variations.
>

The best guess I have is that this variable creates a near-perfect
prediction. Collinearity in and of itself is a problem, but the big
issue with logistic regression is usually perfect prediction. If you
have that, the whole thing blows up, and that's something Stata
detects automatically. What it will have a much harder time with is
near-perfect prediction, where you are *close* to but not quite at
perfect prediction. The numerics will start to break down, but this
can happen in an unpredictable way that depends on small decisions in
the way the routine was programmed, decisions that ordinarily make no
difference and thus are unlikely to be noticed in the testing process.
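
To make the distinction between perfect and near-perfect prediction
concrete, here is a minimal sketch in Python (not Stata, and not what
xtlogit does internally). It looks at one predictor at a time and
counts how many observations fall in the range where the two outcome
groups overlap. The function name and the toy numbers are made up for
illustration. It only catches separation by a single variable, not by
a combination of variables.

```python
def separation_overlap(x, y):
    """Count observations in the range of x where the y==0 and y==1 groups overlap.

    Returns 0 when a single cut-point on x splits the two outcomes
    perfectly (complete separation). A small positive count relative to
    len(x) suggests near-perfect prediction.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome groups must be non-empty")
    lo = max(min(x0), min(x1))  # lower edge of the shared range
    hi = min(max(x0), max(x1))  # upper edge of the shared range
    if lo > hi:
        return 0  # the groups do not overlap at all
    return sum(1 for xi in x if lo <= xi <= hi)


# Toy data (hypothetical): the first predictor separates y perfectly,
# the second has one y==0 observation sitting among the y==1 values.
y = [0, 0, 0, 0, 1, 1, 1]
perfect = [1, 2, 3, 4, 10, 11, 12]
nearly = [1, 2, 3, 11, 10, 11, 12]
print(separation_overlap(perfect, y))
print(separation_overlap(nearly, y))
```

An overlap count of zero is the case software can flag reliably. A
count that is small but not zero is the harder case described above.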

You might want to look for very high-leverage points in the relevant
design matrix. That might help identify whether you have a problem.
Also, start with a model that has just the offending variable in it
and then add the other variables one at a time.
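
As a rough illustration of what a high-leverage point looks like,
here is a small Python sketch for the simplest design matrix: an
intercept plus one predictor. It uses the ordinary linear-model hat
values, h_i = 1/n + (x_i - mean)^2 / Sxx, which is a simplification.
Leverage in a logistic model is also weighted by the fitted
probabilities. The function names, the cutoff of twice the average
leverage, and the toy prices are assumptions for illustration only.

```python
def leverage_one_predictor(x):
    """Hat values for a design matrix with an intercept and one predictor x."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    return [1.0 / n + (xi - mean) ** 2 / sxx for xi in x]


def flag_high_leverage(x, multiple=2.0):
    """Indices whose leverage exceeds `multiple` times the average leverage p/n."""
    h = leverage_one_predictor(x)
    p = 2  # columns in the design matrix: intercept and x
    cutoff = multiple * p / len(x)
    return [i for i, hi in enumerate(h) if hi > cutoff]


# Toy skewed "prices" (hypothetical): one extreme value dominates.
prices = [1, 2, 3, 4, 100]
print(flag_high_leverage(prices))  # -> [4]
```

With heavily skewed prices, a handful of observations can carry almost
all of the leverage, as the last one does here.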

Another option might be to experiment with different top-coding schemes on the prices.
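
For what top-coding means in practice, here is a minimal Python
sketch. It caps every value at a chosen percentile, using the
nearest-rank definition of a percentile. The function name, the
percentiles tried, and the toy prices are hypothetical. Which cap is
sensible for the real price variables is a judgment call that this
sketch does not settle.

```python
def top_code(values, pct):
    """Cap values at their pct-th percentile (nearest-rank definition).

    pct must be an integer from 1 to 100. Returns (capped_values, cap).
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not isinstance(pct, int) or not 1 <= pct <= 100:
        raise ValueError("pct must be an integer between 1 and 100")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct * n / 100) in integer arithmetic
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values], cap


# Try a few schemes on toy skewed prices (hypothetical data).
prices = [1, 2, 3, 4, 1000]
for pct in (100, 80, 60):
    capped, cap = top_code(prices, pct)
    print(pct, cap, capped)
```

Refitting the model under each scheme shows whether the convergence
problem is driven by the extreme tail of the prices.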

