st: RE: mlogit problem with "predict"

Tue, 07 Aug 2007 10:04:37 -0500

Maarten, many thanks for your helpful response. As soon as I posted the message I wished I had clarified what the variables pc1-pc4 were.

------Maarten Buis wrote:

> Are pc1 - pc4 dummy variables for a single categorical variable (called pc)?

They are quantitative variables--principal component scores from my original data set, which consisted of a series of measurements taken on a set of human skulls that make up my cases (I'm an anthropologist, lest that sound too morbid).

I ran PCA in order to reduce the variables and address the fact that measurements taken on the skull are not independent. Running PCA allows me to work with orthogonal data.
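The reduction step described above might look roughly like the following sketch. The measurement names m1-m10 are hypothetical placeholders (the actual skull measurements are not named in this thread), and keeping four components is assumed only because pc1-pc4 appear in the model.

```stata
* Hypothetical measurement variables m1-m10; substitute the real names.
pca m1-m10, components(4)

* Save the first four component scores as pc1-pc4.
predict pc1 pc2 pc3 pc4, score

* Within the estimation sample the scores should be
* (essentially) uncorrelated with one another.
correlate pc1 pc2 pc3 pc4
```

Note that orthogonality among the scores says nothing about how well they separate the outcome groups, which is the issue raised further down.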

> What you are looking for are (almost) empty cells, and (almost) perfect
> predictions, or anything else that looks odd. It's pretty hard to
> explain what it is that makes such a table look "odd", other than if
> something is really wrong it is usually pretty obvious (though not
> always).

The idea that maybe I have an (almost) perfect prediction occurred to me too, but I don't know how to investigate it. Perhaps there is an equivalent to -tab tax pc- that allows me to check quantitative variables for anything that is off?

> I propose an incremental approach: See if you can solve the problem
> by looking at -tab tax pc-, and report back to us. If that solves the
> problem, great, if not, we'll try something else. (Notice that we are
> living in different time zones, so it might take some time before I
> get back to you, but somebody else on the Statalist might jump in)
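One possible analogue of -tab tax pc- for quantitative predictors, offered only as a sketch, is to summarize each component score within each outcome of tax and to plot the scores by group. The variable names are those used in this thread; which statistics to request is an assumption, not something suggested in the original reply.

```stata
* Group sizes: very small groups are the counterpart of (almost) empty cells.
tabulate tax

* Range, mean, and spread of each score within each outcome of tax.
* Ranges that barely overlap across groups would hint at
* (almost) perfect prediction.
tabstat pc1 pc2 pc3 pc4, by(tax) statistics(n min max mean sd)

* A visual check for one score at a time.
graph box pc1, over(tax)
```

If one group's scores sit entirely apart from the others on some component, the model can predict that group (almost) perfectly, which is consistent with fitted probabilities piling up at 0 and 1.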

Thanks again,

Sheela

Hello,

I am using Stata 9 for the PC, and have run the following command:

mlogit tax pc1 pc2 pc3 pc4, vce(jackknife)

where tax has five outcomes.

The regression results seem to be fine, but when I then try to run "predict p1 p2 p3 p4 p5" to obtain posterior probabilities, I get the following:

"p1: 27 missing values generated" (note: there are more than 27 cases in the data set)

And the resulting posterior probabilities are completely off-- every value of p1 is either 0, 1 or missing (".")

And, for a handful of cases, the values of p1-p5 are *all* 0.

This also happens when I do not use the vce(jackknife) option, and when I use three instead of four independent variables.
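The symptoms above could be checked with something like the sketch below. It assumes the variable names from this thread and that p1-p5 and psum do not already exist in the data set; psum is a hypothetical helper variable introduced here for illustration.

```stata
mlogit tax pc1 pc2 pc3 pc4

* How many cases were used in estimation, and how many have a missing predictor?
count if e(sample)
count if missing(pc1, pc2, pc3, pc4)

* Predicted probabilities, one new variable per outcome.
predict p1 p2 p3 p4 p5

* For every case with non-missing predictions, the five
* probabilities should sum to 1.
generate double psum = p1 + p2 + p3 + p4 + p5
summarize p1 p2 p3 p4 p5 psum

* List the cases where the sum is clearly off (for example, all zeros).
list tax pc1 pc2 pc3 pc4 p1 p2 p3 p4 p5 if psum < 0.99 & !missing(psum)
```

Comparing the listed cases with the rest of the data may show whether they have extreme component scores or belong to a very small group.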

I suspect something is wrong among my cases (or variables) such that maybe two of my groups are so highly correlated that I am getting these results? But I am not familiar enough with the principles of multinomial logit to know. Any help or advice would be much appreciated.

Many thanks,

