Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Problem dealing with predicted probabilities from mixlogit

From: Arne Risa Hole
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Problem dealing with predicted probabilities from mixlogit
Date: Sun, 11 Sep 2011 12:44:03 +0100

```
Esther

Here's my attempt at an explanation. In the panel mixed logit
likelihood, the probability of each individual's observed sequence of
choices is calculated at each draw and then averaged over the draws.
You cannot replicate this using -mixlpred-, because it calculates the
probability of a single choice, not of the choice sequence. If you
multiply the predicted probabilities to form the individual likelihood,
the averaging over draws and the multiplication over choices have
switched places, which is why the results differ.

If you are interested in predicting the individual log-likelihoods, I
would use our new -gmnlpred- command with the -ll- option as you
<http://www.shef.ac.uk/economics/people/hole/stata.html>. Note that
the paper describing the -gmnl- command has been _submitted_ to the Stata
Journal.

I hope this helps.

Arne

> Hi Cam,
>
>
> No, I know why it provides a better fit. The problem is that the LL reported by mixlogit cannot be reproduced from the predicted probabilities of mixlpred, which raises the question of what can, or cannot, be done with those probabilities.
>
>
> Anyway, at the risk of repeating myself, it would be nice if someone could take a look at my question.
>
> Many thanks!
>
> ----- Original Message -----
> From: Cameron McIntosh <cnm100@hotmail.com>
> To: STATA LIST <statalist@hsphsun2.harvard.edu>
> Cc:
> Sent: Sunday, 11 September 2011, 0:03
> Subject: RE: st: Problem dealing with predicted probabilities from mixlogit
>
> I thought the question was "why" the panel model provided the better fit, and that my response (and reference) would clarify that. Sorry for any misunderstanding.
> Cam
>
> ----------------------------------------
>> Date: Sat, 10 Sep 2011 22:59:50 +0100
>> Subject: st: Problem dealing with predicted probabilities from mixlogit
>> To: statalist@hsphsun2.harvard.edu
>>
>> (Forgot to put the subject line, corrected)
>>
>> Hi Cam,
>>
>> Thanks, but yes, I know the panel mixlogit provides a better fit. I am sorry, but that does not answer my question.
>>
>> I am glad to see, however, that you at least got the whole question; when I look at the Statalist archive, it seems to be truncated in the middle.
>>
>> Ester
>>
>>
>> Hi Ester, So you are trying to maximize the loglikelihood of someone telling you something that you don't already know? :)  Let me try... in the mixed multinomial logit model without the panel specification, you are assuming preferences vary across persons but are invariant across the series of choice tasks for the same individual. Once you add the panel specification, you allow for within-person heterogeneity across choice tasks, and hence, the fit improves.
>>
>> So yes, the panel specification is more appropriate. See: Hess, S., & Rose, J.M. (2009). Allowing for intra-respondent variations in coefficients estimated on repeated choice data. Transportation Research, Part B, 43(6), 708-719.
>>
>> Hope this helps,
>> Cam
>>
>> > Date: Sat, 10 Sep 2011 19:11:39 +0100
>> > Subject: st: Problem dealing with predicted probabilities from mixlogit
>> > To: statalist@hsphsun2.harvard.edu
>> >
>> > I am using the mixlogit command by Arne Risa Hole, which comes with traindata.dta and is described in http://www.stata-journal.com/article.html?article=st0133
>> >
>> > When I run
>> >
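>> > use traindata.dta, clear
>> > global randvars "contract local wknown tod seasonal"
>> > mixlogit y price, rand($randvars) group(gid) nrep(50)
>> >
>> > mixlpred p                  // predicted probability of each alternative
>> > gen lnp01 = y*ln(p)         // ln(p) for the chosen alternative, 0 otherwise
>> > egen LL = total(lnp01)      // sum over all choice situations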
>> >
>> > I obtain the same LL as that reported by the package, but if I want to take account of the panel nature of the data and run:
>> >
>> > use traindata.dta, clear    // clear is needed: lnp01 and LL were added above
>> > global randvars "contract local wknown tod seasonal"
>> > mixlogit y price, rand($randvars) group(gid) id(pid) nrep(50)
>> > mixlpred p
>> > gen lnp01 = y*ln(p)
>> > egen LL = total(lnp01)
>> >
>> > I get LL = -1356.44, whereas the log likelihood reported in the estimation output is -1126.1653.
>> >
>> > From what I have gathered, this is due to the panel nature of the data, though frankly I have *no idea* how this explains the difference (the constant terms in the individual utilities cancel out in a choice model, so the explanation is not that the individual constant term cannot be estimated). Note that I have read quite a bit on the topic, so simply referring me to the Stata Journal articles that contain the model's formulas will not help, since I do not understand how those formulas explain the difference. A patient and detailed explanation would, however, be very useful, but at this point I have more or less given up on this!
>> >
>> > Anyway, here is why this matters: I use the predicted probabilities to compute choice probabilities across alternatives in another dataset, under different assumptions about the process consumers follow to select alternatives. I compare each individual's LL across choice processes and then assign consumers to types according to the choice process that maximizes their LL.
>> >
>> > My problem, therefore, is that the individual LLs I compute from the mixlogit predicted probabilities are not correct. To push the point further, it would not even help me to use gmnlpred (part of the gmnl command by Gu, Hole and Knox, forthcoming in the Stata Journal, available here: http://www.shef.ac.uk/economics/people/hole/stata.html) to compute the individual log likelihood, because I could not compute the individual log likelihood in the same way on the other dataset.
>> >
>> > My question, then, is: should I use the predicted probabilities from mixlogit with the panel specification rather than without it, just because the reported AIC of the panel mixlogit is better? Or should I prefer the predicted probabilities from the non-panel version, since the individual likelihood I would calculate from them would at least be correct?
>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>
```
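The point in Arne's reply can be reproduced with a toy calculation. Writing $P_{nt}(\beta_r)$ for the logit probability of the alternative individual $n$ chose in choice situation $t$, evaluated at draw $\beta_r$, the panel simulated likelihood averages a product over draws:

$$L_n^{\text{panel}} = \frac{1}{R}\sum_{r=1}^{R}\prod_{t=1}^{T} P_{nt}(\beta_r),$$

whereas multiplying per-choice predicted probabilities (roughly what multiplying the -mixlpred- predictions amounts to) takes a product of averages:

$$\tilde{L}_n = \prod_{t=1}^{T}\frac{1}{R}\sum_{r=1}^{R} P_{nt}(\beta_r).$$

The Python sketch below is hypothetical and is not how -mixlogit- or -mixlpred- are implemented. It assumes a single attribute with one normally distributed random coefficient, three made-up choice situations for one person, and one set of draws shared by both calculations.

```python
import math
import random


def chosen_prob(beta, x, chosen):
    """Logit probability of alternative `chosen` when utility is beta * x[j]."""
    m = max(beta * xj for xj in x)  # subtract the max for numerical stability
    expu = [math.exp(beta * xj - m) for xj in x]
    return expu[chosen] / sum(expu)


def panel_vs_product(tasks, draws):
    """Return (panel likelihood, product of draw-averaged probabilities).

    tasks: list of (x, chosen) pairs, one per choice situation of one person.
    draws: coefficient draws beta_r shared by both calculations.
    """
    R = len(draws)
    # Panel: multiply over choice situations at each draw, then average over draws.
    panel = sum(math.prod(chosen_prob(b, x, c) for x, c in tasks) for b in draws) / R
    # Swapped order: average over draws for each choice situation, then multiply.
    product = math.prod(sum(chosen_prob(b, x, c) for b in draws) / R for x, c in tasks)
    return panel, product


# Hypothetical data: coefficient ~ N(0.5, 1.5^2), three choice situations.
random.seed(1)
draws = [random.gauss(0.5, 1.5) for _ in range(500)]
tasks = [([1.0, 0.0, -1.0], 0), ([2.0, 0.5, 0.0], 0), ([0.0, 1.0, 3.0], 2)]

panel, product = panel_vs_product(tasks, draws)
print("log panel likelihood:            ", math.log(panel))
print("log of product of averaged probs:", math.log(product))
```

With a single draw the two orderings coincide, so the gap comes entirely from variation across draws. In this example every chosen alternative has the highest attribute value, so each $P_{nt}(\beta)$ rises with $\beta$; under that condition the average of the product is at least the product of the averages, and the panel value comes out larger. That is the same direction as the gap in the thread (-1126.1653 versus -1356.44), although the toy numbers say nothing about its size.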
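The classification Ester describes (compute each consumer's log likelihood under each assumed choice process, then assign the consumer to the process with the highest value) can be sketched as below. This is a hypothetical Python illustration, not code from the thread: the process names, person IDs, and probabilities are made up. It forms each individual LL as the sum of y·ln(p), the same construction as the Stata code in the thread. As Arne explains, for a panel mixlogit that sum is not the model's individual likelihood, because it multiplies probabilities that were each averaged over draws separately.

```python
import math
from collections import defaultdict


def individual_loglik(rows):
    """Sum y * ln(p) within each person, mirroring gen y*ln(p) + egen total().

    rows: (person_id, y, p) per alternative, y = 1 for the chosen alternative.
    Rows with y = 0 contribute nothing, so p = 0 there causes no error.
    """
    ll = defaultdict(float)
    for pid, y, p in rows:
        if y == 1:
            ll[pid] += math.log(p)
    return dict(ll)


def assign_types(ll_by_process):
    """Assign each person to the choice process with the highest individual LL.

    ll_by_process: {process_name: {person_id: loglik}}.
    Ties go to the process listed first.
    """
    people = set().union(*(d.keys() for d in ll_by_process.values()))
    return {
        pid: max(ll_by_process, key=lambda proc: ll_by_process[proc].get(pid, -math.inf))
        for pid in people
    }


# Hypothetical predicted probabilities for two people under two made-up processes.
rows_a = [(1, 1, 0.6), (1, 0, 0.4), (2, 1, 0.3), (2, 0, 0.7)]
rows_b = [(1, 1, 0.5), (1, 0, 0.5), (2, 1, 0.8), (2, 0, 0.2)]
types = assign_types({
    "process_a": individual_loglik(rows_a),
    "process_b": individual_loglik(rows_b),
})
print(types)
```

Person 1 is assigned to `process_a` (ln 0.6 > ln 0.5) and person 2 to `process_b` (ln 0.8 > ln 0.3). Any of these comparisons inherits whatever is wrong with the individual LLs fed into it, which is why the choice between panel and non-panel predicted probabilities matters for this procedure.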