Statalist



Re: st: How to compare performance (goodness-of-fit) of very different modelling approaches?


From   "Austin Nichols" <[email protected]>
To   [email protected]
Subject   Re: st: How to compare performance (goodness-of-fit) of very different modelling approaches?
Date   Fri, 16 Jan 2009 11:26:20 -0500

Eva Poen <[email protected]>:
Just to be clear--my idea was not that you would produce graphs for
21 models (or one graph with 21 bars per category of model), but 21
graphs for 3 or 4 models.  Each "byhist" would show the distribution
of actual values corresponding to those predicted to have outcome y
from 3 or 4 models, and the graphs would number 21 as y=0 to 20.
Presumably, the graphs for y=0 and y=20 would be of most interest.
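
The tabulation behind each of those graphs can be sketched in a few
lines. This is only an illustration, written in Python rather than
Stata, and it rests on assumptions that are not in the thread: the
function name actual_by_predicted, the model labels, and the toy data
are all made up, and predictions are rounded to the nearest integer and
clipped to 0..20 to decide which "predicted y" group an observation
falls in. It is not meant to describe what -byhist- does internally.

```python
from collections import Counter, defaultdict


def actual_by_predicted(actual, predictions, lo=0, hi=20):
    """For each model and each (rounded, clipped) predicted value,
    return the fraction of observations at each actual value."""
    result = {}
    for model, preds in predictions.items():
        groups = defaultdict(Counter)
        for y, yhat in zip(actual, preds):
            # Assumption: assign each observation to the nearest integer
            # prediction, clipped to the outcome range [lo, hi].
            k = min(hi, max(lo, int(round(yhat))))
            groups[k][y] += 1
        result[model] = {
            k: {y: n / sum(c.values()) for y, n in sorted(c.items())}
            for k, c in sorted(groups.items())
        }
    return result


# Made-up toy data, for illustration only.
actual = [0, 0, 1, 20, 20, 19, 10, 0]
predictions = {
    "tobit": [0.2, -1.4, 0.4, 19.8, 22.0, 20.3, 9.6, 3.1],
    "hurdle": [0.0, 0.1, 1.2, 20.0, 18.7, 19.9, 10.4, 0.3],
}
table = actual_by_predicted(actual, predictions)

# Fractions of actual values among observations predicted to have y=0.
for model in predictions:
    print(model, table[model].get(0))
```

Each inner dictionary holds the bar heights for one model in one of the
21 graphs; a model that fits well at the boundaries should put most of
its mass on actual 0 in the y=0 graph and on actual 20 in the y=20 graph.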

On Fri, Jan 16, 2009 at 10:39 AM, Eva Poen <[email protected]> wrote:
> Thanks, Austin. My outcome variable has indeed 21 "categories", but
> from an economic point of view it can be considered continuous. 21
> values seem a bit too many for a -byhist-, but it is a nice program,
> and I will certainly consider using it in some way to aid
> visualization.
>
> Eva
>
>
> 2009/1/15 Austin Nichols <[email protected]>:
>> Eva Poen <[email protected]>:
>> Does the outcome variable have only 21 categories {0,...20} or is it continuous?
>> Maybe you could produce 21 histograms (with fractions for each of the
>> models overlaid or "interlaced" on one graph) characterizing the
>> distribution of observed values for those predicted to have Y=0, ...
>> 20.  See -byhist- on SSC for making interlaced histograms.
>>
>> On Thu, Jan 15, 2009 at 11:59 AM, Eva Poen <[email protected]> wrote:
>>> Dear all,
>>>
>>> currently I am working on slightly complicated mixture models for my
>>> data. My outcome variable is bounded between 0 and 20, and has mass at
>>> either end of the interval. Whether or not I analyse the data on the
>>> original [0,20] scale or a transformation to [0,1] (fractions) does
>>> not make any difference to me.
>>>
>>> My question concerns the goodness of fit. I would like to compare the
>>> goodness of fit of the complicated finite mixture model to much simpler
>>> models, e.g. the tobit model, the glm model, and a hurdle
>>> specification. Since the likelihood values of these models differ
>>> substantially, likelihood based measures such as BIC appear to be
>>> inadequate for the purpose. Also, measures that compare the model
>>> likelihood of the fitted model to the null likelihood ("pseudo r2")
>>> are difficult since I can calculate them for the tobit and glm models,
>>> but not for the mixture model, as it is unclear what the null model
>>> would be.
>>>
>>> So far I have been looking at crude measures like correlation between
>>> predicted outcome and actual outcome, but I feel that this is
>>> inadequate, especially since the outcome variable is bounded. I'd be
>>> grateful for hints and comments. I am working with Stata 9.2.


