RE: st: Rabe-Hesketh's gllamm: multivariate multilevel dropout model


From   tshmak <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: Rabe-Hesketh's gllamm: multivariate multilevel dropout model
Date   Mon, 27 May 2013 10:45:03 +0800

Hi Kyle,

You said your model follows Rabe-Hesketh's dropout model. Could you give us a link to a description of that model? You may also want to provide more information on how you defined B, f1_1, and constraints 1/5.

My experience is that the kind of problem you encountered is not uncommon with large datasets, and also when you use ML on models that are not standard generalized linear models. So I concur with others that receiving an error message like that is no indication that the particular software has a bug or is poorly conceived. As an example, I once had a dataset on which I ran Stata's -poisson- as well as -zip-. Stata was able to find a converged solution using -poisson- but not -zip-, yet I don't think StataCorp needs to add a disclaimer that -zip- has the limitation of converging less often than -poisson-. It is implicit that because -zip- covers a larger class of models than -poisson-, there will be situations where -poisson- can find a unique ML estimate but -zip- cannot. (And -gllamm- covers an enormous class of models.)
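
To make that concrete, the comparison I have in mind looks something like this (purely an illustration, with made-up variables y, x1, and x2; it is not your model):

    poisson y x1 x2                // standard Poisson regression
    zip y x1 x2, inflate(x1 x2)    // zero-inflated Poisson: a strictly larger
                                   // model class, so on some datasets it will
                                   // fail to converge where -poisson- succeeds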

One of the reasons why -gllamm-, or any ML procedure in general, fails to find a unique maximum is that Stata (like pretty much any statistical software) does its calculations in double precision, i.e. you cannot ask it to calculate to more than about 16 significant figures. In all likelihood, software that could do its calculations in infinite precision would be able to find a unique solution to your problem. It may be tempting to seek out such a piece of software, but in the vast majority of cases that is a silly solution, because the fact that a unique maximum cannot be found in double precision is an indication that your model overfits your data, such that even if a unique value were found, you probably should not trust it anyway.
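
You can see the double-precision ceiling directly in Stata (again, just an illustration):

    display epsdouble()          // relative machine precision of doubles, about 2.2e-16
    display (1 + 1e-17) == 1     // prints 1: a perturbation below that precision is lost
    display (1 + 1e-15) == 1     // prints 0: this difference is still representable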

In addition to the advice from others, I think you may want to use the -trace- option in -gllamm- to see which parameter -gllamm- is failing to converge on, so that you know which part of your model is overspecified.
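
For example (only a sketch, and obviously not your full specification), you could fit a cut-down version of the model with -trace- switched on and watch which element of the parameter vector drifts or blows up between iterations:

    * diagnostic only: fit just the continuous part with a single random intercept
    gllamm resp x_i1 if i1 == 1, i(id) family(gauss) link(ident) adapt trace

Then add back one piece of the full model at a time until the trace output starts misbehaving.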

HTH,

Tim




-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jeph Herrin
Sent: 25 May 2013 01:47
To: [email protected]
Subject: Re: st: Rabe-Hesketh's gllamm: multivariate multilevel dropout model

Your original post seemed to place the blame on the code, and the authors of it, for not working the way you thought it
should, before you made the effort to understand what was going on. But when a model does not converge for your data,
yes, it is a reasonable response to suggest you consider the data, and whether the model is appropriate for it, rather
than confirming that "what [you] think is really needed" is for the authors of the code to "clarify their 2002 work".

However, I appreciate that there might have been some miscommunication, so I'll expand on what I hope is the most useful
part of my reply. The errors you are seeing ("flat or discontinuous region encountered") are generated because the
program is unable to converge on a single solution. This message is not unique to -gllamm- but is reported by any Stata
routine that uses maximum likelihood estimation.

Generally, one resolves this kind of problem by starting with a simpler but analogous model that does converge and then
adding complexity to the model to determine which parameter(s) are causing the problem (I sketch one related tactic after
the questions below). It is not clear to me how to do that here, so some other questions:

Why did you need to introduce x_i1, when there was not one in the original model? When you say it will not work without
it, what do you mean? Apparently it is not working with it either, and it seems that in adding it you are changing the
meaning of several of the options, such as geqs().

How many observations did you simulate? Complex models typically don't converge as easily for small datasets.

Where does i2 appear? You refer to it, but it is not in the model.

What is the link you mention? It might be helpful if we could compare with the model you were trying to replicate.

Where did you get the matrix B? It specifies a starting point (as it were) for finding a solution, and if it is poorly
specified, it will cause trouble.
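
On that last point, one tactic that sometimes helps -gllamm- along (a sketch only, reusing the syntax from your post; I don't know whether it is feasible for this particular model) is to do a cheap first pass and then restart the full estimation from its results via the from() option:

    * first pass: same model, ordinary quadrature with few integration points,
    * just to obtain rough parameter estimates
    gllamm resp x_i1 i1 y0_i2d0 i2 y1_i2, i(t id) eqs(eta1_1 eta2_1) nocons /*
        */ family(gauss binom) fv(var) link(ident probit) lv(var) /*
        */ bmatrix(B) geqs(f1_1) frload(1) constr(1/5) nats nip(4)
    matrix a = e(b)

    * second pass: restart from those estimates with adaptive quadrature and
    * more integration points
    gllamm resp x_i1 i1 y0_i2d0 i2 y1_i2, i(t id) eqs(eta1_1 eta2_1) nocons /*
        */ family(gauss binom) fv(var) link(ident probit) lv(var) /*
        */ bmatrix(B) geqs(f1_1) frload(1) constr(1/5) nats nip(7) /*
        */ adapt from(a) trace

If the model is genuinely unidentified, though, no amount of restarting will rescue it.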

cheers,

J


On 5/24/2013 11:24 AM, Kyle Fluegge wrote:
> There might be some misunderstanding about what I have done with this. This is not my data per se, or even my model (in terms of applying results to real-world situations). I have not even used the real data that I would like to model. May I ask how you initially go about investigating a model or program that you would like to use? I always like to use a programmer's code and sample data to ensure their model does what it says before even attempting to apply it to my own situation. If it does not run, then I become skeptical and ask for clarification or seek a resolution via another channel (i.e., Statalist) before using it. This is a case of that. Your response is not particularly helpful or a significant contribution in the search for a resolution. Asking for the original author's code might have been a more appropriate response in advocating a resolution to the matter at hand. Having different data is a problem every researcher has. You would have to admonish almost everyone who posts to Statalist because the root problem for their difficulties is essentially a different dataset, no?
>
> The model (as specified by the authors) requires a single response vector with continuous and binary outcomes. They give the code to create this, and that is what I used. I created a data set that met this single criterion. I did not create a vector of all constants, all zeros, letters, or anything of the sort. I believe you might be minimizing the extent of the problem as I see it. While I could have easily made a mistake (I have admitted this several times), I do not think it is as simple a misunderstanding as you have described. The authors also give a sample view of their datasheet (10 or so observations) created with this code, and it appears to match what I created very closely (albeit not exactly, which we have already established is a limitation). I should find a way to post the URL of this so you all can view for yourselves what is available (I have tried before and my message would not post). I would then invite you to replicate something similar to see if it runs for you (I hope it would; then I know my error is indeed fixable).
>
> I suppose the broader and main point here is that if the data have to be so particular (such that only the original authors can truly produce what is needed to run the model), how useful and/or generalizable can this particular aspect of -gllamm- for multivariate modeling really be? And if the data generation is so sensitive, why not mention it as a limitation or significant caveat to implementing the model?
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Jeph Herrin
> Sent: Friday, May 24, 2013 10:33 AM
> To: [email protected]
> Subject: Re: st: Rabe-Hesketh's gllamm: multivariate multilevel dropout model
>
> The errors you are seeing are those that would be generated if -gllamm- could not find a solution, for example if the model you are specifying is not appropriate for your data, which is the most likely explanation.
>
> More generally, it is somewhat misleading to claim that you are trying to fit the 'exact same model' when you have a different dataset. An analogy would be if I showed you a slide where I estimated
>
>    logit y x1 x2
>
> with some results and then you tried to run the same model on a dataset where all the variables are constant - you would see lots of errors. If you wrote me to complain that my model wouldn't converge on your dataset, I would likely not respond either!
>
> Hope this helps,
> Jeph
>
>
>
>
> On 5/23/2013 9:42 PM, Kyle Fluegge wrote:
>> I agree with your clarity and caution note. I am not assigning blame to anyone; I am simply noting that a modeling framework has been marketed within -gllamm- by its authors but does not appear to run. That is the only thing I can say with the information I have at my disposal. The error could be a data issue or some other error I have yet to recognize. I will say that other -gllamm- models I have run do work; this is not criticism of the -gllamm- framework as a whole, just of this particular case.
>>
>> I do not have the dataset that Sophia used. That is a crucial detail, as you note. My apologies for the confusion: the replica was meant to refer to syntax (which can indeed mean very little if the data are different). In the Rabe-Hesketh note on this particular model, she and her co-authors provided code to reshape the data into the long form required to run the model. I used that code to shape the dataset I used. Everything on that front worked perfectly. I had presumed their "x" (used in their code) referred to one explanatory variable, so I created an explanatory variable (the "x") and used that. Other than this alteration, everything else is exactly the same. Whether that is the reason the model is not converging remains an open issue, and perhaps one worthy of further discussion on this list; I do not know.
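>>
>> For anyone trying to follow along, the general shape of that reshaping step is roughly as follows (this is only my sketch with made-up variable names, not the authors' actual code):
>>
>>     * start from one row per subject-time, with continuous outcome y and
>>     * dropout indicator d; stack the two outcomes into one response vector
>>     expand 2
>>     bysort id t: gen resp = cond(_n == 1, y, d)
>>     bysort id t: gen i1 = (_n == 1)    // rows belonging to the continuous model
>>     bysort id t: gen i2 = (_n == 2)    // rows belonging to the dropout model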
>>
>> I have attempted to follow up with the authors regarding it (that is, the data issue), but to no avail, so I am left to interpret. I completely acknowledge the error could be (and probably is) my own; that is why I posted to Statalist in an attempt to resolve it with others' help. No suggestion is too minor.
>>
>> I am not a very experienced Stata programmer, another limitation I wholly acknowledge. I do what I can, but programming errors are not as noticeable to me as they would be to other, more experienced programmers.
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Nick Cox
>> Sent: Thursday, May 23, 2013 9:21 PM
>> To: [email protected]
>> Subject: Re: st: Rabe-Hesketh's gllamm: multivariate multilevel
>> dropout model
>>
>> I think we need complete clarity and considerable caution here.
>>
>> Your previous post claimed that you are using an exact replica of Sophia [Rabe-Hesketh]'s model, except that you changed something. I don't know these models and so cannot judge whether your change was trivial or substantive, but on the face of it one of those statements is wrong or at least confusing.
>>
>> Are you using exactly the same dataset as Sophia used? That is a crucial detail.
>>
>> I certainly agree that exactly the same model on exactly the same dataset should produce the same results now with -gllamm- as in 2002, and if not there should be an explanation why. In the meantime -gllamm- has changed and Stata has changed, and with large, complicated programs no one can be confident that something has not been broken.
>>
>> I don't know how much experience you have in Stata programming, but I have some. There are certainly programs of mine in the public domain that might not converge with particular datasets; I've had that experience myself and typically conclude from graphical evidence that I was trying to get a cat to pretend it was a dog, and that was a bad idea. With your kind of model such checks are, as I understand it, typically not available.
>>
>> It's my impression that Sophia gets far more requests for -gllamm- support than she can possibly handle. That's a tough call all round.
>> She's not an active member of Statalist.
>> Nick
>> [email protected]
>>
>>
>> On 24 May 2013 01:51, Kyle Fluegge <[email protected]> wrote:
>>> The notable problem is that this is not my model, exactly. I have simulated the minimum number of variables to make it run. This is the model provided by Rabe-Hesketh and colleagues at the Stata User Group Meeting in Maastricht, May 2002. Thus, not being able to replicate it may or may not signify a broader problem here. Hopefully, if others who have attempted to run it have noted similar problems, they can speak up on this list to contribute their alterations to the code I have provided, or to provide incentive for Rabe-Hesketh and colleagues to clarify their 2002 work in a more general sense. The latter is what I think is really needed. I have not seen this model used in the literature (or at least in what I have read; there are probably papers out there somewhere), which may lend credibility to the possibility that -gllamm- simply cannot estimate a model like this, contrary to what Rabe-Hesketh and colleagues have claimed. Thank you for your assistance.
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Nick Cox
>>> Sent: Thursday, May 23, 2013 8:36 PM
>>> To: [email protected]
>>> Subject: Re: st: Rabe-Hesketh's gllamm: multivariate multilevel
>>> dropout model
>>>
>>> The short answer is likely to be that you are doing nothing wrong that we can identify for you.
>>>
>>> -gllamm- (SSC) is a very general, indeed highly versatile, command that is more like a family of commands. However, many of the models it covers are difficult to fit -- or conversely many of the models are often applied to data that aren't suitable. Where to put the blame is an open and delicate matter. Naturally it is usually impossible to be clear about suitability before trying a fit, but having correct syntax is not a guarantee of anything but having correct syntax.
>>>
>>> People who are familiar with your kind of model may well be able to
>>> add more specific comments. Means of binary variables being very near
>>> 0 or very near 1 can be problematic.
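>>>
>>> As a quick check (a sketch only, guessing at your variable names), look at the
>>> mean of the binary response within the rows belonging to the dropout part of
>>> the stacked data:
>>>
>>>     summarize resp if i2 == 1
>>>
>>> A mean very close to 0 or 1 means the probit part has almost no information to
>>> estimate its parameters from.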
>>>
>>> The recent thread starting here has other advice, some specific:
>>>
>>> http://www.stata.com/statalist/archive/2013-05/msg00665.html
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 24 May 2013 00:53, Kyle Fluegge <[email protected]> wrote:
>>>> Dear Statalisters,
>>>>
>>>> I am attempting to fit a multivariate multilevel dropout model with -gllamm-. The data set is in long form, with a response vector including both binary and continuous data. As for notation, x_i1 is a dichotomous variable predicting the continuous outcome, i1 is a variable denoting records within the substantive model, i2 is a variable denoting records within the dropout/selection model (probit), y0_i2do is a variable referring to the concurrent continuous outcome's impact on dropout, and y1_i2 is a lagged variable referring to the previous continuous outcome's impact on current dropout. The model syntax is below (it is an exact replica of Rabe-Hesketh's dropout model):
>>>>
>>>> gllamm resp x_i1 i1 y0_i2d0 i2 y1_i2, i(t id) eqs(eta1_1 eta2_1) nocons /*
>>>>     */ family(gauss binom) fv(var) link(ident probit) lv(var) /*
>>>>     */ bmatrix(B) geqs(f1_1) frload(1) constr(1/5) nats nip(7) adapt trace
>>>>
>>>> When running this model, it does not converge and produces the errors "numerical derivatives are approximate" and "flat or discontinuous region encountered". I am curious to know what I am doing wrong. The only thing that I have changed from Rabe-Hesketh's model in the link is that x_i1 is a dichotomous explanatory variable (and that is because the model will not run without an "x"). Everything else is exactly the same. Why is this not running? I have contacted the authors of gllamm, who have not responded. Has anyone else been able to run this model as Rabe-Hesketh et al. have written it, and had success?
>>>>
>>>> Sincerely,
>>>>     kyle