Re: st: ordered logistic integration problems
Nick Cox <firstname.lastname@example.org>
Thu, 21 Mar 2013 15:03:22 +0000
I agree with the implication that -glm- is astonishingly little known
as a way of handling responses that are continuous proportions. The
FAQs cited both predate an excellent concise review:
SJ-8-2 st0147. Stata tip 63: Modeling proportions. C. F. Baum.
Q2/08, SJ 8(2): 299-303 (no commands).
Tip on how to model a response variable that appears as a proportion
or fraction.
But I don't think that your response is statistically that odd-ball.
It is a fraction or proportion based on counts, and few things could
be more statistical. (I wouldn't call it a percent, but that is a
small language issue.)
There also seems to be nothing unusual in the idea that the same
proportion can arise from different combinations of counts. 1/1 of the
cars in our household has four seats, and so do 3/3 of the cars in my
friend's household. Same fraction, different situations, and some
information is lost when the counts are reduced to a fraction. Whether a
fraction is the best scale for the research is a good scientific
question, naturally.
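To make the 1/1 versus 3/3 point concrete, here is a small Stata sketch (not part of the original post) comparing log-likelihood contributions at a hypothetical fitted probability of 0.9. Under a binomial model for the counts, 3/3 contributes three times as much as 1/1; under a Bernoulli quasi-likelihood for the fraction alone, the two are indistinguishable. The probability and the scalar names are illustrative assumptions only.

```stata
* Hypothetical fitted probability of a correct attempt (illustrative value)
scalar phat = 0.9

* Binomial log likelihood for y successes out of n trials;
* the ln C(n,y) term is 0 here because y == n
scalar ll_1of1 = 1*ln(phat) + (1 - 1)*ln(1 - phat)
scalar ll_3of3 = 3*ln(phat) + (3 - 3)*ln(1 - phat)

* Bernoulli quasi-log-likelihood for the fraction f = y/n alone
scalar ql_1of1 = (1/1)*ln(phat) + (1 - 1/1)*ln(1 - phat)
scalar ql_3of3 = (3/3)*ln(phat) + (1 - 3/3)*ln(1 - phat)

display "binomial counts: 1/1 = " scalar(ll_1of1) ", 3/3 = " scalar(ll_3of3)
display "fraction only:   1/1 = " scalar(ql_1of1) ", 3/3 = " scalar(ql_3of3)
```

The counts version is the one that "remembers" how many attempts lay behind each fraction; the fraction-only version discards that, which is the information loss on data reduction.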
However, it sounds as if your distribution is rather lumpy, which
won't make anything easier, but it is difficult to guess whether that
will be really problematic.
-glm- does not entail numerical integration. That doesn't mean it
always converges to sensible results.
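A minimal Stata sketch of two -glm- routes for the design described in the original question (3 groups between subjects, 3 occasions within), offered as an illustration rather than a recommendation for these particular data. It runs on simulated stand-in data because the real data are not available; all variable names (subject, group, occasion, attempts, correct, prop) are hypothetical, and factor-variable notation assumes Stata 11 or later. Both fits are iterative maximizations with no numerical integration. Clustering on subject adjusts the standard errors for repeated occasions but, unlike -gllamm-, does not model subject-level random effects.

```stata
* Simulated stand-in data (hypothetical): 99 subjects in 3 groups, 3 occasions each
clear
set seed 2013
set obs 99
generate subject = _n
generate group = mod(_n, 3) + 1
expand 3
bysort subject: generate occasion = _n

* Between 2 and 10 scorable attempts per subject and occasion
generate attempts = 2 + floor(9 * runiform())
* Mostly correct attempts, so many fractions are exactly 1
generate correct = rbinomial(attempts, invlogit(0.5 + 0.4*group + 0.4*occasion))
generate prop = correct / attempts

* Fractional logit, as in Baum's tip: binomial family and logit link on the
* proportion itself, with standard errors clustered on subject
glm prop i.group##i.occasion, family(binomial) link(logit) vce(cluster subject)

* Alternative that keeps the denominators: correct out of attempts,
* so 3/3 carries more information than 1/1
glm correct i.group##i.occasion, family(binomial attempts) link(logit) vce(cluster subject)
```

The first model treats every observation's fraction alike; the second weights by the number of attempts, which matters here because the number of scorable events varies from 2 to 10 across subjects.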
On Thu, Mar 21, 2013 at 2:18 PM, Bontempo, Daniel E <email@example.com> wrote:
> Thanks. I had not realized the glm command could handle the 0's and 1's. That may be the best distribution, although the DV is an oddball animal: half count, half proportion, and somewhat standardized to each person. Recall that it is the percentage correct among the spontaneously attempted past-tense verb forms in a given period of recorded speech.
> Also, unlike many proportions in developmental science that show floor and ceiling effects, where the variance is small early on (mostly 0's), large in the middle, and small again later as kids score mostly 1's, this one is very odd because of the "spontaneous" aspect. The kids are clever: they choose easier verbs (e.g., put) in the middle period, so the same percentage does not always mean the same thing, because it leaves out the dimension of the "difficulty" of the attempts.
> Returning to the issue of integration: like ologit, glm seems to run fine. I do not think numerical integration is involved in the iterations these routines perform. The routines that do use numerical integration seem to be the ones having trouble with these data.
> My lingering question is: should I take the integration difficulties in some routines as a reason to suspect the results of glm when it runs without issue?
> My guess is that you are spreading the data too thin. If I follow you, the DV has 12 values, and 90% of the cases are 1's, which means the other 11 values average less than 1% of the cases each. With gologit2 you are estimating 11 sets of coefficients. I am not surprised you have to collapse to only 3 categories.
> But why are you using an ordinal model in the first place? Why not a model specifically designed for proportions? See, for example,
Bontempo, Daniel E
>>Can anyone explain the kind of data conditions that cause gllamm or
>>gologit2 to spit out:
>>  flat or discontinuous region encountered
>>  numerical derivatives are approximate
>>  nearby values are missing
>>  could not calculate numerical derivatives
>>  missing values encountered
>>  r(430);
>>I have a colleague with proportion data that take only about 12 discrete
>>values between 0 and 1, about 90% of them 1's (skewness -3.27, kurtosis > 15).
>>We want to model 3 groups (between subjects) and 3 occasions (within subjects).
>>Prior work published in 2000 had similar proportions, used HLM
>>(Gaussian), and got interpretable results. After looking at the
>>distributions, I suggested ologit might be more appropriate than regress.
>>I was already concerned about these proportion DVs because my colleague
>>has calculated the proportion correct out of however many scorable events
>>there were, and the number of events differs a lot from subject to subject:
>>some have 2, some have 10. But my question for the moment is about the
>>technical difficulty with numerical derivatives.
>>Since occasions are nested within persons, I was interested in
>>gllamm with the ologit link, as well as ologit with robust standard errors
>>via "cluster(subject)". I also tried gologit2 because I was unsure whether
>>the parallel regression assumption was met.
>>I can easily get ologit to run. However, both gllamm and gologit2 make
>>similar complaints about missing or discontinuous numerical derivatives
>>and do not complete. I tried the log-log link in gologit2 since the
>>values rise slowly from 0 and then suddenly go to 1. I kept rounding to get
>>... I have to collapse to only 3 levels to get gologit2 to run. gllamm keeps
>>telling me to use the trace option and check the initial model, but when I
>>do, I see reasonable fixed-effect values.
>>Is ologit able to use an estimation method that avoids these
>>I am trying to get the disaggregated data so multilevel logistic
>>regressions can be done, but it is not clear disaggregated data will be
>>Any pointers, advice, suggestions, references ... all appreciated.