Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Is this the right code if I want to compare group 1 vs group 4 in a logistic regression model?

 From Alfonso Sánchez-Peñalver To Stata List Subject Re: st: Is this the right code if I want to compare group 1 vs group 4 in a logistic regression model? Date Wed, 4 Dec 2013 10:34:32 -0500

```
Hi Laura,

I agree with Nick that you have asked the question backwards, which is why I was confused. In your model (logit or ordered logit) the response variable is the categorical variable, and the explanatory variables would be age and sex. Logit or ordered logit lets you estimate the probabilities (or the log odds) of belonging to each of the categories. What you will be able to find out is, for example, whether getting older increases (or decreases) the probability of belonging to the highest or the lowest group.

Say, for example, that your response variable was income and you had broken it down into groups that make sense to you because they represent different social classes. You would expect people's income to rise as they get older, so in general you would expect the higher income groups to contain older people than the lower income groups. Logit or ordered logit would let you estimate by how much the probability of belonging to a given group increases (or decreases) when age increases by, say, 1 year. Thus in my example we would expect the probabilities of belonging to the higher income groups to increase with age, and the probabilities of belonging to the lower income groups to decrease with age.

Ordered logit simply takes into account the natural ranking of the categories. In the income example, belonging to a higher income group means more than simply being in that category: it tells you that your income is higher, which is additional information. Ordered logit captures this ordering by fitting a single coefficient for each explanatory variable together with a set of cutpoints that separate adjacent categories.

Best regards,

Alfonso

On Dec 4, 2013, at 10:12 AM, Nick Cox <njcoxstata@gmail.com> wrote:

> Usually the wrong way round: in your example, age and sex are
> predefined or given, and the question is what they imply.
>
> Sometimes this is causal (as a male I could never have had babies) but
> more commonly it is a matter of association (e.g. implications of age
> for experience or stamina).
>
> The effects _on_ age and sex of anything are limited, I believe, to
> what can be done surgically.
>
> This point may be just a consequence of your choosing the wrong small
> words, but as you are likely to be writing in English it is important
> to get this straight.
>
> On ordered logit, come on please! Typing -search ordered logit- in
> Stata shows that you are sitting right by several resources.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 4 December 2013 14:54, Meems, LMG <l.m.g.meems@umcg.nl> wrote:
>> Hi Alfonso,
>>
>> Thank you for your answer. I'm sorry my question was so confusing; I'll try to explain it again.
>>
>> What I want to know (and I thought the logistic regression model was best suited to answer this) is how belonging to a certain group (let's say low vs high) results in effects on age and sex (just 2 examples; in my model I have plenty of other variables that I also want to test).
>> For example, whether people in the lower group differ significantly in age and sex from people in the higher group.
>>
>> Btw, I'm not familiar with ordered logit. What is it exactly?
>>
>> Best,
>>
>> Laura Meems
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Alfonso Sanchez-Penalver
>> Sent: Wednesday, 4 December 2013 15:36
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Is this the right code if I want to compare group 1 vs group 4 in a logistic regression model?
>>
>> Hi Laura,
>>
>> You mention that you broke up a continuous variable into four categories and then used a logit regression. I believe ordered logit would be more appropriate in this case, since the categories follow the natural order of the continuous variable.
>>
>> Having said that, I am a bit confused about your main question. You say "I want to compare the lowest group (0) with the highest group (3) and the effects on age and sex". I thought the groups were the response variable, because the logit model would allow you to calculate effects on belonging to one group or another. Did you mean that you want to know the difference between the effects that age and sex have on the probability of belonging to the lowest group and on the probability of belonging to the highest group? If so, or if it is something similar, you can use -margins- after the ordered logit regression to estimate the effect of any variable of interest on the probability of belonging to each of the groups, and then take the difference for the groups you want.
>>
>> Sorry if I misunderstood your message; please let me know whether my interpretation is what you were after.
>>
>> Best,
>>
>> Alfonso Sanchez-Penalver
>>
>>> On Dec 4, 2013, at 9:15 AM, "Meems, LMG" <l.m.g.meems@umcg.nl> wrote:
>>>
>>> Hello Statalisters,
>>>
>>> After a couple of days filled with Stata and database work, I really need a check on whether what I'm doing is right.
>>>
>>> At the moment I'm looking at the predicted effect of a continuous variable (Y) on a couple of other parameters.
>>> I decided to split the continuous variable into 4 groups, following its clinical reference values (e.g. sufficient, insufficient, etc.).
>>>
>>> After this step I wanted to fit this variable in a regression model using logistic regression (as I thought that dividing it into groups turned the continuous variable into a categorical one). So far, so good.
>>>
>>> However, let's say I now want to compare the lowest group (0) with the highest group (3) and the effects on age and sex.
>>> The code I used to do this is:
>>> char Y[omit] 3
>>> xi: logit i.Y + age sex
>>>
>>> This resulted in coefficients for age and sex, but also in 2 omitted values, namely groups 1 and 2, with the note that group 1 and 2 != 0 predicted failure perfectly.
>>>
>>> So this result made me doubt the code. Is this the right code to use, and what exactly do these 2 omitted values mean? Is it a result of the code I wrote (that would be the good scenario), or is something wrong that I should correct for (or should I even correct the code)?
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

```
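A minimal Stata sketch of the approach Alfonso and Nick describe follows, with the grouped variable as the response and the covariates as explanatory variables. It is an illustration under stated assumptions, not code from the thread: Stata's shipped auto dataset stands in for the real data, with mpg playing the role of the continuous variable Y and weight and foreign standing in for age and sex; the variable name ygroup and the cutpoints are hypothetical. Factor-variable notation (i.foreign) is used instead of the older -xi- prefix and -char [omit]-.

```stata
* Stand-in data (assumption): Stata's shipped auto dataset. mpg plays the
* continuous variable Y; weight and foreign stand in for age and sex.
sysuse auto, clear

* Split the continuous variable into 4 ordered groups coded 0-3.
* Hypothetical cutpoints; substitute the clinical reference values.
generate byte ygroup = 0 if mpg < 18
replace ygroup = 1 if mpg >= 18 & mpg < 21
replace ygroup = 2 if mpg >= 21 & mpg < 25
replace ygroup = 3 if mpg >= 25 & !missing(mpg)

* Ordered logit: the grouped variable is the response and the covariates
* are explanatory. Factor-variable notation is used instead of -xi-.
ologit ygroup weight i.foreign

* Replay the same fit with coefficients shown as odds ratios.
ologit, or

* Average marginal effects on the probability of being in the lowest
* group (value 0) and in the highest group (value 3).
margins, dydx(weight foreign) predict(outcome(0))
margins, dydx(weight foreign) predict(outcome(3))

* Difference between those two effects: the marginal effect on
* Pr(group 3) minus the marginal effect on Pr(group 0).
margins, dydx(weight foreign) expression(predict(outcome(3)) - predict(outcome(0)))
```

The last -margins- call reports the difference directly; subtracting the results of the two preceding calls gives the same numbers. Because -margins- is not run with -post- here, the -ologit- estimates stay in e() afterwards. Note that -ologit- assumes the same slope for every cutpoint (the proportional-odds assumption), which is worth checking before interpreting the results.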