
# Re: st: quasi-complete separation

From: Nick Cox | To: statalist@hsphsun2.harvard.edu | Subject: Re: st: quasi-complete separation | Date: Sun, 28 Aug 2011 17:29:02 +0100

```
I don't know where 48.5 comes from, so I can't comment on that.

. input y   x

y          x
1.  0   1
2.  0   2
3.  0   3
4.  0   4
5.  1   1
6.  1   2
7.  1   3
8.  1   4
9.  1   5
10.  1   6
11.  1   7
12. end

. logit y x

Iteration 0:   log likelihood = -7.2102995
Iteration 1:   log likelihood = -6.3453449
Iteration 2:   log likelihood =   -6.31452
Iteration 3:   log likelihood =  -6.314268
Iteration 4:   log likelihood =  -6.314268

Logistic regression                               Number of obs   =         11
                                                  LR chi2(1)      =       1.79
                                                  Prob > chi2     =     0.1807
Log likelihood =  -6.314268                       Pseudo R2       =     0.1243

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5126268   .4305072     1.19   0.234    -.3311519    1.356405
       _cons |  -1.076253   1.436463    -0.75   0.454    -3.891669    1.739162
------------------------------------------------------------------------------

In my ignorance I was not aware until now of the terminology of
"quasi-complete separation" although Googling reveals several long
discussions. Evidently there are datasets which are difficult or
impossible to model with -logit- or -probit-. So, what else is new?
Whether it helps to use this terminology I don't know. It just sounds
like giving the problem a name to me. Others may be able to add deeper

Nick

On Sun, Aug 28, 2011 at 5:01 PM, Sabrina Helmut <vitamint@hotmail.de> wrote:
> Nick,
> thanks! You are right, logit works, but the coefficient for the variable concerned is extremely high (48.5..). I will need an explanation for this. So, do you think my example shows quasi-complete separation, which could explain the high coefficient?
>
> ----------------------------------------
>> Date: Sun, 28 Aug 2011 16:43:36 +0100
>> Subject: Re: st: quasi-complete separation
>> From: njcoxstata@gmail.com
>> To: statalist@hsphsun2.harvard.edu
>>
>> Sabrina, and indeed anybody else: Please do not send, or attempt to
>> send, attachments to Statalist.
>> this is explained, twice over.
>>
>> Sabrina: -logit y x- will work with this dataset, but there is only a
>> weak relationship.
>>
>> Nick
>>
>> On Sun, Aug 28, 2011 at 4:24 PM, Sabrina Helmut <vitamint@hotmail.de> wrote:
>> > I am sorry, the scatter has not been sent. Thus, an example for you:
>> >
>> > binary dependent variable y
>> > continuous variable x
>> >
>> > y   x
>> > 0   1
>> > 0   2
>> > 0   3
>> > 0   4
>> > 1   1
>> > 1   2
>> > 1   3
>> > 1   4
>> > 1   5
>> > 1   6
>> > 1   7
>> >
>> > Thus, values of the independent variable higher than 4 occur only with y=1.
>> > So, is this a problem of quasi-complete separation? Thank you very much.
>> >
>> >
>> > ----------------------------------------
>> >> From: vitamint@hotmail.de
>> >>
>> >> I provided a scatter for you. Am I right with the assumption that it shows the problem of quasi-complete separation? Thanks.
>>

```
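For anyone wanting to check the posted example directly, here is a minimal sketch in Stata. It re-enters the eleven observations from the thread and compares the range of x within each outcome group. The choice of -tabstat- and -tabulate- for the check is mine and not something proposed in the thread. If the two ranges overlap on more than one value, as they do here for x = 1 to 4, x alone cannot separate the outcomes, which is consistent with -logit- converging to a finite coefficient above.

```stata
* Sketch only: inspect whether x separates y in the example data from the thread
clear
input y x
0 1
0 2
0 3
0 4
1 1
1 2
1 3
1 4
1 5
1 6
1 7
end

* Minimum and maximum of x within each outcome group
tabstat x, by(y) statistics(min max n)

* Cross-tabulation: shows which values of x occur with both outcomes
tabulate x y

* Both outcomes occur at x = 1 to 4, so the fit below has a finite maximum
logit y x
```

On these data the y = 0 group spans x = 1 to 4 and the y = 1 group spans x = 1 to 7, so the very large coefficient mentioned in the thread presumably arises in the poster's real dataset and not in this example.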