st: RE: "if" statement


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: "if" statement
Date   Wed, 16 Sep 2009 22:08:19 +0100

First off, the word "statement" could be read variously, so I underline
there is no -if- command in view here. All the uses of -if- are of the
-if- qualifier. That would be pedantry except that many people confuse
the two. 

Second, I think you are focusing on the wrong issue. The point is simply
that one indicator variable is redundant given the rest. How the
variables were created is immaterial, and indeed the code below, which
entails no use of -if- for creating your age indicators, shows that -if-
is itself not to blame for anything in this territory. 

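Third, I imagine that what you are seeing in the second case is that
-xi:- is inclined to drop _Iageg_2 and then -logistic- works out that
that inclination is irrelevant: with age >= 30 no observation is in
group 2, so the remaining five indicators sum to 1 and -logistic- must
drop one of them, here _Iageg_4. I see nothing "worrisome" in that; it's
a natural consequence of Stata's division of labour here. 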

Nick 
n.j.cox@durham.ac.uk 

webuse nhanes2f, clear
gen ageg= floor(age/10)
replace sex=0 if sex==2
xi: logistic sex i.ageg

i.ageg            _Iageg_2-7          (naturally coded; _Iageg_2 omitted)

Logistic regression                               Number of obs   =      10337
                                                  LR chi2(5)      =       2.80
                                                  Prob > chi2     =     0.7302
Log likelihood =  -7150.626                       Pseudo R2       =     0.0002

------------------------------------------------------------------------------
         sex | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iageg_3 |   .9761655   .0632659    -0.37   0.710     .8597191    1.108384
    _Iageg_4 |   .9971218   .0696639    -0.04   0.967     .8695188    1.143451
    _Iageg_5 |   .9424283    .065592    -0.85   0.394     .8222534    1.080167
    _Iageg_6 |   .9903392   .0554211    -0.17   0.862     .8874609    1.105144
    _Iageg_7 |   .8963705   .0683978    -1.43   0.152     .7718562    1.040971
------------------------------------------------------------------------------
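
A quick way to check the collinearity argument made in the reply is to
build the age-group indicators by hand and look at them within the
restricted sample. This is only a sketch and not part of the original
exchange: the names ag1-ag6, ngroups and sum3to7 are made up for
illustration, and it assumes the same nhanes2f data and the floor(age/10)
grouping used above.

```stata
* Sketch only: ag1-ag6, ngroups and sum3to7 are illustrative names
webuse nhanes2f, clear
gen ageg = floor(age/10)

* One indicator per age group: ag1 is ageg==2, ..., ag6 is ageg==7
tabulate ageg, generate(ag)

* In the full sample every observation falls in exactly one group
egen ngroups = rowtotal(ag1-ag6)
assert ngroups == 1

* Restricted to age >= 30, nobody is in group 2, so this counts 0
count if ag1 == 1 & age >= 30

* Hence the other five indicators sum to 1 there, duplicating the constant
gen sum3to7 = ag2 + ag3 + ag4 + ag5 + ag6
assert sum3to7 == 1 if age >= 30
```

If both -assert- commands pass silently, any model with a constant fitted
on age >= 30 can keep at most four of the five remaining indicators,
whichever one the estimation command chooses to drop.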

Victor Mauricio Herrera MD MS

I'd appreciate it if somebody could explain the following behavior of the
"if" statement when used with "logistic" (I'm running Stata/IC 10.1). 

webuse nhanes2f
gen ageg=2 if age>=20 & age<30
replace ageg=3 if age>=30 & age<40
replace ageg=4 if age>=40 & age<50
replace ageg=5 if age>=50 & age<60
replace ageg=6 if age>=60 & age<70
replace ageg=7 if age>=70
replace sex=0 if sex==2

model 1 --> xi: logistic sex i.ageg
i.ageg            _Iageg_2-7          (naturally coded; _Iageg_2 omitted)

Logistic regression                               Number of obs   =      10337
                                                  LR chi2(5)      =       2.80
                                                  Prob > chi2     =     0.7302
Log likelihood =  -7150.626                       Pseudo R2       =     0.0002

------------------------------------------------------------------------------
         sex | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iageg_3 |   .9761655   .0632659    -0.37   0.710     .8597191    1.108384
    _Iageg_4 |   .9971218   .0696639    -0.04   0.967     .8695188    1.143451
    _Iageg_5 |   .9424283    .065592    -0.85   0.394     .8222534    1.080167
    _Iageg_6 |   .9903392   .0554211    -0.17   0.862     .8874609    1.105144
    _Iageg_7 |   .8963705   .0683978    -1.43   0.152     .7718562    1.040971
------------------------------------------------------------------------------

model 2 --> xi: logistic sex i.ageg if age>=30
i.ageg            _Iageg_2-7          (naturally coded; _Iageg_2 omitted)

note: _Iageg_4 dropped because of collinearity

Logistic regression                               Number of obs   =       8017
                                                  LR chi2(4)      =       2.35
                                                  Prob > chi2     =     0.6713
Log likelihood = -5544.1939                       Pseudo R2       =     0.0002

------------------------------------------------------------------------------
         sex | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iageg_3 |   .9789833   .0734452    -0.28   0.777     .8451164    1.134055
    _Iageg_5 |   .9451487   .0748512    -0.71   0.476     .8092619    1.103853
    _Iageg_6 |   .9931979   .0670654    -0.10   0.919     .8700789    1.133739
    _Iageg_7 |   .8989579   .0765455    -1.25   0.211     .7607821     1.06223
------------------------------------------------------------------------------

Why is age group 4 (40-49) dropped due to collinearity if there are
610 males and 660 females in this stratum? More worrisome, why is
age group 2 (20-29) still being used as reference when it should have
been dropped as a consequence of the "if" statement (i.e. _Iageg_3 should
be the reference instead of _Iageg_2)?
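
[Editorial sketch, not part of the original message.] If group 3 is
wanted as the reference in the restricted model, one option is to say so
explicitly instead of leaving the choice to -xi- and -logistic-. The
first route below uses the omit characteristic that -xi- reads; the
second uses factor-variable notation and therefore assumes Stata 11 or
later, which the poster (on 10.1) did not have. The floor(age/10)
grouping is borrowed from the reply, not from this question.

```stata
* Sketch only: same data and grouping as elsewhere in the thread
webuse nhanes2f, clear
gen ageg = floor(age/10)
replace sex = 0 if sex == 2

* Stata 10 and earlier: tell -xi- which category to omit
char ageg[omit] 3
xi: logistic sex i.ageg if age >= 30

* Stata 11 and later (assumed available): base level set in the varlist
logistic sex ib3.ageg if age >= 30
```

On the -xi- route an indicator for group 2 is still created and is all
zero when age >= 30, so the estimation command can be expected to drop
it; the odds ratios for groups 4 to 7 are then relative to group 3.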

model 3 --> xi: logistic sex i.ageg if age<70
i.ageg            _Iageg_2-7          (naturally coded; _Iageg_2 omitted)

note: _Iageg_7 dropped because of collinearity

Logistic regression                               Number of obs   =       9352
                                                  LR chi2(4)      =       0.86
                                                  Prob > chi2     =     0.9303
Log likelihood = -6472.0855                       Pseudo R2       =     0.0001

------------------------------------------------------------------------------
         sex | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iageg_3 |   .9761655   .0632659    -0.37   0.710     .8597191    1.108384
    _Iageg_4 |   .9971218   .0696639    -0.04   0.967     .8695188    1.143451
    _Iageg_5 |   .9424283    .065592    -0.85   0.394     .8222534    1.080167
    _Iageg_6 |   .9903392   .0554211    -0.17   0.862     .8874609    1.105144
------------------------------------------------------------------------------

Now the "if" statement seems to work fine, as subjects with age>=70 are
excluded (i.e. the _Iageg_7 group has been dropped!)

This also occurs if I run these models using Stata/IC 9.2 or if one
models another dichotomous variable using a different dataset.

