st: RE: "if" statement

From: "Nick Cox"
Subject: st: RE: "if" statement
Date: Wed, 16 Sep 2009 22:08:19 +0100

```
First off, the word "statement" could be read variously, so I underline
there is no -if- command in view here. All the uses of -if- are of the
-if- qualifier. That would be pedantry except that many people confuse
the two.

Second, I think you are focusing on the wrong issue. The point is simply
that one indicator variable is redundant given the rest. How the
variables were created is immaterial, and indeed the code below, which
entails no use of -if- for creating your age indicators, shows that -if-
is itself not to blame for anything in this territory.

Third, I imagine that what you are seeing in the second case is that
-xi:- is inclined to drop _Iageg_2 and then -logistic- works out that
that inclination is irrelevant. I see nothing "worrisome" in that; it's
a natural consequence of Stata's division of labour here.

Nick
n.j.cox@durham.ac.uk

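webuse nhanes2f, clear
* age group = decade of age: 2 for 20-29, ..., 7 for 70 and over
gen ageg = floor(age/10)
* recode sex from 1/2 to 1/0 so that the outcome is binary
replace sex = 0 if sex == 2
xi: logistic sex i.ageg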

i.ageg            _Iageg_2-7          (naturally coded; _Iageg_2 omitted)

Logistic regression                               Number of obs   =      10337
                                                  LR chi2(5)      =       2.80
                                                  Prob > chi2     =     0.7302
Log likelihood =  -7150.626                       Pseudo R2       =     0.0002

------------------------------------------------------------------------------
         sex | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iageg_3 |   .9761655   .0632659    -0.37   0.710     .8597191    1.108384
    _Iageg_4 |   .9971218   .0696639    -0.04   0.967     .8695188    1.143451
    _Iageg_5 |   .9424283    .065592    -0.85   0.394     .8222534    1.080167
    _Iageg_6 |   .9903392   .0554211    -0.17   0.862     .8874609    1.105144
    _Iageg_7 |   .8963705   .0683978    -1.43   0.152     .7718562    1.040971
------------------------------------------------------------------------------


Victor Mauricio Herrera MD MS

I'd appreciate it if somebody could explain the following behavior of the
"if" statement when used with "logistic" (I'm running Stata/IC 10.1).

webuse nhanes2f
gen ageg=2 if age>=20 & age<30
replace ageg=3 if age>=30 & age<40
replace ageg=4 if age>=40 & age<50
replace ageg=5 if age>=50 & age<60
replace ageg=6 if age>=60 & age<70
replace ageg=7 if age>=70
replace sex=0 if sex==2

model 1 --> xi: logistic sex i.ageg
i.ageg            _Iageg_2-7          (naturally coded; _Iageg_2 omitted)

Logistic regression                               Number of obs   =      10337
                                                  LR chi2(5)      =       2.80
                                                  Prob > chi2     =     0.7302
Log likelihood =  -7150.626                       Pseudo R2       =     0.0002

------------------------------------------------------------------------------
         sex | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iageg_3 |   .9761655   .0632659    -0.37   0.710     .8597191    1.108384
    _Iageg_4 |   .9971218   .0696639    -0.04   0.967     .8695188    1.143451
    _Iageg_5 |   .9424283    .065592    -0.85   0.394     .8222534    1.080167
    _Iageg_6 |   .9903392   .0554211    -0.17   0.862     .8874609    1.105144
    _Iageg_7 |   .8963705   .0683978    -1.43   0.152     .7718562    1.040971
------------------------------------------------------------------------------

model 2 --> xi: logistic sex i.ageg if age>=30
i.ageg            _Iageg_2-7          (naturally coded; _Iageg_2 omitted)

note: _Iageg_4 dropped because of collinearity

Logistic regression                               Number of obs   =       8017
                                                  LR chi2(4)      =       2.35
                                                  Prob > chi2     =     0.6713
Log likelihood = -5544.1939                       Pseudo R2       =     0.0002

------------------------------------------------------------------------------
         sex | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iageg_3 |   .9789833   .0734452    -0.28   0.777     .8451164    1.134055
    _Iageg_5 |   .9451487   .0748512    -0.71   0.476     .8092619    1.103853
    _Iageg_6 |   .9931979   .0670654    -0.10   0.919     .8700789    1.133739
    _Iageg_7 |   .8989579   .0765455    -1.25   0.211     .7607821     1.06223
------------------------------------------------------------------------------

Why is the age group 4 (40-49) dropped due to collinearity if there are
610 males and 660 females in this stratum? More worrisome, why is the
age group 2 (20-29) still being used as reference when it should have
been dropped as a consequence of the "if" statement (i.e. _Iageg_3 should
be the reference instead of _Iageg_2)?

model 3 --> xi: logistic sex i.ageg if age<70
i.ageg            _Iageg_2-7          (naturally coded; _Iageg_2 omitted)

note: _Iageg_7 dropped because of collinearity

Logistic regression                               Number of obs   =       9352
                                                  LR chi2(4)      =       0.86
                                                  Prob > chi2     =     0.9303
Log likelihood = -6472.0855                       Pseudo R2       =     0.0001

------------------------------------------------------------------------------
         sex | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iageg_3 |   .9761655   .0632659    -0.37   0.710     .8597191    1.108384
    _Iageg_4 |   .9971218   .0696639    -0.04   0.967     .8695188    1.143451
    _Iageg_5 |   .9424283    .065592    -0.85   0.394     .8222534    1.080167
    _Iageg_6 |   .9903392   .0554211    -0.17   0.862     .8874609    1.105144
------------------------------------------------------------------------------

Now the "if" statement seems to work fine, as subjects with age>=70 are
excluded (i.e. the _Iageg_7 group has been dropped!)

This also occurs if I run these models using Stata 9.2 or if one
models another dichotomous variable using a different dataset.

```
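
The redundancy described in the reply can be checked directly. What follows is a minimal sketch, not part of the original exchange. It assumes that -xi- builds its indicators from the whole data set, without regard to the -if- qualifier on -logistic-, which is how I read Nick's third point. The variable name nind is hypothetical, and the last two lines use the omit characteristic documented for -xi- to ask for group 3 as the base.

```stata
* Sketch only: assumes web access for -webuse- and the -xi- prefix (Stata 10 era).
webuse nhanes2f, clear
gen ageg = floor(age/10)
replace sex = 0 if sex == 2

* Create the indicators as -xi- would, on the full data set (group 2 omitted).
xi i.ageg

* nind (hypothetical name) counts how many of the created indicators are 1.
* For ages 30 and over nobody is in group 2, so the count should always be 1:
* the indicators then sum to the constant and one of them is redundant.
egen byte nind = rowtotal(_Iageg_*)
tabulate nind if age >= 30

* One way to choose the base explicitly: the omit characteristic read by -xi-.
char ageg[omit] 3
xi: logistic sex i.ageg if age >= 30
```

If the indicators behave as described, nind should equal 1 for every observation with age >= 30, which is the collinearity that forces -logistic- to discard one indicator. With the characteristic set, _Iageg_2 is zero throughout the estimation sample, so it should be the variable that is discarded, leaving group 3 as the reference. From Stata 11, factor-variable notation such as ib3.ageg offers an alternative to -xi-.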