# Re: st: Sparse Data Problem

From: David Airey
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Sparse Data Problem
Date: Sat, 7 Mar 2009 10:40:25 -0600

Sorry... I meant -xtmelogit-, not -xtmixed-, in the last email.

On Mar 6, 2009, at 10:48 PM, john metcalfe wrote:

I was referring to Greenland, Amer J Epi 2000.
Thanks for the tip.
John

On Fri, Mar 6, 2009 at 8:07 PM, David Airey <david.airey@vanderbilt.edu> wrote:

What do you mean when you said "not fully accounting for the small cell bias"? I don't understand. I thought exact logistic models were for situations with small cells. -nestreg- does nested estimations for logit models, though not exact logit models. It was added to Stata in June of 2008.

-Dave

On Mar 6, 2009, at 9:12 PM, john metcalfe wrote:

Dear Statalist,

I am analyzing a small data set with outcome of interest 'clstr'. The primary goal of the analysis is to determine whether the variables 's315t' and 'east' have independent associations with the outcome. However, s315t is highly deterministic for the outcome clstr, as below. I am concerned that exact logistic regression is not fully accounting for the small cell bias. I would like to employ a hierarchical logistic regression, but it seems that the Stata command 'hireg' is only for linear regressions?

It may be that I simply am unable to make any valid inferences with this dataset, but I just want to make sure I have explored the appropriate possible remedies.

Thanks,
John

John Metcalfe, M.D., M.P.H.
University of California, San Francisco

```
. tab s315 clstr,e

           |         clstr
     s315t |         0          1 |     Total
-----------+----------------------+----------
         0 |        22          1 |        23
         1 |        58         32 |        90
-----------+----------------------+----------
     Total |        80         33 |       113

           Fisher's exact =                 0.002
   1-sided Fisher's exact =                 0.002

. logit clstr ageat s315t east emb sm num,or

Iteration 0:   log likelihood = -62.686946
Iteration 1:   log likelihood = -51.860098
Iteration 2:   log likelihood = -50.754342
Iteration 3:   log likelihood = -50.661741
Iteration 4:   log likelihood = -50.660257
Iteration 5:   log likelihood = -50.660256

Logistic regression                               Number of obs   =        100
                                                  LR chi2(6)      =      24.05
                                                  Prob > chi2     =     0.0005
Log likelihood = -50.660256                       Pseudo R2       =     0.1919

------------------------------------------------------------------------------
       clstr | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   ageatrept |   .9908837   .0139884    -0.65   0.517     .9638428    1.018683
       s315t |   9.238959   10.28939     2.00   0.046     1.041462    81.96011
  east_asian |   4.219755   2.215279     2.74   0.006     1.508083    11.80727
         emb |   .9964845   .6599534    -0.01   0.996     .2721043    3.649268
          sm |   2.138175   1.696319     0.96   0.338      .451589    10.12379
  num_resist |   1.064089   .2385192     0.28   0.782     .6857694    1.651116
------------------------------------------------------------------------------

Strategy 1: Two-way contingency tables

. tab clstr s315t if east==1,e

           |         s315t
     clstr |         0          1 |     Total
-----------+----------------------+----------
         0 |         6         19 |        25
         1 |         1         24 |        25
-----------+----------------------+----------
     Total |         7         43 |        50

           Fisher's exact =                 0.098
   1-sided Fisher's exact =                 0.049

. tab clstr s315t if east==0,e

           |         s315t
     clstr |         0          1 |     Total
-----------+----------------------+----------
         0 |        12         33 |        45
         1 |         0          8 |         8
-----------+----------------------+----------
     Total |        12         41 |        53

           Fisher's exact =                 0.175
   1-sided Fisher's exact =                 0.108

Strategy 2: Exact Logistic Regression

observation 102: enumerations =       1128
observation 103: enumerations =        574

Exact logistic regression                        Number of obs =        103
                                                 Model score   =   19.78112
                                                 Pr >= score   =     0.0000
---------------------------------------------------------------------------
       clstr | Odds Ratio       Suff. 2*Pr(Suff.)      [95% Conf. Interval]
-------------+-------------------------------------------------------------
       s315t |   10.44218          32      0.0135      1.391627    474.4786
  east_asian |   5.414021          25      0.0006      1.933718    16.65417

(output omitted)
observation 103: enumerations =        574

Exact logistic regression                        Number of obs =        103
                                                 Model score   =   19.78112
                                                 Pr >= score   =     0.0000
---------------------------------------------------------------------------
       clstr |      Coef.       Score   Pr>=Score      [95% Conf. Interval]
-------------+-------------------------------------------------------------
       s315t |   2.345854    6.763266      0.0129      .3304732    6.162216
  east_asian |   1.688992    12.98631      0.0004      .6594448    2.812661
---------------------------------------------------------------------------

Strategy 3: Hierarchical Regression

. hireg clstr (s315t) (east)(ageat emb sm)

Model 1:
Variables in Model:

      Source |       SS       df       MS              Number of obs =     113
-------------+------------------------------           F(  1,   111) =    9.18
       Model |   1.7840879     1   1.7840879           Prob > F      =  0.0030
    Residual |   21.578744   111  .194403099           R-squared     =  0.0764
-------------+------------------------------           Adj R-squared =  0.0680
       Total |  23.3628319   112  .208596713           Root MSE      =  .44091

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .3120773   .1030162     3.03   0.003     .1079438    .5162108
       _cons |   .0434783   .0919364     0.47   0.637    -.1386999    .2256565
------------------------------------------------------------------------------

Model 2:
Variables in Model: s315t

      Source |       SS       df       MS              Number of obs =     103
-------------+------------------------------           F(  2,   100) =   12.03
       Model |  4.34936038     2  2.17468019           Prob > F      =  0.0000
    Residual |  18.0778241   100  .180778241           R-squared     =  0.1939
-------------+------------------------------           Adj R-squared =  0.1778
       Total |  22.4271845   102  .219874358           Root MSE      =  .42518

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .2817301   .1086887     2.59   0.011     .0660947    .4973654
  east_asian |   .3247109   .0843486     3.85   0.000     .1573656    .4920561
       _cons |  -.0669987   .1023736    -0.65   0.514     -.270105    .1361075
------------------------------------------------------------------------------

R-Square Diff. Model 2 - Model 1 = 0.118 F(1,100) = 14.190 p = 0.000

Model 3:
Variables in Model: s315t  east

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  5,    94) =    4.72
       Model |  4.36538233     5  .873076466           Prob > F      =  0.0007
    Residual |  17.3946177    94  .185049124           R-squared     =  0.2006
-------------+------------------------------           Adj R-squared =  0.1581
       Total |       21.76    99   .21979798           Root MSE      =  .43017

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .2335983   .1163422     2.01   0.048     .0025981    .4645984
  east_asian |   .2694912   .0945411     2.85   0.005     .0817777    .4572048
   ageatrept |  -.0012444   .0024199    -0.51   0.608    -.0060491    .0035603
         emb |   .0396897   .0989203     0.40   0.689    -.1567189    .2360984
          sm |   .1063985   .1087626     0.98   0.330    -.1095522    .3223492
       _cons |  -.0454117   .1512602    -0.30   0.765    -.3457423     .254919
------------------------------------------------------------------------------

R-Square Diff. Model 3 - Model 2 = 0.007 F(3,94) = 0.029 p = 0.993

Model  R2      F(df)              p         R2 change  F(df) change     p
1:  0.076   9.177(1,111)       0.003
2:  0.194  12.030(2,100)       0.000     0.118     14.190(1,100)      0.000
3:  0.201   4.718(5,94)        0.001     0.007      0.029(3,94)       0.993
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
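
Following up on Dave's -nestreg- suggestion, here is a minimal sketch, not run against John's data, of how the blocks from the -hireg- call might be refit as logistic rather than linear models. The full variable names are taken from the output above, since the posted commands use abbreviations. Asking for likelihood-ratio tests with `lrtable` is an assumption on my part, not something either poster specified. The first command only re-enters the cell counts of the first 2x2 table, so the Fisher's exact results can be checked without the dataset.

```stata
* Sketch only: variable names come from the posted output; options are illustrative.

* Re-enter the s315t-by-clstr cell counts and request Fisher's exact test
tabi 22 1 \ 58 32, exact

* Blockwise logistic models in the same order as the -hireg- call,
* with likelihood-ratio tests between successive blocks
nestreg, lrtable: logit clstr (s315t) (east_asian) (ageatrept emb sm)
```

As Dave notes, -nestreg- fits ordinary rather than exact logit models, so with cells this sparse the block tests should be read alongside the -exlogistic- results, not in place of them.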