Re: st: Sparse Data Problem

From: Steven Samuels
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Sparse Data Problem
Date: Sat, 7 Mar 2009 14:43:48 -0500

John, your model is probably incorrect. Its form assumes that the other factors still shift the odds of clstr when s315t is 0. They don't: only 1 of the 23 s315t-negative observations has clstr = 1. Correspondingly, the stratified two-way tables indicate a possible interaction between s315t and 'east'.
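One way to look at that possible interaction is to add a product term to the exact logistic model. The sketch below rebuilds the 103 observations from the two stratified tables in John's message, so the cell counts are his; the variable names `n` and `sXe` are made up for illustration. Because one cell is empty (east = 0, s315t = 0, clstr = 1), the interaction is barely estimable, so treat this as a rough check at best.

```stata
* Cell counts copied from the stratified tables (east == 1 and east == 0).
* n and sXe are hypothetical names introduced for this sketch.
clear
input east_asian s315t clstr n
1 0 0  6
1 0 1  1
1 1 0 19
1 1 1 24
0 0 0 12
0 1 0 33
0 1 1  8
end
* the empty cell (east_asian == 0, s315t == 0, clstr == 1) is left out

* product term for the s315t-by-east interaction
generate sXe = s315t * east_asian

* exact logistic regression using the counts as frequency weights
exlogistic clstr s315t east_asian sXe [fweight = n]
```

With so few s315t-negative events, the interval for the product term will probably be very wide.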
I suggest a two-part prediction equation:

```
s315t negative: predict clstr = 0
s315t positive: predict with other factors in a logistic model
```
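A rough sketch of such a two-part rule, using the cell counts from the stratified tables in John's message and only 'east' as the other factor. The names `n` and `phat` are made up for illustration. The prediction for the s315t-negative group is fixed at 0 because 22 of the 23 such observations in the first two-way table have clstr = 0.

```stata
* Cell counts copied from the stratified tables; n and phat are hypothetical names.
clear
input east_asian s315t clstr n
1 0 0  6
1 0 1  1
1 1 0 19
1 1 1 24
0 0 0 12
0 1 0 33
0 1 1  8
end

* Part 2: logistic model fitted to the s315t-positive observations only
logit clstr east_asian if s315t == 1 [fweight = n], or

* predicted probability of clstr == 1 for the s315t-positive group
predict phat if s315t == 1, pr

* Part 1: no model for the s315t-negative group; predict clstr = 0
replace phat = 0 if s315t == 0

list east_asian s315t clstr n phat
```

With a single binary predictor the fitted odds ratio is just the cross-product ratio of the s315t-positive 2x2 table, (24 x 33) / (19 x 8).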
I'm not very familiar with exact logistic regression, but if the usual rules of thumb apply (for example, about ten events per predictor), the 32-33 events (clstr = 1) entitle you to about three predictors altogether.
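The arithmetic behind that budget, assuming the rule of thumb meant here is the common one of about ten events per predictor (the message does not state the divisor):

```stata
* Events-per-variable arithmetic.
* events comes from the full two-way table (clstr == 1 total);
* per_predictor = 10 is an assumed rule of thumb, not stated in the message.
local events = 33
local per_predictor = 10
display floor(`events' / `per_predictor')
```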
-Steve

On Mar 6, 2009, at 10:12 PM, john metcalfe wrote:

```
Dear Statalist,
I am analyzing a small data set with outcome of interest 'clstr', with
the primary goal of the analysis to determine if the variables 's315t'
and 'east' have independent associations with the outcome.  However,
s315t is highly deterministic for the outcome clstr, as below. I am
concerned that exact logistic regression is not fully accounting for
the small cell bias. I would like to employ a hierarchical logistic
regression, but it seems that the Stata command 'hireg' is only for
linear regressions?
It may be that I simply am unable to make any valid inferences with
this dataset, but I just want to make sure I have explored the
appropriate possible remedies.
Thanks,
John

John Metcalfe, M.D., M.P.H.
University of California, San Francisco

. tab s315 clstr,e

           |         clstr
     s315t |         0          1 |     Total
-----------+----------------------+----------
         0 |        22          1 |        23
         1 |        58         32 |        90
-----------+----------------------+----------
     Total |        80         33 |       113

           Fisher's exact =                 0.002
   1-sided Fisher's exact =                 0.002

. logit clstr ageat s315t east emb sm num,or

Iteration 0:   log likelihood = -62.686946
Iteration 1:   log likelihood = -51.860098
Iteration 2:   log likelihood = -50.754342
Iteration 3:   log likelihood = -50.661741
Iteration 4:   log likelihood = -50.660257
Iteration 5:   log likelihood = -50.660256

Logistic regression                               Number of obs   =        100
                                                  LR chi2(6)      =      24.05
                                                  Prob > chi2     =     0.0005
Log likelihood = -50.660256                       Pseudo R2       =     0.1919

------------------------------------------------------------------------------
       clstr | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   ageatrept |   .9908837   .0139884    -0.65   0.517     .9638428    1.018683
       s315t |   9.238959   10.28939     2.00   0.046     1.041462    81.96011
  east_asian |   4.219755   2.215279     2.74   0.006     1.508083    11.80727
         emb |   .9964845   .6599534    -0.01   0.996     .2721043    3.649268
          sm |   2.138175   1.696319     0.96   0.338      .451589    10.12379
  num_resist |   1.064089   .2385192     0.28   0.782     .6857694    1.651116
------------------------------------------------------------------------------

Strategy 1: Two-way contingency tables

. tab clstr s315t if east==1,e

           |         s315t
     clstr |         0          1 |     Total
-----------+----------------------+----------
         0 |         6         19 |        25
         1 |         1         24 |        25
-----------+----------------------+----------
     Total |         7         43 |        50

           Fisher's exact =                 0.098
   1-sided Fisher's exact =                 0.049

. tab clstr s315t if east==0,e

           |         s315t
     clstr |         0          1 |     Total
-----------+----------------------+----------
         0 |        12         33 |        45
         1 |         0          8 |         8
-----------+----------------------+----------
     Total |        12         41 |        53

           Fisher's exact =                 0.175
   1-sided Fisher's exact =                 0.108

Strategy 2: Exact Logistic Regression

observation 102: enumerations =       1128
observation 103: enumerations =        574


Exact logistic regression                        Number of obs =        103
                                                 Model score   =   19.78112
                                                 Pr >= score   =     0.0000
---------------------------------------------------------------------------
       clstr | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
       s315t |   10.44218          32      0.0135      1.391627    474.4786
  east_asian |   5.414021          25      0.0006      1.933718    16.65417

(output omitted)
observation 103: enumerations =        574


Exact logistic regression                        Number of obs =        103
                                                 Model score   =   19.78112
                                                 Pr >= score   =     0.0000
---------------------------------------------------------------------------
       clstr |      Coef.       Score    Pr>=Score     [95% Conf. Interval]
-------------+-------------------------------------------------------------
       s315t |   2.345854    6.763266      0.0129      .3304732    6.162216
  east_asian |   1.688992    12.98631      0.0004      .6594448    2.812661
---------------------------------------------------------------------------

Strategy 3: Hierarchical Regression

. hireg clstr (s315t) (east)(ageat emb sm)

Model 1:
Variables in Model:

      Source |       SS       df       MS              Number of obs =     113
-------------+------------------------------           F(  1,   111) =    9.18
       Model |   1.7840879     1   1.7840879           Prob > F      =  0.0030
    Residual |   21.578744   111  .194403099           R-squared     =  0.0764
-------------+------------------------------           Adj R-squared =  0.0680
       Total |  23.3628319   112  .208596713           Root MSE      =  .44091

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .3120773   .1030162     3.03   0.003     .1079438    .5162108
       _cons |   .0434783   .0919364     0.47   0.637    -.1386999    .2256565
------------------------------------------------------------------------------

Model 2:
Variables in Model: s315t

      Source |       SS       df       MS              Number of obs =     103
-------------+------------------------------           F(  2,   100) =   12.03
       Model |  4.34936038     2  2.17468019           Prob > F      =  0.0000
    Residual |  18.0778241   100  .180778241           R-squared     =  0.1939
-------------+------------------------------           Adj R-squared =  0.1778
       Total |  22.4271845   102  .219874358           Root MSE      =  .42518

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .2817301   .1086887     2.59   0.011     .0660947    .4973654
  east_asian |   .3247109   .0843486     3.85   0.000     .1573656    .4920561
       _cons |  -.0669987   .1023736    -0.65   0.514     -.270105    .1361075
------------------------------------------------------------------------------
R-Square Diff. Model 2 - Model 1 = 0.118   F(1,100) = 14.190   p = 0.000

Model 3:
Variables in Model: s315t  east

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  5,    94) =    4.72
       Model |  4.36538233     5  .873076466           Prob > F      =  0.0007
    Residual |  17.3946177    94  .185049124           R-squared     =  0.2006
-------------+------------------------------           Adj R-squared =  0.1581
       Total |       21.76    99   .21979798           Root MSE      =  .43017

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .2335983   .1163422     2.01   0.048     .0025981    .4645984
  east_asian |   .2694912   .0945411     2.85   0.005     .0817777    .4572048
   ageatrept |  -.0012444   .0024199    -0.51   0.608    -.0060491    .0035603
         emb |   .0396897   .0989203     0.40   0.689    -.1567189    .2360984
          sm |   .1063985   .1087626     0.98   0.330    -.1095522    .3223492
       _cons |  -.0454117   .1512602    -0.30   0.765    -.3457423     .254919
------------------------------------------------------------------------------
R-Square Diff. Model 3 - Model 2 = 0.007   F(3,94) =  0.029  p = 0.993

Model  R2      F(df)            p      R2 change  F(df) change      p
    1: 0.076    9.177(1,111)    0.003
    2: 0.194   12.030(2,100)    0.000  0.118      14.190(1,100)    0.000
    3: 0.201    4.718(5,94)     0.001  0.007       0.029(3,94)     0.993
```

```
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```