
# Re: st: Sparse Data Problem

From: Steven Samuels | To: statalist@hsphsun2.harvard.edu | Subject: Re: st: Sparse Data Problem | Date: Sat, 7 Mar 2009 18:27:17 -0500

Here's the first table you presented:

```
           |         clstr
     s315t |         0          1 |     Total
-----------+----------------------+----------
         0 |        22          1 |        23
         1 |        58         32 |        90
-----------+----------------------+----------
     Total |        80         33 |       113
```

You don't need (and won't be able to fit) a logistic model for the first row, but one might help for the second. Think of a classification and regression tree (CART) approach, where s315t = 0 defines a terminal node. By the way, missing values in the predictors are leading to differing n's in your results: 113, 103, 100.

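The contrast between the two rows can be checked from the counts alone. A minimal sketch using Stata's immediate command `tabi`, which needs no data set; the `row` option (row percentages) is added here for illustration and was not part of the original command:

```stata
* Counts copied from the table above.
* Rows are s315t = 0 and s315t = 1; columns are clstr = 0 and clstr = 1.
tabi 22 1 \ 58 32, row exact
```

The row percentages make the point: 1 of 23 (about 4%) of the s315t = 0 subjects have clstr = 1, against 32 of 90 (about 36%) of the s315t = 1 subjects, so the first row leaves almost nothing for other predictors to explain.
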
-Steve

On Mar 7, 2009, at 4:50 PM, john metcalfe wrote:

Thanks to Dave and Steve.

Dave, I am not sure how to apply -xtmelogit- to this data set, or if this would be a correct thing to do. I haven't worked with this before but will look into it.

Steve, thanks for your helpful comments. I am not quite sure what is meant by the two part prediction equation. I think you mean getting predicted probabilities from a logit model with s315t==1, but am not sure about 's315t negative: predict clstr = 1'? Can you make this more explicit?

Thanks much,
John

On Sat, Mar 7, 2009 at 11:43 AM, Steven Samuels <sjhsamuels@earthlink.net> wrote:

John, your model is probably incorrect. It assumes that, when s315t is 0, the other factors make a difference implied by the model form. They don't. Correspondingly, the stratified two-way tables indicate a possible interaction between s315t and 'east'.

I suggest a two part prediction equation.

```
s315t negative: predict clstr = 1
s315t positive: predict with other factors in a logistic model.
```

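One possible reading of the two-part equation, sketched in Stata. The cell counts are copied from the stratified tables in John's original post (the 103 subjects with 'east' known), so `east_asian` is the only "other factor" available here. Treating the first part as "give every s315t-negative subject the same predicted probability, namely the observed proportion with clstr = 1" is an assumption about what was intended, and the names `p_neg` and `phat` are made up for the example.

```stata
* Cell counts copied from the stratified two-way tables (n = 103).
* The empty cell (east_asian = 0, s315t = 0, clstr = 1) is omitted.
clear
input east_asian s315t clstr n
1 0 0  6
1 0 1  1
1 1 0 19
1 1 1 24
0 0 0 12
0 1 0 33
0 1 1  8
end
expand n          // one row per subject
drop n

* Part 1: s315t negative. No model, just the observed proportion.
summarize clstr if s315t == 0, meanonly
scalar p_neg = r(mean)

* Part 2: s315t positive. Logistic model on the other factor(s).
logit clstr east_asian if s315t == 1, or

* Combine both parts into a single predicted probability.
predict double phat if s315t == 1, pr
replace phat = scalar(p_neg) if s315t == 0
tabulate s315t east_asian, summarize(phat) means
```

With the full data set, the `logit` line would carry whichever additional predictors the number of events can support.
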
I'm not very familiar with exact logistic regression, but if the usual rules of thumb apply, the 32-33 events (clstr = 1) entitle you to about three predictors altogether.

-Steve

On Mar 6, 2009, at 10:12 PM, john metcalfe wrote:

Dear Statalist,

I am analyzing a small data set with outcome of interest 'clstr', with the primary goal of the analysis being to determine whether the variables 's315t' and 'east' have independent associations with the outcome. However, s315t is highly deterministic for the outcome clstr, as below. I am concerned that exact logistic regression is not fully accounting for the small cell bias. I would like to employ a hierarchical logistic regression, but it seems that the Stata command 'hireg' is only for linear regressions?

It may be that I simply am unable to make any valid inferences with this dataset, but I just want to make sure I have explored the appropriate possible remedies.

Thanks,
John

John Metcalfe, M.D., M.P.H.
University of California, San Francisco

```
. tab s315 clstr,e

           |         clstr
     s315t |         0          1 |     Total
-----------+----------------------+----------
         0 |        22          1 |        23
         1 |        58         32 |        90
-----------+----------------------+----------
     Total |        80         33 |       113

           Fisher's exact =                 0.002
   1-sided Fisher's exact =                 0.002

. logit clstr ageat s315t east emb sm num,or

Iteration 0:   log likelihood = -62.686946
Iteration 1:   log likelihood = -51.860098
Iteration 2:   log likelihood = -50.754342
Iteration 3:   log likelihood = -50.661741
Iteration 4:   log likelihood = -50.660257
Iteration 5:   log likelihood = -50.660256

Logistic regression                               Number of obs   =        100
                                                  LR chi2(6)      =      24.05
                                                  Prob > chi2     =     0.0005
Log likelihood = -50.660256                       Pseudo R2       =     0.1919

------------------------------------------------------------------------------
       clstr | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   ageatrept |   .9908837   .0139884    -0.65   0.517     .9638428    1.018683
       s315t |   9.238959   10.28939     2.00   0.046     1.041462    81.96011
  east_asian |   4.219755   2.215279     2.74   0.006     1.508083    11.80727
         emb |   .9964845   .6599534    -0.01   0.996     .2721043    3.649268
          sm |   2.138175   1.696319     0.96   0.338      .451589    10.12379
  num_resist |   1.064089   .2385192     0.28   0.782     .6857694    1.651116
------------------------------------------------------------------------------
```

Strategy 1: Two-way contingency tables

```
. tab clstr s315t if east==1,e

           |         s315t
     clstr |         0          1 |     Total
-----------+----------------------+----------
         0 |         6         19 |        25
         1 |         1         24 |        25
-----------+----------------------+----------
     Total |         7         43 |        50

           Fisher's exact =                 0.098
   1-sided Fisher's exact =                 0.049

. tab clstr s315t if east==0,e

           |         s315t
     clstr |         0          1 |     Total
-----------+----------------------+----------
         0 |        12         33 |        45
         1 |         0          8 |         8
-----------+----------------------+----------
     Total |        12         41 |        53

           Fisher's exact =                 0.175
   1-sided Fisher's exact =                 0.108
```

Strategy 2: Exact Logistic Regression

```
observation 102: enumerations =       1128
observation 103: enumerations =        574

Exact logistic regression                        Number of obs =        103
                                                 Model score   =   19.78112
                                                 Pr >= score   =     0.0000
---------------------------------------------------------------------------
       clstr | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
       s315t |   10.44218          32       0.0135     1.391627    474.4786
  east_asian |   5.414021          25       0.0006     1.933718    16.65417

(output omitted)
observation 103: enumerations =        574

Exact logistic regression                        Number of obs =        103
                                                 Model score   =   19.78112
                                                 Pr >= score   =     0.0000
---------------------------------------------------------------------------
       clstr |      Coef.       Score    Pr>=Score     [95% Conf. Interval]
-------------+-------------------------------------------------------------
       s315t |   2.345854    6.763266       0.0129     .3304732    6.162216
  east_asian |   1.688992    12.98631       0.0004     .6594448    2.812661
---------------------------------------------------------------------------
```

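The two exact logistic tables describe the same fit on two scales: the second reports log odds ratios (Coef.) and the first reports their exponentials. A quick arithmetic check with `display`, using only numbers copied from the output above:

```stata
* Coefficients copied from the second exact logistic table.
display "s315t:      " exp(2.345854)
display "east_asian: " exp(1.688992)
```

Up to rounding, these should reproduce the odds ratios 10.44218 and 5.414021 in the first table; exponentiating the interval limits works the same way.
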
Strategy 3: Hierarchical Regression

```
. hireg clstr (s315t) (east)(ageat emb sm)

Model 1:
Variables in Model:
Adding            : s315t

      Source |       SS       df       MS              Number of obs =     113
-------------+------------------------------           F(  1,   111) =    9.18
       Model |   1.7840879     1   1.7840879           Prob > F      =  0.0030
    Residual |   21.578744   111  .194403099           R-squared     =  0.0764
-------------+------------------------------           Adj R-squared =  0.0680
       Total |  23.3628319   112  .208596713           Root MSE      =  .44091

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .3120773   .1030162     3.03   0.003     .1079438    .5162108
       _cons |   .0434783   .0919364     0.47   0.637    -.1386999    .2256565
------------------------------------------------------------------------------
```

```
Model 2:
Variables in Model: s315t
Adding            : east

      Source |       SS       df       MS              Number of obs =     103
-------------+------------------------------           F(  2,   100) =   12.03
       Model |  4.34936038     2  2.17468019           Prob > F      =  0.0000
    Residual |  18.0778241   100  .180778241           R-squared     =  0.1939
-------------+------------------------------           Adj R-squared =  0.1778
       Total |  22.4271845   102  .219874358           Root MSE      =  .42518

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .2817301   .1086887     2.59   0.011     .0660947    .4973654
  east_asian |   .3247109   .0843486     3.85   0.000     .1573656    .4920561
       _cons |  -.0669987   .1023736    -0.65   0.514     -.270105    .1361075
------------------------------------------------------------------------------
R-Square Diff. Model 2 - Model 1 = 0.118  F(1,100) = 14.190  p = 0.000
```

```
Model 3:
Variables in Model: s315t  east
Adding            : ageat emb sm

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  5,    94) =    4.72
       Model |  4.36538233     5  .873076466           Prob > F      =  0.0007
    Residual |  17.3946177    94  .185049124           R-squared     =  0.2006
-------------+------------------------------           Adj R-squared =  0.1581
       Total |       21.76    99   .21979798           Root MSE      =  .43017

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .2335983   .1163422     2.01   0.048     .0025981    .4645984
  east_asian |   .2694912   .0945411     2.85   0.005     .0817777    .4572048
   ageatrept |  -.0012444   .0024199    -0.51   0.608    -.0060491    .0035603
         emb |   .0396897   .0989203     0.40   0.689    -.1567189    .2360984
          sm |   .1063985   .1087626     0.98   0.330    -.1095522    .3223492
       _cons |  -.0454117   .1512602    -0.30   0.765    -.3457423     .254919
------------------------------------------------------------------------------
R-Square Diff. Model 3 - Model 2 = 0.007  F(3,94) = 0.029  p = 0.993

Model  R2      F(df)              p         R2 change  F(df) change       p
    1:  0.076   9.177(1,111)       0.003
    2:  0.194  12.030(2,100)       0.000     0.118     14.190(1,100)      0.000
    3:  0.201   4.718(5,94)        0.001     0.007      0.029(3,94)       0.993
```

```
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
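
The `hireg` run above fits linear models to a 0/1 outcome. One way to get the same blockwise comparison on the logistic scale is to fit nested `logit` models and compare them with `lrtest`. The sketch below is only an illustration under stated assumptions: the cell counts are copied from the stratified two-way tables (103 subjects with 'east' known), so only the first two blocks can be reproduced, and the stored-estimate names `m1` and `m2` are arbitrary.

```stata
* Cell counts copied from the stratified two-way tables (n = 103).
* The empty cell (east_asian = 0, s315t = 0, clstr = 1) is omitted.
clear
input east_asian s315t clstr n
1 0 0  6
1 0 1  1
1 1 0 19
1 1 1 24
0 0 0 12
0 1 0 33
0 1 1  8
end
expand n          // one row per subject
drop n

* Block 1: s315t alone.
quietly logit clstr s315t
estimates store m1

* Block 2: add east_asian.
quietly logit clstr s315t east_asian
estimates store m2

* Likelihood-ratio test for the added block
* (the logistic counterpart of hireg's R-square change F test).
lrtest m1 m2
```

With the full data set, a third block (ageatrept, emb, sm) could be added in the same way, provided every model is fit on the same complete cases, since `lrtest` compares models estimated on one sample. Stata's built-in `nestreg` prefix is aimed at the same kind of blockwise entry.
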

© Copyright 1996–2021 StataCorp LLC