Statalist



Re: st: Sparse Data Problem


From   john metcalfe <johnzmetcalfe@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Sparse Data Problem
Date   Sat, 7 Mar 2009 13:50:11 -0800

Thanks to Dave and Steve.
Dave, I am not sure how to apply -xtmelogit- to this data set, or whether
it would be appropriate here. I haven't worked with it before, but I will
look into it.
Steve, thanks for your helpful comments. I am not quite sure what is
meant by the two-part prediction equation. I think you mean getting
predicted probabilities from a logit model fitted only to the observations
with s315t==1, but I am not sure about 's315t negative: predict clstr = 1'.
Can you make this more explicit?
Thanks much,
John

On Sat, Mar 7, 2009 at 11:43 AM, Steven Samuels
<sjhsamuels@earthlink.net> wrote:
>
> John, your model is probably misspecified. It assumes that, when s315t
> is 0, the other factors make the difference implied by the model form.
> They don't. Correspondingly, the stratified two-way tables indicate a
> possible interaction between s315t and 'east'.
>
> I suggest a two-part prediction equation:
>
> s315t negative: predict clstr = 1
> s315t positive: predict with other factors in a logistic model.
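The following is one possible reading of the two-part rule. It is sketched in Python for illustration only: it is not Stata code, and the function names are hypothetical. Observations with s315t = 0 receive a fixed prediction. Observations with s315t = 1 receive a probability from a logistic model fitted within that stratum alone. The sketch assumes east is the only covariate in the second part and takes its counts from the east-stratified tables further down the thread (s315t = 1: 24 of 43 with clstr = 1 when east = 1, and 8 of 41 when east = 0). With a single binary covariate that model is saturated, so the maximum-likelihood fit simply reproduces the observed log-odds. The fixed value for the s315t-negative arm is left as a parameter; those same tables show 1 of 19 s315t-negative subjects with clstr = 1.

```python
import math

# Counts from the east-stratified tables, s315t == 1 rows only.
# key = east, value = (subjects with clstr == 1, subjects in the cell)
POSITIVE_ARM = {1: (24, 43), 0: (8, 41)}


def log_odds(events, total):
    """Observed log-odds of clstr == 1 in one cell."""
    return math.log(events / (total - events))


def fit_positive_arm(counts):
    """Fit logit P(clstr = 1) = b0 + b_east * east within s315t == 1.

    With one binary covariate the model is saturated, so the maximum
    likelihood estimates equal the observed cell log-odds.
    """
    b0 = log_odds(*counts[0])
    b_east = log_odds(*counts[1]) - b0
    return b0, b_east


def two_part_predict(s315t, east, b0, b_east, negative_value):
    """Part 1: fixed value when s315t == 0. Part 2: logistic model otherwise."""
    if s315t == 0:
        return negative_value
    eta = b0 + b_east * east
    return 1.0 / (1.0 + math.exp(-eta))


b0, b_east = fit_positive_arm(POSITIVE_ARM)
for east in (1, 0):
    p = two_part_predict(1, east, b0, b_east, negative_value=1 / 19)
    print(f"s315t=1, east={east}: P(clstr=1) = {p:.3f}")
print(f"s315t=0: fixed value = {two_part_predict(0, 1, b0, b_east, 1 / 19):.3f}")
```

In Stata, the second part would roughly correspond to -logit clstr east if s315t==1- followed by -predict- on the same subsample.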
>
>
> I'm not very familiar with exact logistic regression, but if the usual rules
> of thumb apply, the 32-33 events (clstr =1) entitle you to about three
> predictors altogether.
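The arithmetic behind that rule of thumb can be sketched as follows. The sketch assumes the commonly quoted figure of about 10 outcome events per candidate predictor (the message does not state the ratio) and uses the rarer outcome category as the limiting count. The function name is hypothetical.

```python
def max_predictors(events, non_events, events_per_predictor=10):
    """Rule-of-thumb cap on the number of predictors in a logistic model.

    The limiting count is the rarer outcome category; integer division
    gives how many predictors that count supports at the assumed ratio.
    """
    limiting = min(events, non_events)
    return limiting // events_per_predictor


# First cross-tabulation in the thread: 33 with clstr == 1, 80 with clstr == 0.
print(max_predictors(33, 80))
# The lower figure of 32 events gives the same cap (events remain the rarer class).
print(max_predictors(32, 80))
```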
>
>
> -Steve
>
> On Mar 6, 2009, at 10:12 PM, john metcalfe wrote:
>
>> Dear Statalist,
>> I am analyzing a small data set with outcome of interest 'clstr'; the
>> primary goal of the analysis is to determine whether the variables
>> 's315t' and 'east' have independent associations with the outcome.
>> However, s315t is nearly deterministic for the outcome clstr (only 1 of
>> 23 s315t-negative subjects has clstr = 1), as shown below. I am
>> concerned that exact logistic regression is not fully accounting for
>> the small-cell bias. I would like to employ a hierarchical logistic
>> regression, but it seems that the Stata command 'hireg' is only for
>> linear regressions?
>> It may be that I simply am unable to make any valid inferences with
>> this dataset, but I just want to make sure I have explored the
>> appropriate possible remedies.
>> Thanks,
>> John
>>
>> John Metcalfe, M.D., M.P.H.
>> University of California, San Francisco
>>
>>
>> . tab s315 clstr,e
>>
>>           |         clstr
>>     s315t |         0          1 |     Total
>> -----------+----------------------+----------
>>         0 |        22          1 |        23
>>         1 |        58         32 |        90
>> -----------+----------------------+----------
>>     Total |        80         33 |       113
>>
>>           Fisher's exact =                 0.002
>>   1-sided Fisher's exact =                 0.002
>>
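Below is a standard-library Python sketch of Fisher's exact test, useful as a check on the tabulation above; the function names are hypothetical. The one-sided value is taken as the tail in the direction of the observed table. The two-sided value sums every table with the same margins whose probability does not exceed that of the observed one. With those definitions, both values come out at 0.002 for this table.

```python
from math import comb


def hypergeom_pmf(a, row1, col1, n):
    """P(top-left cell = a) for a 2x2 table with fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table [[a, b], [c, d]].

    Returns (one_sided, two_sided). one_sided is the tail probability in
    the direction of the observed table; two_sided sums every table with
    probability no larger than the observed one (a tiny relative tolerance
    guards against floating-point ties).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = {x: hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1)}
    p_obs = probs[a]
    lower = sum(p for x, p in probs.items() if x <= a)
    upper = sum(p for x, p in probs.items() if x >= a)
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(lower, upper), min(1.0, two_sided)


# Rows: s315t = 0, 1; columns: clstr = 0, 1 (first cross-tabulation).
one, two = fisher_exact_2x2(22, 1, 58, 32)
print(f"1-sided = {one:.3f}, two-sided = {two:.3f}")
```

The same function applies to the east-stratified tables later in the thread, entered with clstr as rows and s315t as columns.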
>>
>>
>>
>> . logit clstr ageat s315t east emb sm num,or
>>
>> Iteration 0:   log likelihood = -62.686946
>> Iteration 1:   log likelihood = -51.860098
>> Iteration 2:   log likelihood = -50.754342
>> Iteration 3:   log likelihood = -50.661741
>> Iteration 4:   log likelihood = -50.660257
>> Iteration 5:   log likelihood = -50.660256
>>
>> Logistic regression                               Number of obs   =        100
>>                                                   LR chi2(6)      =      24.05
>>                                                   Prob > chi2     =     0.0005
>> Log likelihood = -50.660256                       Pseudo R2       =     0.1919
>>
>>
>> ------------------------------------------------------------------------------
>>        clstr | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>    ageatrept |   .9908837   .0139884    -0.65   0.517     .9638428    1.018683
>>        s315t |   9.238959   10.28939     2.00   0.046     1.041462    81.96011
>>   east_asian |   4.219755   2.215279     2.74   0.006     1.508083    11.80727
>>          emb |   .9964845   .6599534    -0.01   0.996     .2721043    3.649268
>>           sm |   2.138175   1.696319     0.96   0.338      .451589    10.12379
>>   num_resist |   1.064089   .2385192     0.28   0.782     .6857694    1.651116
>> ------------------------------------------------------------------------------
>>
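The header statistics of the -logit- output above follow from two log likelihoods in its iteration log: iteration 0 is the constant-only model, and the final iteration is the fitted model. The standard-library Python check below uses hypothetical names. It computes LR chi2 = 2(ll_model - ll_0) and McFadden's pseudo R2 = 1 - ll_model/ll_0. For an even number of degrees of freedom, the chi-square upper tail has a closed form, which the check uses for the p-value. The same likelihood-ratio comparison is the logistic counterpart of the block-by-block R-squared changes that -hireg- reports for linear models.

```python
import math

LL_NULL = -62.686946   # iteration 0: constant-only model
LL_MODEL = -50.660256  # final log likelihood, 6 predictors


def chi2_sf_even_df(x, df):
    """Upper-tail P(X > x) for a chi-square with even df (closed form)."""
    if df % 2:
        raise ValueError("this closed form needs an even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


lr = 2.0 * (LL_MODEL - LL_NULL)
pseudo_r2 = 1.0 - LL_MODEL / LL_NULL
p_value = chi2_sf_even_df(lr, 6)
print(f"LR chi2(6) = {lr:.2f}, Prob > chi2 = {p_value:.4f}, "
      f"Pseudo R2 = {pseudo_r2:.4f}")
```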
>>
>>
>> Strategy 1: Two-way contingency tables
>>
>> . tab clstr s315t if east==1,e
>>
>>           |         s315t
>>     clstr |         0          1 |     Total
>> -----------+----------------------+----------
>>         0 |         6         19 |        25
>>         1 |         1         24 |        25
>> -----------+----------------------+----------
>>     Total |         7         43 |        50
>>
>>           Fisher's exact =                 0.098
>>   1-sided Fisher's exact =                 0.049
>>
>> . tab clstr s315t if east==0,e
>>
>>           |         s315t
>>     clstr |         0          1 |     Total
>> -----------+----------------------+----------
>>         0 |        12         33 |        45
>>         1 |         0          8 |         8
>> -----------+----------------------+----------
>>     Total |        12         41 |        53
>>
>>           Fisher's exact =                 0.175
>>   1-sided Fisher's exact =                 0.108
>>
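Steve's remark about the s315t-negative arm can be read straight off the two stratified tables. The short Python sketch below uses hypothetical helper names. Within each east stratum, it computes the share with clstr = 1 by s315t status and the sample odds ratio. In the east = 0 stratum, no s315t-negative subject has clstr = 1, so the sample odds ratio is infinite; that zero cell is the sparse-data problem in the thread's subject line.

```python
import math

# (clstr = 0, clstr = 1) counts by s315t, copied from the two tables above.
STRATA = {
    "east==1": {0: (6, 1), 1: (19, 24)},
    "east==0": {0: (12, 0), 1: (33, 8)},
}


def share_with_event(no_event, event):
    """Proportion with clstr == 1 in one s315t group."""
    return event / (no_event + event)


def odds_ratio(neg, pos):
    """Sample odds ratio of clstr == 1, s315t positive vs negative.

    A zero cell makes the estimate infinite (or undefined if both
    cross-products are zero).
    """
    (neg0, neg1), (pos0, pos1) = neg, pos
    num, den = pos1 * neg0, pos0 * neg1
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


for name, cells in STRATA.items():
    p_neg = share_with_event(*cells[0])
    p_pos = share_with_event(*cells[1])
    or_hat = odds_ratio(cells[0], cells[1])
    print(f"{name}: P(clstr=1 | s315t=0) = {p_neg:.3f}, "
          f"P(clstr=1 | s315t=1) = {p_pos:.3f}, OR = {or_hat:.2f}")
```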
>>
>>
>> Strategy 2: Exact Logistic Regression
>>
>> observation 102: enumerations =       1128
>> observation 103: enumerations =        574
>>
>> Exact logistic regression                        Number of obs =       103
>>                                                 Model score   =  19.78112
>>                                                 Pr >= score   =    0.0000
>>
>> ---------------------------------------------------------------------------
>>        clstr | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
>> -------------+-------------------------------------------------------------
>>        s315t |   10.44218          32      0.0135      1.391627    474.4786
>>   east_asian |   5.414021          25      0.0006      1.933718    16.65417
>>
>>
>>
>>
>> (output omitted)
>> observation 103: enumerations =        574
>>
>> Exact logistic regression                        Number of obs =       103
>>                                                 Model score   =  19.78112
>>                                                 Pr >= score   =    0.0000
>>
>> ---------------------------------------------------------------------------
>>        clstr |      Coef.       Score    Pr>=Score     [95% Conf. Interval]
>> -------------+-------------------------------------------------------------
>>        s315t |   2.345854    6.763266      0.0129      .3304732    6.162216
>>   east_asian |   1.688992    12.98631      0.0004      .6594448    2.812661
>> ---------------------------------------------------------------------------
>>
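The two exact logistic displays are the same fit on two scales. Each odds ratio and confidence limit in the first display is exp() of the corresponding coefficient or limit in the second. (The p-values differ because the first display reports the sufficient-statistic test and the second the score test.) A short Python check on the printed values, with hypothetical variable names:

```python
import math

# (estimate, lower, upper) as printed in the coefficient display ...
COEF = {"s315t": (2.345854, .3304732, 6.162216),
        "east_asian": (1.688992, .6594448, 2.812661)}
# ... and in the odds-ratio display.
ODDS = {"s315t": (10.44218, 1.391627, 474.4786),
        "east_asian": (5.414021, 1.933718, 16.65417)}

for name, coefs in COEF.items():
    implied = [math.exp(v) for v in coefs]
    # Relative gap between exp(coefficient) and the printed odds-ratio value.
    gaps = [abs(e / o - 1) for e, o in zip(implied, ODDS[name])]
    print(name, [round(v, 4) for v in implied], f"max rel. gap {max(gaps):.1e}")
```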
>>
>> Strategy 3: Hierarchical Regression
>>
>> . hireg clstr (s315t) (east)(ageat emb sm)
>>
>> Model 1:
>>   Variables in Model:
>>   Adding            : s315t
>>
>>       Source |       SS       df       MS              Number of obs =     113
>> -------------+------------------------------           F(  1,   111) =    9.18
>>        Model |   1.7840879     1   1.7840879           Prob > F      =  0.0030
>>     Residual |   21.578744   111  .194403099           R-squared     =  0.0764
>> -------------+------------------------------           Adj R-squared =  0.0680
>>        Total |  23.3628319   112  .208596713           Root MSE      =  .44091
>>
>>
>> ------------------------------------------------------------------------------
>>        clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>        s315t |   .3120773   .1030162     3.03   0.003     .1079438    .5162108
>>        _cons |   .0434783   .0919364     0.47   0.637    -.1386999    .2256565
>> ------------------------------------------------------------------------------
>>
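Model 1 of -hireg- is ordinary least squares on a 0/1 outcome, i.e. a linear probability model, consistent with John's observation that -hireg- fits linear regressions. With one binary regressor, OLS reproduces the group proportions. The constant is the share with clstr = 1 among s315t = 0 (1/23), and the slope is the difference in shares (32/90 - 1/23). The Python sketch below uses hypothetical variable names. It rebuilds the 113 observations from the first cross-tabulation and refits the model by the textbook formulas.

```python
# Rebuild the 113 (s315t, clstr) pairs from the first cross-tabulation.
rows = ([(0, 0)] * 22 + [(0, 1)] * 1 +
        [(1, 0)] * 58 + [(1, 1)] * 32)

x = [r[0] for r in rows]
y = [r[1] for r in rows]
n = len(rows)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Simple OLS: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Model SS = slope * Sxy; R-squared = model SS / total SS.
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_model = slope * sxy
print(f"_cons = {intercept:.7f}, s315t = {slope:.7f}, "
      f"R2 = {ss_model / ss_total:.4f}")
```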
>> Model 2:
>>   Variables in Model: s315t
>>   Adding            : east
>>
>>       Source |       SS       df       MS              Number of obs =     103
>> -------------+------------------------------           F(  2,   100) =   12.03
>>        Model |  4.34936038     2  2.17468019           Prob > F      =  0.0000
>>     Residual |  18.0778241   100  .180778241           R-squared     =  0.1939
>> -------------+------------------------------           Adj R-squared =  0.1778
>>        Total |  22.4271845   102  .219874358           Root MSE      =  .42518
>>
>>
>> ------------------------------------------------------------------------------
>>        clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>        s315t |   .2817301   .1086887     2.59   0.011     .0660947    .4973654
>>   east_asian |   .3247109   .0843486     3.85   0.000     .1573656    .4920561
>>        _cons |  -.0669987   .1023736    -0.65   0.514     -.270105    .1361075
>> ------------------------------------------------------------------------------
>> R-Square Diff. Model 2 - Model 1 = 0.118   F(1,100) = 14.190  p = 0.000
>>
>> Model 3:
>>   Variables in Model: s315t  east
>>   Adding            : ageat emb sm
>>
>>       Source |       SS       df       MS              Number of obs =     100
>> -------------+------------------------------           F(  5,    94) =    4.72
>>        Model |  4.36538233     5  .873076466           Prob > F      =  0.0007
>>     Residual |  17.3946177    94  .185049124           R-squared     =  0.2006
>> -------------+------------------------------           Adj R-squared =  0.1581
>>        Total |       21.76    99   .21979798           Root MSE      =  .43017
>>
>>
>> ------------------------------------------------------------------------------
>>        clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>        s315t |   .2335983   .1163422     2.01   0.048     .0025981    .4645984
>>   east_asian |   .2694912   .0945411     2.85   0.005     .0817777    .4572048
>>    ageatrept |  -.0012444   .0024199    -0.51   0.608    -.0060491    .0035603
>>          emb |   .0396897   .0989203     0.40   0.689    -.1567189    .2360984
>>           sm |   .1063985   .1087626     0.98   0.330    -.1095522    .3223492
>>        _cons |  -.0454117   .1512602    -0.30   0.765    -.3457423     .254919
>> ------------------------------------------------------------------------------
>> R-Square Diff. Model 3 - Model 2 = 0.007   F(3,94) =  0.029  p = 0.993
>>
>>
>> Model  R2      F(df)              p         R2 change  F(df) change       p
>>   1:  0.076   9.177(1,111)       0.003
>>   2:  0.194  12.030(2,100)       0.000     0.118     14.190(1,100)      0.000
>>   3:  0.201   4.718(5,94)        0.001     0.007      0.029(3,94)       0.993
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>




© Copyright 1996–2014 StataCorp LP