Statalist



Re: st: Sparse Data Problem


From   David Airey <david.airey@vanderbilt.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Sparse Data Problem
Date   Sat, 7 Mar 2009 10:36:55 -0600


I skimmed that paper. Doesn't the bias it discusses come into play when using the -clogit- command? And doesn't the term "hierarchical regression" in the context of the Greenland paper refer to the use of mixed models for categorical data, as in Stata's -xtmixed- family of commands (-xtmelogit- for a binary outcome)? That is very different from the user-written -hireg- (from SSC) or Stata's -nestreg- command.

On Mar 6, 2009, at 10:48 PM, john metcalfe wrote:

I was referring to Greenland Amer J Epi 2000.
Thanks for the tip.
John

On Fri, Mar 6, 2009 at 8:07 PM, David Airey <david.airey@vanderbilt.edu> wrote:

What do you mean when you say "not fully accounting for the small cell
bias"? I don't understand. I thought exact logistic models were for
situations with small cells. -nestreg- does nested (blockwise) estimation for logit models, though not for exact logit models. It was added to Stata in June of
2008.

-Dave

On Mar 6, 2009, at 9:12 PM, john metcalfe wrote:

Dear Statalist,
I am analyzing a small data set with outcome of interest 'clstr'; the primary goal of the analysis is to determine whether the variables 's315t'
and 'east' have independent associations with the outcome. However,
s315t is nearly deterministic for the outcome clstr (only 1 of the 23
observations with s315t = 0 has clstr = 1), as shown below. I am
concerned that exact logistic regression is not fully accounting for
the small-cell bias. I would like to employ a hierarchical logistic
regression, but it seems that the Stata command -hireg- is only for
linear regressions?
It may be that I simply am unable to make any valid inferences with
this dataset, but I just want to make sure I have explored the
appropriate possible remedies.
Thanks,
John

John Metcalfe, M.D., M.P.H.
University of California, San Francisco


. tab s315 clstr,e

         |         clstr
   s315t |         0          1 |     Total
-----------+----------------------+----------
       0 |        22          1 |        23
       1 |        58         32 |        90
-----------+----------------------+----------
   Total |        80         33 |       113

         Fisher's exact =                 0.002
 1-sided Fisher's exact =                 0.002
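For anyone wondering what the two Fisher numbers mean, here is a minimal Python sketch (standard library only, run outside Stata; the function name fisher_exact_2x2 is illustrative, not a Stata or SciPy API). With both margins fixed, the top-left count is hypergeometric. The two-sided p sums the probabilities of every table with the same margins that is no more likely than the observed one, and the one-sided value is taken here as the smaller tail. It reproduces the 0.002/0.002 above, and the same function gives 0.098/0.049 and 0.175/0.108 for the two stratified tables under Strategy 1, but treat it as a sketch rather than Stata's exact code.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With both margins fixed, the top-left count follows a hypergeometric
    distribution. Returns (two_sided, one_sided):
      two_sided: total probability of all tables no more probable than the observed one
      one_sided: the smaller of the two tail sums around the observed count
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = comb(n, row1)
    # Exact probabilities as Fractions, so equally likely tables compare as exactly equal
    prob = {k: Fraction(comb(col1, k) * comb(n - col1, row1 - k), total)
            for k in range(lo, hi + 1)}
    p_obs = prob[a]
    two_sided = sum(p for p in prob.values() if p <= p_obs)
    lower = sum(p for k, p in prob.items() if k <= a)
    upper = sum(p for k, p in prob.items() if k >= a)
    return float(two_sided), float(min(lower, upper))


# First table above: rows s315t = 0/1, columns clstr = 0/1
two, one = fisher_exact_2x2(22, 1, 58, 32)
print(f"Fisher's exact = {two:.3f}   1-sided Fisher's exact = {one:.3f}")
```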




. logit clstr ageat s315t east emb sm num,or

Iteration 0:   log likelihood = -62.686946
Iteration 1:   log likelihood = -51.860098
Iteration 2:   log likelihood = -50.754342
Iteration 3:   log likelihood = -50.661741
Iteration 4:   log likelihood = -50.660257
Iteration 5:   log likelihood = -50.660256

Logistic regression                               Number of obs   =        100
                                                  LR chi2(6)      =      24.05
                                                  Prob > chi2     =     0.0005
Log likelihood = -50.660256                       Pseudo R2       =     0.1919

------------------------------------------------------------------------------
       clstr | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   ageatrept |   .9908837   .0139884    -0.65   0.517     .9638428    1.018683
       s315t |   9.238959   10.28939     2.00   0.046     1.041462    81.96011
  east_asian |   4.219755   2.215279     2.74   0.006     1.508083    11.80727
         emb |   .9964845   .6599534    -0.01   0.996     .2721043    3.649268
          sm |   2.138175   1.696319     0.96   0.338      .451589    10.12379
  num_resist |   1.064089   .2385192     0.28   0.782     .6857694    1.651116
------------------------------------------------------------------------------
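As a cross-check on how the -logit- header and the s315t row fit together, here is a minimal Python sketch (outside Stata; every input is copied from the output above). It recomputes the LR chi2 and pseudo R2 from the Iteration 0 (constant-only) and final log likelihoods, and rebuilds the z test and 95% CI for s315t. It assumes Stata's usual convention that the SE printed next to an odds ratio is the delta-method value OR x SE(b), while the test and interval are formed on the log-odds scale. The chi-squared tail formula used here is only valid for an even number of degrees of freedom (6 in this model). The very wide interval, roughly 1.04 to 82, is the small-cell problem showing up in the ordinary Wald approach.

```python
import math

# Values copied from the -logit- output above
ll0 = -62.686946        # Iteration 0: constant-only model
ll1 = -50.660256        # final log likelihood, 6 predictors
df = 6

lr = 2 * (ll1 - ll0)                 # LR chi2(6)
pseudo_r2 = 1 - ll1 / ll0            # McFadden pseudo R2


def chi2_sf_even_df(x, k):
    """Upper-tail chi-squared probability; this closed form holds for even k only."""
    m = k // 2
    return math.exp(-x / 2) * sum((x / 2) ** j / math.factorial(j) for j in range(m))


def norm_sf(z):
    """Upper-tail standard normal probability."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# s315t row: odds ratio and its delta-method SE, SE(OR) = OR * SE(b)
or_s315t, se_or = 9.238959, 10.28939
b = math.log(or_s315t)               # coefficient on the log-odds scale
se_b = se_or / or_s315t              # SE of the coefficient
z = b / se_b
p = 2 * norm_sf(abs(z))
z975 = 1.959964                      # 97.5th percentile of N(0, 1)
ci = (math.exp(b - z975 * se_b), math.exp(b + z975 * se_b))

print(f"LR chi2({df}) = {lr:.2f}, Prob > chi2 = {chi2_sf_even_df(lr, df):.4f}")
print(f"Pseudo R2 = {pseudo_r2:.4f}")
print(f"s315t: z = {z:.2f}, P>|z| = {p:.3f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.2f})")
```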



Strategy 1: Two-way contingency tables

. tab clstr s315t if east==1,e

         |         s315t
   clstr |         0          1 |     Total
-----------+----------------------+----------
       0 |         6         19 |        25
       1 |         1         24 |        25
-----------+----------------------+----------
   Total |         7         43 |        50

         Fisher's exact =                 0.098
 1-sided Fisher's exact =                 0.049

. tab clstr s315t if east==0,e

         |         s315t
   clstr |         0          1 |     Total
-----------+----------------------+----------
       0 |        12         33 |        45
       1 |         0          8 |         8
-----------+----------------------+----------
   Total |        12         41 |        53

         Fisher's exact =                 0.175
 1-sided Fisher's exact =                 0.108
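The stratified tables show the sparse-data problem concretely. The minimal Python sketch below (the helper name sample_odds_ratio is hypothetical) computes the cross-product odds ratio of clstr on s315t within each stratum. In the east==1 table it is (24 x 6)/(19 x 1), about 7.6. In the east==0 table no observation with s315t = 0 has clstr = 1, so the denominator is zero and the sample odds ratio is infinite; a logit fitted to that stratum alone would likewise have no finite maximum-likelihood estimate for s315t.

```python
import math


def sample_odds_ratio(cases_exp, noncases_exp, cases_unexp, noncases_unexp):
    """Cross-product odds ratio (a*d)/(b*c); infinite when only the denominator is zero."""
    num = cases_exp * noncases_unexp
    den = noncases_exp * cases_unexp
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


# Counts read off the two -tab clstr s315t- tables above.
# Exposure = s315t (1 = exposed); outcome = clstr (1 = case).
strata = {
    "east==1": dict(cases_exp=24, noncases_exp=19, cases_unexp=1, noncases_unexp=6),
    "east==0": dict(cases_exp=8, noncases_exp=33, cases_unexp=0, noncases_unexp=12),
}
for name, cells in strata.items():
    print(name, sample_odds_ratio(**cells))
```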



Strategy 2: Exact Logistic Regression

observation 102: enumerations =       1128
observation 103: enumerations =        574

Exact logistic regression                        Number of obs =        103
                                                 Model score   =   19.78112
                                                 Pr >= score   =     0.0000
---------------------------------------------------------------------------
       clstr | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
       s315t |   10.44218          32      0.0135      1.391627   474.4786
  east_asian |   5.414021          25      0.0006      1.933718   16.65417
---------------------------------------------------------------------------




(output omitted)
observation 103: enumerations =        574

Exact logistic regression                        Number of obs =        103
                                                 Model score   =   19.78112
                                                 Pr >= score   =     0.0000
---------------------------------------------------------------------------
       clstr |      Coef.       Score    Pr>=Score     [95% Conf. Interval]
-------------+-------------------------------------------------------------
       s315t |   2.345854    6.763266      0.0129      .3304732   6.162216
  east_asian |   1.688992    12.98631      0.0004      .6594448   2.812661
---------------------------------------------------------------------------
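The two -exlogistic- tables above report the same fit on two scales: each odds ratio, and each confidence limit, is exp() of the corresponding value in the coefficient table. The limits are not symmetric around the estimate, largely because they come from the exact conditional distribution rather than a normal approximation. A minimal Python check, with every number copied from the output:

```python
import math

# (coefficient, lower, upper) from the -exlogistic- coefficient table above
coef_rows = {
    "s315t":      (2.345854, 0.3304732, 6.162216),
    "east_asian": (1.688992, 0.6594448, 2.812661),
}
# (odds ratio, lower, upper) from the -exlogistic- odds-ratio table above
or_rows = {
    "s315t":      (10.44218, 1.391627, 474.4786),
    "east_asian": (5.414021, 1.933718, 16.65417),
}

for name, vals in coef_rows.items():
    exp_vals = tuple(math.exp(v) for v in vals)
    # Largest relative gap between exp(coefficient-scale value) and the printed odds-ratio value
    rel_err = max(abs(e / o - 1) for e, o in zip(exp_vals, or_rows[name]))
    print(f"{name}: exp -> {exp_vals}, max relative difference {rel_err:.1e}")
```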


Strategy 3: Hierarchical Regression

. hireg clstr (s315t) (east)(ageat emb sm)

Model 1:
 Variables in Model:
 Adding            : s315t

      Source |       SS       df       MS              Number of obs =     113
-------------+------------------------------           F(  1,   111) =    9.18
       Model |   1.7840879     1   1.7840879           Prob > F      =  0.0030
    Residual |   21.578744   111  .194403099           R-squared     =  0.0764
-------------+------------------------------           Adj R-squared =  0.0680
       Total |  23.3628319   112  .208596713           Root MSE      =  .44091

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .3120773   .1030162     3.03   0.003     .1079438    .5162108
       _cons |   .0434783   .0919364     0.47   0.637    -.1386999    .2256565
------------------------------------------------------------------------------
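Model 1 makes John's point about -hireg- concrete: it is ordinary least squares on the 0/1 outcome (a linear probability model). With a single binary regressor, the constant is the proportion with clstr = 1 among s315t = 0 (1/23), and the slope is the risk difference (32/90 - 1/23). A minimal Python sketch that rebuilds the 113 observations from the first -tab- table confirms this; it assumes Model 1 used exactly those 113 observations, which matches its Number of obs.

```python
# First -tab- table above, as (s315t, clstr) -> number of observations
counts = {(0, 0): 22, (0, 1): 1, (1, 0): 58, (1, 1): 32}
x = [s for (s, c), count in counts.items() for _ in range(count)]
y = [c for (s, c), count in counts.items() for _ in range(count)]

n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sxy / sxx                      # OLS coefficient on s315t
intercept = ybar - slope * xbar        # OLS constant (_cons)

# Observed proportions with clstr = 1 in each s315t group
risk0 = counts[(0, 1)] / (counts[(0, 0)] + counts[(0, 1)])
risk1 = counts[(1, 1)] / (counts[(1, 0)] + counts[(1, 1)])

tss = sum((yi - ybar) ** 2 for yi in y)   # Total SS
mss = slope * sxy                         # Model SS for a one-regressor fit
print(f"N = {n}   _cons = {intercept:.7f}   1/23 = {risk0:.7f}")
print(f"s315t = {slope:.7f}   32/90 - 1/23 = {risk1 - risk0:.7f}")
print(f"Model SS = {mss:.7f}   Total SS = {tss:.7f}   R-squared = {mss / tss:.4f}")
```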

Model 2:
 Variables in Model: s315t
 Adding            : east

      Source |       SS       df       MS              Number of obs =     103
-------------+------------------------------           F(  2,   100) =   12.03
       Model |  4.34936038     2  2.17468019           Prob > F      =  0.0000
    Residual |  18.0778241   100  .180778241           R-squared     =  0.1939
-------------+------------------------------           Adj R-squared =  0.1778
       Total |  22.4271845   102  .219874358           Root MSE      =  .42518

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .2817301   .1086887     2.59   0.011     .0660947    .4973654
  east_asian |   .3247109   .0843486     3.85   0.000     .1573656    .4920561
       _cons |  -.0669987   .1023736    -0.65   0.514     -.270105    .1361075
------------------------------------------------------------------------------
R-Square Diff. Model 2 - Model 1 = 0.118 F(1,100) = 14.190 p = 0.000

Model 3:
 Variables in Model: s315t  east
 Adding            : ageat emb sm

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  5,    94) =    4.72
       Model |  4.36538233     5  .873076466           Prob > F      =  0.0007
    Residual |  17.3946177    94  .185049124           R-squared     =  0.2006
-------------+------------------------------           Adj R-squared =  0.1581
       Total |       21.76    99   .21979798           Root MSE      =  .43017

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .2335983   .1163422     2.01   0.048     .0025981    .4645984
  east_asian |   .2694912   .0945411     2.85   0.005     .0817777    .4572048
   ageatrept |  -.0012444   .0024199    -0.51   0.608    -.0060491    .0035603
         emb |   .0396897   .0989203     0.40   0.689    -.1567189    .2360984
          sm |   .1063985   .1087626     0.98   0.330    -.1095522    .3223492
       _cons |  -.0454117   .1512602    -0.30   0.765    -.3457423     .254919
------------------------------------------------------------------------------
R-Square Diff. Model 3 - Model 2 = 0.007 F(3,94) = 0.029 p = 0.993


Model  R2      F(df)              p         R2 change  F(df) change      p
 1:  0.076   9.177(1,111)       0.003
 2:  0.194  12.030(2,100)       0.000     0.118     14.190(1,100)     0.000
 3:  0.201   4.718(5,94)        0.001     0.007      0.029(3,94)      0.993
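The F-change values in this summary can be reproduced from the printed sums of squares as (increase in Model SS / added df) / Residual MS of the larger model; this appears to be how -hireg- forms them, though that is an inference from the numbers, not from its documentation. With a common estimation sample this matches the R-squared form of the nested F test, but the headers show N = 113, 103 and 100, so here the two forms disagree (for Model 2 vs Model 1 the R-squared form gives about 14.6 rather than 14.190). A minimal Python sketch, with inputs copied from the three models:

```python
# From the three -hireg- models above: (Model SS, model df, Residual MS, Number of obs)
models = {
    1: (1.7840879, 1, 0.194403099, 113),
    2: (4.34936038, 2, 0.180778241, 103),
    3: (4.36538233, 5, 0.185049124, 100),
}


def f_change(reduced, full):
    """(increase in Model SS / added df) / Residual MS of the larger model."""
    ss_r, df_r, _, _ = models[reduced]
    ss_f, df_f, ms_resid_f, _ = models[full]
    return ((ss_f - ss_r) / (df_f - df_r)) / ms_resid_f


for reduced, full in [(1, 2), (2, 3)]:
    print(f"Model {full} vs {reduced}: F = {f_change(reduced, full):.3f}, "
          f"N = {models[reduced][3]} vs {models[full][3]}")
```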
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP