
From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Sparse Data Problem
Date: Sun, 8 Mar 2009 10:54:48 -0400

As Richard Williams said in a related post today:

Steps:

1. Create

    z1 = z*x1
    z2 = z*x2

2. Run your model as:

    logistic y z z1 z2

Note what this will do (on the log-odds scale):

    When z = 0: log odds of y = constant
    When z = 1: log odds of y = constant + _b[z] + _b[z1]*x1 + _b[z2]*x2

Interpretation:

constant = log odds of the outcome when z = 0. All hypotheses about x1 and x2 apply only when z = 1.
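The two steps above might be sketched in Stata roughly as follows. This is an illustration only: it assumes a dataset in memory with a 0/1 outcome y, a 0/1 indicator z, and predictors x1 and x2, with the variable names taken from the steps rather than from any real dataset. -logit- is used in place of -logistic- only so that the stored coefficients are shown on the log-odds scale.

```stata
* Sketch under assumptions: y, z, x1, x2 are placeholder names from the steps
* Step 1: interaction terms, which are zero whenever z == 0
generate z1 = z*x1
generate z2 = z*x2

* Step 2: fit the model; x1 and x2 enter only through z1 and z2
logit y z z1 z2

* Log odds of the outcome when z == 0
display _b[_cons]

* Odds ratio for z = 1 vs. z = 0, evaluated at x1 = x2 = 0
lincom z, or

* Joint Wald test that x1 and x2 have no effect when z == 1
test z1 z2
```

Because z1 and z2 are zero whenever z = 0, the fitted log odds in that group reduce to the constant alone, which is what restricts every hypothesis about x1 and x2 to the z = 1 group.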

_b[z] = log odds ratio for the event, z = 1 vs. z = 0 (evaluated at x1 = x2 = 0).

-Steve

On Mar 7, 2009, at 6:27 PM, Steven Samuels wrote:

Here's the first table you presented:

               |         clstr
         s315t |         0          1 |     Total
    -----------+----------------------+----------
             0 |        22          1 |        23
             1 |        58         32 |        90
    -----------+----------------------+----------
         Total |        80         33 |       113

You don't need (and won't be able to fit) a logistic model for the first row, but one might help for the second. Think of a classification and regression tree (CART) approach, where s315t = 0 defines a terminal node.

By the way, missing values in the predictors are leading to differing n's in your results: 113, 103, 100.

-Steve

On Mar 7, 2009, at 4:50 PM, john metcalfe wrote:

Thanks to Dave and Steve.

Dave, I am not sure how to apply -xtmelogit- to this data set, or if this would be a correct thing to do. I haven't worked with this before but will look into it.

Steve, thanks for your helpful comments. I am not quite sure what is meant by the two part prediction equation. I think you mean getting predicted probabilities from a logit model with s315t==1, but am not sure about 's315t negative: predict clstr = 1'? Can you make this more explicit?

Thanks much,
John

On Sat, Mar 7, 2009 at 11:43 AM, Steven Samuels <sjhsamuels@earthlink.net> wrote:

John, your model is probably incorrect. It assumes that, when s315t is 0, the other factors make a difference implied by the model form. They don't. Correspondingly, the stratified two-way tables indicate a possible interaction between s315t and 'east'.

I suggest a two part prediction equation.

    s315t negative: predict clstr = 1
    s315t positive: predict with other factors in a logistic model.

I'm not very familiar with exact logistic regression, but if the usual rules of thumb apply, the 32-33 events (clstr = 1) entitle you to about three predictors altogether.
-Steve

On Mar 6, 2009, at 10:12 PM, john metcalfe wrote:

Dear Statalist,

I am analyzing a small data set with outcome of interest 'clstr', with the primary goal of the analysis to determine if the variables 's315t' and 'east' have independent associations with the outcome. However, s315t is highly deterministic for the outcome clstr, as below. I am concerned that exact logistic regression is not fully accounting for the small cell bias. I would like to employ a hierarchical logistic regression, but it seems that the stata command 'hireg' is only for linear regressions?? It may be that I simply am unable to make any valid inferences with this dataset, but I just want to make sure I have explored the appropriate possible remedies.

Thanks,
John

John Metcalfe, M.D., M.P.H.
University of California, San Francisco

    . tab s315 clstr,e

               |         clstr
         s315t |         0          1 |     Total
    -----------+----------------------+----------
             0 |        22          1 |        23
             1 |        58         32 |        90
    -----------+----------------------+----------
         Total |        80         33 |       113

               Fisher's exact =                 0.002
       1-sided Fisher's exact =                 0.002

    . logit clstr ageat s315t east emb sm num,or

    Iteration 0:   log likelihood = -62.686946
    Iteration 1:   log likelihood = -51.860098
    Iteration 2:   log likelihood = -50.754342
    Iteration 3:   log likelihood = -50.661741
    Iteration 4:   log likelihood = -50.660257
    Iteration 5:   log likelihood = -50.660256

    Logistic regression                               Number of obs   =        100
                                                      LR chi2(6)      =      24.05
                                                      Prob > chi2     =     0.0005
    Log likelihood = -50.660256                       Pseudo R2       =     0.1919

    ------------------------------------------------------------------------------
           clstr | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       ageatrept |   .9908837   .0139884    -0.65   0.517     .9638428    1.018683
           s315t |   9.238959   10.28939     2.00   0.046     1.041462    81.96011
      east_asian |   4.219755   2.215279     2.74   0.006     1.508083    11.80727
             emb |   .9964845   .6599534    -0.01   0.996     .2721043    3.649268
              sm |   2.138175   1.696319     0.96   0.338      .451589    10.12379
      num_resist |   1.064089   .2385192     0.28   0.782     .6857694    1.651116
    ------------------------------------------------------------------------------

Strategy 1: Two-way contingency tables

    . tab clstr s315t if east==1,e

               |         s315t
         clstr |         0          1 |     Total
    -----------+----------------------+----------
             0 |         6         19 |        25
             1 |         1         24 |        25
    -----------+----------------------+----------
         Total |         7         43 |        50

               Fisher's exact =                 0.098
       1-sided Fisher's exact =                 0.049

    . tab clstr s315t if east==0,e

               |         s315t
         clstr |         0          1 |     Total
    -----------+----------------------+----------
             0 |        12         33 |        45
             1 |         0          8 |         8
    -----------+----------------------+----------
         Total |        12         41 |        53

               Fisher's exact =                 0.175
       1-sided Fisher's exact =                 0.108

Strategy 2: Exact Logistic Regression

    observation 102:   enumerations =       1128
    observation 103:   enumerations =        574

    Exact logistic regression                        Number of obs =       103
                                                     Model score   =  19.78112
                                                     Pr >= score   =    0.0000
    ---------------------------------------------------------------------------
           clstr | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
    -------------+-------------------------------------------------------------
           s315t |   10.44218          32      0.0135      1.391627    474.4786
      east_asian |   5.414021          25      0.0006      1.933718    16.65417

    (output omitted)

    observation 103:   enumerations =        574

    Exact logistic regression                        Number of obs =       103
                                                     Model score   =  19.78112
                                                     Pr >= score   =    0.0000
    ---------------------------------------------------------------------------
           clstr |      Coef.       Score    Pr>=Score     [95% Conf. Interval]
    -------------+-------------------------------------------------------------
           s315t |   2.345854    6.763266      0.0129      .3304732    6.162216
      east_asian |   1.688992    12.98631      0.0004      .6594448    2.812661
    ---------------------------------------------------------------------------

Strategy 3: Hierarchical Regression

    . hireg clstr (s315t) (east)(ageat emb sm)

    Model 1:
    Variables in Model:
    Adding            : s315t

          Source |       SS       df       MS              Number of obs =     113
    -------------+------------------------------           F(  1,   111) =    9.18
           Model |   1.7840879     1   1.7840879           Prob > F      =  0.0030
        Residual |   21.578744   111  .194403099           R-squared     =  0.0764
    -------------+------------------------------           Adj R-squared =  0.0680
           Total |  23.3628319   112  .208596713           Root MSE      =  .44091

    ------------------------------------------------------------------------------
           clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           s315t |   .3120773   .1030162     3.03   0.003     .1079438    .5162108
           _cons |   .0434783   .0919364     0.47   0.637    -.1386999    .2256565
    ------------------------------------------------------------------------------

    Model 2:
    Variables in Model: s315t
    Adding            : east

          Source |       SS       df       MS              Number of obs =     103
    -------------+------------------------------           F(  2,   100) =   12.03
           Model |  4.34936038     2  2.17468019           Prob > F      =  0.0000
        Residual |  18.0778241   100  .180778241           R-squared     =  0.1939
    -------------+------------------------------           Adj R-squared =  0.1778
           Total |  22.4271845   102  .219874358           Root MSE      =  .42518

    ------------------------------------------------------------------------------
           clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           s315t |   .2817301   .1086887     2.59   0.011     .0660947    .4973654
      east_asian |   .3247109   .0843486     3.85   0.000     .1573656    .4920561
           _cons |  -.0669987   .1023736    -0.65   0.514     -.270105    .1361075
    ------------------------------------------------------------------------------

    R-Square Diff. Model 2 - Model 1 = 0.118   F(1,100) = 14.190   p = 0.000

    Model 3:
    Variables in Model: s315t east
    Adding            : ageat emb sm

          Source |       SS       df       MS              Number of obs =     100
    -------------+------------------------------           F(  5,    94) =    4.72
           Model |  4.36538233     5  .873076466           Prob > F      =  0.0007
        Residual |  17.3946177    94  .185049124           R-squared     =  0.2006
    -------------+------------------------------           Adj R-squared =  0.1581
           Total |       21.76    99   .21979798           Root MSE      =  .43017

    ------------------------------------------------------------------------------
           clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           s315t |   .2335983   .1163422     2.01   0.048     .0025981    .4645984
      east_asian |   .2694912   .0945411     2.85   0.005     .0817777    .4572048
       ageatrept |  -.0012444   .0024199    -0.51   0.608    -.0060491    .0035603
             emb |   .0396897   .0989203     0.40   0.689    -.1567189    .2360984
              sm |   .1063985   .1087626     0.98   0.330    -.1095522    .3223492
           _cons |  -.0454117   .1512602    -0.30   0.765    -.3457423     .254919
    ------------------------------------------------------------------------------

    R-Square Diff. Model 3 - Model 2 = 0.007   F(3,94) = 0.029   p = 0.993

    Model     R2    F(df)             p       R2 change   F(df) change      p
       1:   0.076    9.177(1,111)   0.003
       2:   0.194   12.030(2,100)   0.000       0.118     14.190(1,100)   0.000
       3:   0.201    4.718(5,94)    0.001       0.007      0.029(3,94)    0.993

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
