
RE: st: Dummy Variables vs. Subgroup Models in Logistic Regression

From	SamL <[email protected]>
To	[email protected]
Subject	RE: st: Dummy Variables vs. Subgroup Models in Logistic Regression
Date	Fri, 22 Oct 2004 08:57:59 -0700 (PDT)

Thanks a lot, Glenn, for muddying the waters.  No, really.  My
parenthetical example of the assumption of equal error structures was
meant to point to additional complexity.  I am glad you went further in
your note.  It also seems you've gone further with programming.  Is your
code available in some way other than the citation you provide (i.e., is
it online via Stata)?  And what is the full cite to your 2004 paper?

Thanks a bunch.
Sam

On Fri, 22 Oct 2004, Hoetker, Glenn wrote:

> At 01:45 PM 10/22/2004 +0000, [email protected] wrote:
>
> >Dear Stata Users,
> >
> >      I'm creating a logistic regression model with many dichotomous
> > variables along with one term that has 8 categories coded 1,2,..8.  I
> > can create 7 dummy variables and have a very large model.  Would it be
> > legitimate if my sample sizes are large enough to create 8 separate
> > models with each model representing one subgroup?   Can anyone comment
> > on the pros and cons of using dummy variables versus creating separate
> > "subgroup" models based on the remaining independent variables?
> > Thanks!
>
> Comparing logit/probit coefficients across groups is actually
> considerably more difficult than doing so in OLS.  This reflects the
> fact that the betas are not identified in a logit model without imposing
> a restriction by setting the variance of the error term to pi^2/3.  As a
> result, the estimated coefficients are the underlying "true" effect
> scaled by the amount of unobserved heterogeneity (a.k.a. residual
> variation).  If the unobserved heterogeneity varies across groups, as it
> often will, then the estimated betas will vary too, even if the "true"
> effect is the same.  Allison (1999) discusses this and proposes a test
> for detecting differences in unobserved heterogeneity and differences in
> underlying coefficients.  Other discussions of the scale issue include
> Maddala (1983:23), Long (1997:47), and Train (2004).
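
The scaling argument in the paragraph above can be written out as a short derivation. This is only a sketch in the usual latent-variable notation; the symbols (alpha_g, beta_g, sigma_g for group g, and Lambda for the logistic CDF) are assumptions introduced here and are not taken from the papers cited.

```latex
% Latent-variable form for an observation i in group g (notation assumed here)
y_i^{*} = \alpha_g + \beta_g x_i + \sigma_g \varepsilon_i ,
\qquad \varepsilon_i \sim \text{standard logistic}, \quad
\operatorname{Var}(\varepsilon_i) = \pi^2/3 ,
\qquad y_i = \mathbf{1}[\, y_i^{*} > 0 \,]

% The observed outcome depends only on the ratios to sigma_g
\Pr(y_i = 1 \mid x_i)
  = \Pr\!\left( \varepsilon_i > -\frac{\alpha_g + \beta_g x_i}{\sigma_g} \right)
  = \Lambda\!\left( \frac{\alpha_g}{\sigma_g} + \frac{\beta_g}{\sigma_g}\, x_i \right)

% So a logit fitted to group g estimates beta_g / sigma_g, not beta_g.
% With equal true effects (beta_1 = beta_2) but unequal residual variation:
\frac{\beta_1 / \sigma_1}{\beta_2 / \sigma_2} = \frac{\sigma_2}{\sigma_1} \neq 1
```

Under these assumptions, ratios of coefficients within one group do not depend on sigma_g, since the scale factor cancels.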
>
> Hoetker (2004) uses Monte Carlo simulations to show that (a) the problem
> Allison identified isn't just theoretical--it leads to misleading
> inferences in common situations and (b) Allison's tests are a
> significant improvement over current practice, but are not a panacea. It
> also offers some alternative analytical approaches, including code in
> Stata (of course) to implement them. One finding in particular is that
> the use of interaction terms to detect inter-group differences in logit
> equations is likely to yield misleading results if unobserved
> heterogeneity differs across groups.  In some circumstances, it's
> actually more likely to find significant results in the OPPOSITE
> direction than in the right direction.
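
A small simulation can make the point above concrete. The sketch below is not the Monte Carlo design from Hoetker (2004); it is a minimal illustration under assumed values: two groups share the same true slope (1.0) but have different residual scales (1 and 2), and a separate logit is fitted to each group with a hand-written Newton-Raphson routine. The function names, sample size, seed, and parameter values are all hypothetical. Because fitting each group separately is equivalent to a pooled model with a group dummy and a group-by-x interaction, the difference `b2 - b1` is what the interaction term would estimate.

```python
import math
import random


def simulate_group(n, beta, sigma, rng):
    """Draw (x, y) from y* = beta*x + sigma*e, e ~ standard logistic; y = 1 if y* > 0."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Keep u strictly inside (0, 1) so the log below is always defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        e = math.log(u / (1.0 - u))  # inverse-CDF draw from the standard logistic
        xs.append(x)
        ys.append(1 if beta * x + sigma * e > 0 else 0)
    return xs, ys


def fit_logit(xs, ys, iters=15):
    """Newton-Raphson for P(y=1) = 1 / (1 + exp(-(a + b*x))); returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


rng = random.Random(2004)          # fixed seed, chosen arbitrarily
TRUE_BETA = 1.0                    # same true effect in both groups (assumed)

x1, y1 = simulate_group(20000, TRUE_BETA, sigma=1.0, rng=rng)
x2, y2 = simulate_group(20000, TRUE_BETA, sigma=2.0, rng=rng)

a1, b1 = fit_logit(x1, y1)
a2, b2 = fit_logit(x2, y2)

print("group 1 slope:", round(b1, 3))
print("group 2 slope:", round(b2, 3))
print("apparent 'interaction' (b2 - b1):", round(b2 - b1, 3))
```

With these settings the group 2 slope should come out at roughly half the group 1 slope even though the true effect is identical, so a naive comparison would report a group difference that reflects only the difference in residual variation.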
>
> For cross-group comparisons in general, Liao (2002) is a helpful
> reference.
>
> Sorry to actually muddy the waters rather than providing a simple
> solution.  Best wishes.
>
> Glenn Hoetker
> Assistant Professor of Strategy
> College of Business
> University of Illinois at Urbana-Champaign
> 217-265-4081
> [email protected]
>
>
> Allison, P.D. 1999. Comparing logit and probit coefficients across
> groups. SMR/Sociological Methods & Research 28(2): 186-208.
>
> Hoetker, G. 2004. Confounded coefficients: Extending recent
> advances in the accurate comparison of logit and probit coefficients
> across groups. Working paper.
> (http://www.business.uiuc.edu/ghoetker/wp.htm)
>
> Liao, T.F. 2002. Statistical group comparison. Wiley Series in
> Probability and Statistics. New York: Wiley-Interscience.
>
> Long, J.S. 1997. Regression models for categorical and limited dependent
> variables. Advanced Quantitative Techniques in the Social Sciences.
> Thousand Oaks, CA: Sage Publications.
>
> Maddala, G.S. 1983. Limited-dependent and qualitative variables in
> econometrics. New York: Cambridge University Press.
>
> Train, K.E. 2004. Discrete choice methods with simulation. Cambridge:
> Cambridge University Press.
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Richard Williams
> Sent: Friday, October 22, 2004 9:42 AM
> To: [email protected]; [email protected]
> Subject: Re: st: Dummy Variables vs. Subgroup Models in Logistic Regression
>
>
>
> If you estimate separate models, you are allowing ALL parameters to
> differ across groups, e.g. the effect of education could be different
> in each group.  If you just add dummies, you are allowing the intercept
> to differ in each group, but the effects of the other variables stay
> the same.
>
> If you estimate separate models for each group, your models will
> certainly be much less parsimonious, i.e. you'll have a lot more
> parameters floating around. But the real question is, what is most
> appropriate given your theory and the empirical reality?  If the
> effects of everything really are different across every group, then you
> should estimate separate models.  But, if the effects do not differ
> across groups, then you are producing unnecessarily complicated models,
> and you are also reducing your statistical power, e.g. by not pooling
> groups when you should be pooling them you'll be more likely to
> conclude that effects do not differ from zero when they really do.
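
The parsimony point above can be illustrated by counting parameters. The sketch below assumes a hypothetical `k = 10` other dichotomous predictors (the original question only says "many") and uses category 1 as the omitted base level; the helper `dummy_code` is made up for illustration.

```python
def dummy_code(category, levels, base):
    """Return 0/1 indicators for `category`, omitting the `base` level."""
    return [1 if category == lev else 0 for lev in levels if lev != base]


levels = list(range(1, 9))   # the 8 categories coded 1..8
k = 10                       # hypothetical count of other dichotomous predictors

# One observation in category 3 becomes 7 indicators (for levels 2..8).
row = dummy_code(3, levels, base=1)

# Pooled model with dummies: one intercept, k shared slopes, 7 intercept shifts.
pooled_params = 1 + k + (len(levels) - 1)

# Separate models: every group gets its own intercept and its own k slopes.
separate_params = len(levels) * (1 + k)

print("indicators for category 3:", row)
print("pooled model parameters:", pooled_params)
print("separate-model parameters:", separate_params)
```

The pooled count grows by one parameter per extra category, while the separate-model count grows by a full set of `1 + k` parameters, which is the loss of parsimony described above.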
>
> These sorts of issues are discussed in
>
> http://www.nd.edu/~rwilliam/stats2/l51.pdf
>
> http://www.nd.edu/~rwilliam/stats2/l92.pdf
>
>
> -------------------------------------------
> Richard Williams, Notre Dame Dept of Sociology
> OFFICE: (574)631-6668, (574)631-6463
> FAX:    (574)288-4373
> HOME:   (574)289-5227
> EMAIL:  [email protected]
> WWW (personal):    http://www.nd.edu/~rwilliam
> WWW (department):    http://www.nd.edu/~soc
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
