Re: st: Differences in regression slopes

From	Maarten buis <[email protected]>
To	[email protected]
Subject	Re: st: Differences in regression slopes
Date	Wed, 20 Feb 2008 18:11:42 +0000 (GMT)

--- "E. Paul Wileyto" <[email protected]> wrote:
> Responses so far have sent you this way and that.  Just look up
> -test- in STATA help.
> 
> To get to the point of using -test- for your purpose, you would need
> to specify a model that has group-specific slopes, or combine two 
> regressions, one for each group, using -suest-.

This is still the conventional approach. The responses have been so
mixed because there are real problems with it, as discussed in the
handout sent earlier and in this working paper:
http://www.nd.edu/~rwilliam/oglm/RW_Hetero_Choice.pdf. No consensus has
yet evolved on the appropriate solution. Rich's -oglm- seems promising,
but is somewhat sensitive to model misspecification. However, I expect
that any model that tries to deal with this problem will be at least as
sensitive, and -oglm- has the advantage of being implemented in Stata.
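
For concreteness, the conventional approach from the quoted message can
be sketched as follows. This is only an illustration on simulated data:
the variable names (y, x, group, gx), the stored estimate names (g0,
g1), and the coefficient values are assumptions made for this sketch,
not part of the original question. The -suest- equation names (g0_y,
g1_y) are what one would expect after -logit-; check the header of the
-suest- output if they differ.

*------------- begin example ----------------
clear
set seed 12345
set obs 1000
gen byte group = uniform() > .5
gen x = invnorm(uniform())
* assumed data: the slope of x is 1 in group 0 and 1.5 in group 1
gen byte y = invlogit(-.5 + x + .5*group*x) > uniform()

* option 1: one model with a group-specific slope, then -test-
gen gx = group*x
logit y x group gx
test gx

* option 2: one model per group, combined with -suest-
logit y x if group == 0
estimates store g0
logit y x if group == 1
estimates store g1
suest g0 g1
test [g0_y]x = [g1_y]x
*-------------- end example -----------------

Both tests compare the slope of x across the two groups, and both are
subject to the problems raised in the handout and the working paper.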

Some time ago I sent Richard Williams some comments on the handouts for
his talk on -oglm- at the 2007 West Coast Stata Users' Group Meeting
(I wish I could bring up the discipline to write such handouts...):
http://ideas.repec.org/p/boc/wsug07/3.html . I copy the relevant
section below (the heterogeneous choice model is -oglm-):

The heterogeneous choice model seems to me a very fragile model: you
estimate a model for both the effect of the observed variables and the
errors, and you use your model for the errors to correct the effects of
the observed variables. Any fault in your model will mean the errors
are off, leading to faults in your model for those errors, which in
turn will feed back into the estimates of all other parameters.

The simulation below shows this: if the model is correct you will
reproduce the correct estimates. However, if you misspecify one of the
effects (here, by leaving out the squared term of x1), all estimates
are off, and are actually worse than those from a normal logit.
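
In formula form, the data-generating process in the simulation below,
as read off from its -gen- statements, is the following (the symbol
\Lambda for the inverse logit function is notation introduced here):

\[
\Pr(y = 1 \mid x_1, x_2)
  = \Lambda\!\left( \frac{-1 + x_1 + x_1^2 + x_2}{\sigma} \right),
\qquad \ln \sigma = x_1
\]

So the true coefficients of x1, x2 and x1sq in the choice equation are
all 1, and the true coefficient of x1 in the lnsigma equation is also 1.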

Also, less spectacular but more practical since it involves real data
and real analysis: a lot of the oomph in the analysis of Allison's
biochemist data seems to be due to a misspecification of the effect of
the number of articles (an economist wouldn't be surprised, and would
see decreasing marginal returns). See the example below the simulation.

Do not mistake these comments to mean that I dislike your work; I like
it very much.

Best,
Maarten

*------------- begin simulation ----------------
set more off
set seed 1234

capture program drop sim
program define sim, rclass
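	* note: -oglm- is user-written and must be installed first
	drop _all
	set obs 500
	* two independent standard normal covariates
	gen x1 = invnorm(uniform())
	gen x2 = invnorm(uniform())
	gen x1sq = x1^2
	* residual scale depends on x1: ln(sigma) = x1
	gen sigma = exp(x1)

	* binary outcome from a logit with heteroskedastic errors
	gen y = invlogit((-1 + x1 + x1sq + x2)/sigma) > uniform()

	* correctly specified heterogeneous choice model
	oglm y x1 x2 x1sq, scale(x1)
	return scalar x1 = _b[x1]
	return scalar x2 = _b[x2]
	return scalar sx1 = [lnsigma]_b[x1]

	* misspecified heterogeneous choice model: x1sq left out
	oglm y x1 x2, scale(x1)
	return scalar fx1 = _b[x1]
	return scalar fx2 = _b[x2]
	return scalar fsx1 = [lnsigma]_b[x1]

	* normal logit with the same misspecification
	logit y x1 x2
	return scalar lx1 = _b[x1]
	return scalar lx2 = _b[x2]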
end

simulate x1=r(x1) x2=r(x2) sx1=r(sx1) /*
      */ fx1=r(fx1) fx2=r(fx2) fsx1=r(fsx1) /*
      */ lx1=r(lx1) lx2=r(lx2), reps(1000): sim
hist x1, name(x1, replace)
hist x2, name(x2, replace)
hist sx1, name(sx1, replace)

hist fx1, name(fx1, replace)
hist fx2, name(fx2, replace)
hist fsx1, name(fsx1, replace)

hist lx1, name(lx1, replace)
hist lx2, name(lx2, replace)
*---------------- end simulation -------------------
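
As a numeric companion to the histograms, the means and standard
deviations of the simulated estimates can be listed in one table. This
is a small optional addition, not part of the original comments; it
assumes it is run directly after -simulate-, while the simulated
estimates are still the data in memory. In the data-generating process
the coefficients of x1 and x2 and the lnsigma coefficient of x1 are
all 1.

*------------- begin summary ----------------
* x1 x2 sx1    : correctly specified -oglm-
* fx1 fx2 fsx1 : -oglm- without x1sq
* lx1 lx2      : -logit- without x1sq
summarize x1 x2 sx1 fx1 fx2 fsx1 lx1 lx2
*-------------- end summary -----------------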

*------------- begin biochemist --------------------
use "http://www.indiana.edu/~jslsoc/stata/spex_data/tenure01.dta", clear
keep if pdasample

oglm tenure female year yearsq select       ///
            articles prestige ,             ///
            het(female) store(lin)

mkspline art=articles, cubic displ

oglm tenure female year yearsq select       ///
            art1-art4 prestige ,            ///
            het(female) store(art)

lrtest lin art, stats

gen lodds = _b[art1]*art1 + _b[art2]*art2 + ///
            _b[art3]*art3 + _b[art4]*art4

twoway line lodds articles, sort
*------------------ end biochemist -------------------

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------
