Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Knot optimized logistic regression


From   Dan MacNulty <[email protected]>
To   [email protected]
Subject   Re: st: Knot optimized logistic regression
Date   Fri, 29 Jan 2010 08:55:32 -0800

Maarten,
Thanks very much for sharing your code. There is one added twist: my data are clustered within subject, so I'm estimating a random-intercept logistic spline regression. I assume the addition of a random effect will make knot estimation even more difficult, correct?

Dan

Maarten buis wrote:
--- On Fri, 29/1/10, Roger Harbord wrote:
Surely that depends on the size of the dataset too? If
Dan's outcome variable has thousands of successes and
thousands of failures then i'd have thought there might
be enough information to estimate the position of a
single knot in a linear spline, if not in a cubic
spline. Agree there's no built-in command to fit this (nor
user-written one AFAIK) so it would take a bit of
programming in -ml- (though you could call -mkspline- and
-logit- to do most of the work).

Some models are identified in theory but demand so much of the data that in practice they will virtually always fail to converge. I would classify the logistic with an unknown knot location in this category.
Anyhow, the programming is not such a big deal. Below is the
the likelihood if you want to give it a try:

program define splogit_lf
	version 8.2
	args lnf xb b1 b2 k
	tempvar eta
	qui gen double `eta' = `xb' + `b1'*$sp + `b2'*max($sp-`k', 0)
	qui replace `lnf' = ln(invlogit(`eta')) if $ML_y1 == 1
	qui replace `lnf' = ln(invlogit(-`eta')) if $ML_y1 == 0
end

Here `xb' captures the control variables, `b1' the effect of the first spline term and `b2' the effect of the second spline term, `k' is the location of the knot. The variable that you want to turn into a spline has to be stored before the program in the global sp.

Alternatively, you could do a grid search, like in the example
below. You can see why this is such a hard problem, the likelihood function appears to be bi-model and there seems
to be a considerable platteau near the global maximum.

*-------------- begin example ----------------------
sysuse nlsw88, clear

sum wage, detail
local begin = r(p5)
local end = r(p95)
local step = (`end' - `begin')/101
tempvar ll k
gen double `ll' = .
gen double `k' = .
local j = 1
forvalues i = `begin'(`step')`end' {
	capture drop sp*
	mkspline sp1 `i' sp2 = wage
	logit union south sp*
	replace `ll' = e(ll) in `j'
	replace `k' = `i' in `j++'
}

twoway line `ll' `k'
*-------------------- end example ------------------

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index