Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: reverse prediction - confidence interval for x at given y in nonlinear model


From   "Joseph Coveney" <jcoveney@bigplanet.com>
To   "Statalist" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: reverse prediction - confidence interval for x at given y in nonlinear model
Date   Mon, 29 Oct 2007 01:27:43 -0700

Maarten Buis wrote:

--- Joseph Coveney <jcoveney@bigplanet.com> wrote:
> If I'm not mistaken, the four-parameter logistic model Rosy used is
> for the *logarithm* of dose and *logarithm* of ED50, and not the dose
> and ED50, per se (cf. Maarten's y-axis values).

I am studying long term trends in inequality of educational opportunity
between children of different socioeconomic background, so I am not
very up to date on common parameterizations in biological and medical
studies...

I hope that you didn't misconstrue my reference to your graph as
disparagement, because nothing could be further from the truth:  I was
referring Rosy to the graph because its y-axis scale values nicely
illustrate what might not be obvious to a first-time user of -nl log4:-,
namely, that the built-in model expects predictor values on the order of
ln(100) and not on the order of 100.  Rosy's first post described things in
terms of "dose" and "ED50", and I just wanted to rule out this possibility
at the outset as a cause of Rosy's problem.

> Perhaps the parameterization is numerically stabler, too, in some
> sense.

When I estimated this model on my generated data I found that it wasn't
very stable. However it is not very surprising as it uses 4 parameters
to estimate a single curve and 2 of these parameters refer to
asymptotes, i.e. parts of the curve at the extreme left and the extreme
right where there are few observations.

I agree that four-parameter nonlinear models like these aren't particularly
easy to work with, and my comment wasn't intending claiming that they are.
Rather, I was speculating that the preference given the four-parameter
logistic parameterization is because it is stabler in some sense (more
tractably fit) than the analogous four-parameter sigmoid Emax
parameterization.  A limited simulation (below) seems to point in this
direction in that (i) with robotically chosen initial values, the sigmoid
Emax parameterization settles on silly estimates for two of thirty datasets
(Datasets 28 and 30) and the logistic in only one (Dataset 28), (ii) and
when the fit is difficult (as in Datasets 5, 11, 14, 28 and 30), the sigmoid
Emax parameterization requires twofold to fivefold the number of iterations
that the logistic does.  (It is interesting that, otherwise, the sigmoid
Emax parameterization nearly always requires fewer iterations--and never
more.)

As an aside, it's not uncommon for a designed dose-response study to have
deliberately numerous observations to the extreme left (often, zero dose,
itself), and at least up onto the Emax plateau.  The point of your graph is
well taken, however:  ED10s and ED90s are going to be estimated with less
confidence regardless of whether they're in a range where extrapolation is
necessary.

---
[snip]
Joseph asks:

> does the -nl log4:- four-parameter logistic model give rise to biased
> estimates of ED50 (ED10, ED90, etc.) and confidence intervals in the
> original measurement scale with nonasymptotic sample sizes?  That is,
> should a pharmacologist ever use -nl log4:- in lieu of the model
> shown just above?

The two models are equivalent. To see this notice first that they have
[formulae redacted for brevity]

I have no disagreement with your arithmetic, but there is an important
difference in the parameterizations, namely:

Step 2: Lets call ln(Dose) x, Hill b2, and ln(ED50) b3
I believe that the way numerical methods employed in nonlinear regression
software handle this difference has consequences to someone who's interested
in estimating ED50.  You can see what I mean in the simulation below.
Although the four-parameter logistic model gives more accurate estimation of
the Hill coefficient, it is poorer than the sigmoid Emax model in estimation
of the ED50 in the original measurement scale.  The sigmoid Emax
parameterization has a median bias (expressed in terms of the ratio) in ED50
of 0.99--near perfect--while the logistic's is only about two-thirds.  The
interquartile range is tighter with the sigmoid Emax parameterization, too,
despite the two misfits, and so an omnibus assessment giving weight to
efficiency as well as bias should favor that parameterization, as well,
especially if a manual effort is made to choose starting values in misfits
in order to assure a good fit.  (In the simulation, neither parameterization
very reliably estimates ED10 or ED90 with the datasets created in the
simulation.)

So, although I don't contend your characterization of the arithmetic
relation between the two parameterizations (the predictions are identical to
four decimal places in the 28 adequately fit models), I'm not sure that I
would really consider them equivalent in at least a couple of important
matters (bias in ED50, ability to naturally accommodate zero dose), and that
is what I was alluding to in my questions.

Joseph Coveney

clear
set more off
local mask = cond(c(stata_version) >= 10, "YMD", "ymd")
set seed `=date("2007-10-26", "`mask'")'
*
tempname memhold Hill log_ED50
tempname sEmx_Emin sEmx_Emax sEmx_ED50 sEmx_Hill sEmx_good
tempname sEmx_iterations Good
tempname log4_Emin log4_Emax log4_ED50 log4_Hill
tempvar Y X
tempfile results data
save `data', emptyok
*
set obs 13
generate byte dataset = .
generate double log_dose = -3 + (_n - 1) * 0.5
generate double dose = 10^log_dose
generate double response = .
local Emin 1
local Emax 11
*
postfile `memhold' byte dataset ///
 double (Hill ED50) ///
 double (sEmx_Emin sEmx_Emax sEmx_ED50 sEmx_Hill) ///
 int sEmx_iterations byte sEmx_good ///
 double (log4_Emin log4_Emax log4_ED50 log4_Hill) ///
 int log4_iterations byte log4_good ///
 using `results'
 forvalues i = 1/30 {
   quietly replace dataset = `i'
   scalar define `Hill' = 1 + 4 * uniform()
   scalar define `log_ED50' = invnormal(uniform())
   quietly replace response = `Emin' + `Emax' / (1 + ///
     exp(-`Hill' * (log_dose - `log_ED50'))) + ///
     invnormal(uniform()) / 2
* Initial values (From -nllog4.ado-)
   summarize response, meanonly
   local Emin_init = r(min) / 1.1
   local Emax_init = r(max) * 1.1
   quietly {
       generate double `Y' = ln((response - ///
         `Emin_init') / (`Emax_init' - response))
       generate double `X' = ln(10) * log_dose
       regress `Y' `X'
   }
   local Emax_init = `Emax_init' - `Emin_init'
   local Hill_init = _b[`X']
   local ln_ED50_init =  _b[_cons] / `Hill'
   local ED50_init =  exp(_b[_cons] / `Hill')
   drop `Y' `X'
*
   quietly nl (response = {Emin} + {Emax} * dose^{Hill} / ///
     (dose^{Hill} + {ED50}^{Hill})), ///
     initial(Emin `Emin_init' Emax `Emax_init' ///
     Hill `Hill_init' ED50 `ED50_init')
   scalar define `sEmx_Emin' = _b[/Emin]
   scalar define `sEmx_Emax' = _b[/Emax]
   scalar define `sEmx_ED50' = _b[/ED50]
   scalar define `sEmx_Hill' = _b[/Hill]
   matrix define `Good' = !matmissing(e(V))
   scalar define `sEmx_good' = `Good'[1,1]
   scalar define `sEmx_iterations' = e(ic)
   predict double response_sEmx, yhat
*
   quietly nl log4: response log_dose, ///
     initial(b0 `Emin_init' b1 `Emax_init' ///
     b2 `Hill_init' b3 `ln_ED50_init')
   matrix define `Good' = matmissing(e(V))
   post `memhold' (`i') (`Hill') (10^(`log_ED50')) ///
     (`sEmx_Emin') (`sEmx_Emax') (`sEmx_ED50') ///
     (`sEmx_Hill') (`sEmx_iterations') (`sEmx_good') ///
     (_b[/b0]) (_b[/b1]) (exp(_b[/b3])) (_b[/b2]) ///
     (e(ic)) (!`Good'[1,1])
   predict double response_log4, yhat
   append using `data'
   quietly {
       save `data', replace
       keep if dataset == `i'
       drop response_sEmx response_log4
   }
   display in smcl as result "`i'"
}
postclose `memhold'
use `results', clear
erase `results'
*
foreach centile in 10 90 {
   generate double ED`centile' = ///
     (`centile' / (100 - `centile'))^(1/Hill) * ED50
   generate double sEmx_ED`centile' = ///
     (`centile' / (100 - `centile'))^(1/sEmx_Hill) * sEmx_ED50
   generate double log4_ED`centile' = ///
     (`centile' / (100 - `centile'))^(1/log4_Hill) * log4_ED50
}
foreach parameter in ED10 ED50 ED90 {
   generate double sEmx_`parameter'_bias = ///
     sEmx_`parameter' / `parameter'
   generate double log4_`parameter'_bias = ///
     log4_`parameter' / `parameter'
}
*
format *ED?0 %6.2f
format *ED?0_bias %6.2f
format *Hill %4.2f
format sEmx_Emin sEmx_Emax log4_Emin log4_Emax %3.0f
log using log.smcl, smcl
list dataset ED50 sEmx_ED50 log4_ED50, ///
 noobs separator(0) abbreviate(15)
list dataset Hill sEmx_Hill log4_Hill, ///
 noobs separator(0) abbreviate(15)
list dataset sEmx_Emin log4_Emin sEmx_Emax log4_Emax, noobs ///
 separator(0) abbreviate(15)
list dataset sEmx_iterations log4_iterations sEmx_good log4_good, ///
 noobs separator(0) abbreviate(15)
*
list dataset ED10 sEmx_ED10 log4_ED10, ///
 noobs separator(0) abbreviate(15)
list dataset sEmx_ED10_bias log4_ED10_bias, ///
 noobs separator(0) abbreviate(15)
*
list dataset ED90 sEmx_ED90 log4_ED90, ///
 noobs separator(0) abbreviate(15)
list dataset sEmx_ED90_bias log4_ED90_bias, ///
 noobs separator(0) abbreviate(15)
centile *_bias, centile(25 50 75)
log close
*
save Results
*
use `data'
erase `data'
save Data
exit


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index