[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
"Joseph Coveney" <[email protected]> |

To |
"Statalist" <[email protected]> |

Subject |
Re: st: reverse prediction - confidence interval for x at given y in nonlinear model |

Date |
Mon, 29 Oct 2007 01:27:43 -0700 |

Maarten Buis wrote: --- Joseph Coveney <[email protected]> wrote:

> If I'm not mistaken, the four-parameter logistic model Rosy used is > for the *logarithm* of dose and *logarithm* of ED50, and not the dose > and ED50, per se (cf. Maarten's y-axis values). I am studying long term trends in inequality of educational opportunity between children of different socioeconomic background, so I am not very up to date on common parameterizations in biological and medical studies...

I hope that you didn't misconstrue my reference to your graph as disparagement, because nothing could be further from the truth: I was referring Rosy to the graph because its y-axis scale values nicely illustrate what might not be obvious to a first-time user of -nl log4:-, namely, that the built-in model expects predictor values on the order of ln(100) and not on the order of 100. Rosy's first post described things in terms of "dose" and "ED50", and I just wanted to rule out this possibility at the outset as a cause of Rosy's problem.

> Perhaps the parameterization is numerically stabler, too, in some > sense. When I estimated this model on my generated data I found that it wasn't very stable. However it is not very surprising as it uses 4 parameters to estimate a single curve and 2 of these parameters refer to asymptotes, i.e. parts of the curve at the extreme left and the extreme right where there are few observations.

I agree that four-parameter nonlinear models like these aren't particularly easy to work with, and my comment wasn't intending claiming that they are. Rather, I was speculating that the preference given the four-parameter logistic parameterization is because it is stabler in some sense (more tractably fit) than the analogous four-parameter sigmoid Emax parameterization. A limited simulation (below) seems to point in this direction in that (i) with robotically chosen initial values, the sigmoid Emax parameterization settles on silly estimates for two of thirty datasets (Datasets 28 and 30) and the logistic in only one (Dataset 28), (ii) and when the fit is difficult (as in Datasets 5, 11, 14, 28 and 30), the sigmoid Emax parameterization requires twofold to fivefold the number of iterations that the logistic does. (It is interesting that, otherwise, the sigmoid Emax parameterization nearly always requires fewer iterations--and never more.) As an aside, it's not uncommon for a designed dose-response study to have deliberately numerous observations to the extreme left (often, zero dose, itself), and at least up onto the Emax plateau. The point of your graph is well taken, however: ED10s and ED90s are going to be estimated with less confidence regardless of whether they're in a range where extrapolation is necessary.

---

[snip]

Joseph asks: > does the -nl log4:- four-parameter logistic model give rise to biased > estimates of ED50 (ED10, ED90, etc.) and confidence intervals in the > original measurement scale with nonasymptotic sample sizes? That is, > should a pharmacologist ever use -nl log4:- in lieu of the model > shown just above? The two models are equivalent. To see this notice first that they have

[formulae redacted for brevity] I have no disagreement with your arithmetic, but there is an important difference in the parameterizations, namely:

Step 2: Lets call ln(Dose) x, Hill b2, and ln(ED50) b3

I believe that the way numerical methods employed in nonlinear regression software handle this difference has consequences to someone who's interested in estimating ED50. You can see what I mean in the simulation below. Although the four-parameter logistic model gives more accurate estimation of the Hill coefficient, it is poorer than the sigmoid Emax model in estimation of the ED50 in the original measurement scale. The sigmoid Emax parameterization has a median bias (expressed in terms of the ratio) in ED50 of 0.99--near perfect--while the logistic's is only about two-thirds. The interquartile range is tighter with the sigmoid Emax parameterization, too, despite the two misfits, and so an omnibus assessment giving weight to efficiency as well as bias should favor that parameterization, as well, especially if a manual effort is made to choose starting values in misfits in order to assure a good fit. (In the simulation, neither parameterization very reliably estimates ED10 or ED90 with the datasets created in the simulation.) So, although I don't contend your characterization of the arithmetic relation between the two parameterizations (the predictions are identical to four decimal places in the 28 adequately fit models), I'm not sure that I would really consider them equivalent in at least a couple of important matters (bias in ED50, ability to naturally accommodate zero dose), and that is what I was alluding to in my questions. Joseph Coveney clear set more off local mask = cond(c(stata_version) >= 10, "YMD", "ymd") set seed `=date("2007-10-26", "`mask'")' * tempname memhold Hill log_ED50 tempname sEmx_Emin sEmx_Emax sEmx_ED50 sEmx_Hill sEmx_good tempname sEmx_iterations Good tempname log4_Emin log4_Emax log4_ED50 log4_Hill tempvar Y X tempfile results data save `data', emptyok * set obs 13 generate byte dataset = . generate double log_dose = -3 + (_n - 1) * 0.5 generate double dose = 10^log_dose generate double response = . local Emin 1 local Emax 11 * postfile `memhold' byte dataset /// double (Hill ED50) /// double (sEmx_Emin sEmx_Emax sEmx_ED50 sEmx_Hill) /// int sEmx_iterations byte sEmx_good /// double (log4_Emin log4_Emax log4_ED50 log4_Hill) /// int log4_iterations byte log4_good /// using `results' forvalues i = 1/30 { quietly replace dataset = `i' scalar define `Hill' = 1 + 4 * uniform() scalar define `log_ED50' = invnormal(uniform()) quietly replace response = `Emin' + `Emax' / (1 + /// exp(-`Hill' * (log_dose - `log_ED50'))) + /// invnormal(uniform()) / 2 * Initial values (From -nllog4.ado-) summarize response, meanonly local Emin_init = r(min) / 1.1 local Emax_init = r(max) * 1.1 quietly { generate double `Y' = ln((response - /// `Emin_init') / (`Emax_init' - response)) generate double `X' = ln(10) * log_dose regress `Y' `X' } local Emax_init = `Emax_init' - `Emin_init' local Hill_init = _b[`X'] local ln_ED50_init = _b[_cons] / `Hill' local ED50_init = exp(_b[_cons] / `Hill') drop `Y' `X' * quietly nl (response = {Emin} + {Emax} * dose^{Hill} / /// (dose^{Hill} + {ED50}^{Hill})), /// initial(Emin `Emin_init' Emax `Emax_init' /// Hill `Hill_init' ED50 `ED50_init') scalar define `sEmx_Emin' = _b[/Emin] scalar define `sEmx_Emax' = _b[/Emax] scalar define `sEmx_ED50' = _b[/ED50] scalar define `sEmx_Hill' = _b[/Hill] matrix define `Good' = !matmissing(e(V)) scalar define `sEmx_good' = `Good'[1,1] scalar define `sEmx_iterations' = e(ic) predict double response_sEmx, yhat * quietly nl log4: response log_dose, /// initial(b0 `Emin_init' b1 `Emax_init' /// b2 `Hill_init' b3 `ln_ED50_init') matrix define `Good' = matmissing(e(V)) post `memhold' (`i') (`Hill') (10^(`log_ED50')) /// (`sEmx_Emin') (`sEmx_Emax') (`sEmx_ED50') /// (`sEmx_Hill') (`sEmx_iterations') (`sEmx_good') /// (_b[/b0]) (_b[/b1]) (exp(_b[/b3])) (_b[/b2]) /// (e(ic)) (!`Good'[1,1]) predict double response_log4, yhat append using `data' quietly { save `data', replace keep if dataset == `i' drop response_sEmx response_log4 } display in smcl as result "`i'" } postclose `memhold' use `results', clear erase `results' * foreach centile in 10 90 { generate double ED`centile' = /// (`centile' / (100 - `centile'))^(1/Hill) * ED50 generate double sEmx_ED`centile' = /// (`centile' / (100 - `centile'))^(1/sEmx_Hill) * sEmx_ED50 generate double log4_ED`centile' = /// (`centile' / (100 - `centile'))^(1/log4_Hill) * log4_ED50 } foreach parameter in ED10 ED50 ED90 { generate double sEmx_`parameter'_bias = /// sEmx_`parameter' / `parameter' generate double log4_`parameter'_bias = /// log4_`parameter' / `parameter' } * format *ED?0 %6.2f format *ED?0_bias %6.2f format *Hill %4.2f format sEmx_Emin sEmx_Emax log4_Emin log4_Emax %3.0f log using log.smcl, smcl list dataset ED50 sEmx_ED50 log4_ED50, /// noobs separator(0) abbreviate(15) list dataset Hill sEmx_Hill log4_Hill, /// noobs separator(0) abbreviate(15) list dataset sEmx_Emin log4_Emin sEmx_Emax log4_Emax, noobs /// separator(0) abbreviate(15) list dataset sEmx_iterations log4_iterations sEmx_good log4_good, /// noobs separator(0) abbreviate(15) * list dataset ED10 sEmx_ED10 log4_ED10, /// noobs separator(0) abbreviate(15) list dataset sEmx_ED10_bias log4_ED10_bias, /// noobs separator(0) abbreviate(15) * list dataset ED90 sEmx_ED90 log4_ED90, /// noobs separator(0) abbreviate(15) list dataset sEmx_ED90_bias log4_ED90_bias, /// noobs separator(0) abbreviate(15) centile *_bias, centile(25 50 75) log close * save Results * use `data' erase `data' save Data exit * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**st: frailty in parametric models***From:*James Cui <[email protected]>

- Prev by Date:
**Re: st: Date: Sun, 28 Oct 2007 16:42:05 +0100** - Next by Date:
**st: How to output results of a program in single-spaced text** - Previous by thread:
**RE: st: reverse prediction - confidence interval for x at given y in nonlinear model** - Next by thread:
**st: frailty in parametric models** - Index(es):

© Copyright 1996–2024 StataCorp LLC | Terms of use | Privacy | Contact us | What's new | Site index |