Nonlinear estimation (STB-7: sg1.2) -------------------- ^nl^ fcn depvar [varlist] [weight] [^if^ exp] [^in^ range] [^,^ ^l^evel^(^#^) ^ ^i^nit^(^...^)^ ^ln^lsq^(^#^)^ ^lea^ve ^eps(^#^)^ ^nolo^g ^tr^ace ^it^erate^(^#^)^ fcn_options ] ^nlpred^ newvar [^if^ exp] [^in^ range] [^, r^esid] ^nlinit^ # parameter_list ^aweight^s and ^fweight^s are allowed. Description ----------- ^nl^ fits an arbitrary nonlinear function to the dependent variable depvar by least squares. You provide the function itself in a separate program with a name of your choosing, except that the first two letters of the name must be ^nl^. fcn refers to the name of the function without the first two letters. For example, you type "^nl nexpgr^ ..." to estimate with the function defined in the program ^nlnexpgr^. ^nl^ typed without arguments redisplays the results of the last ^nl^ estimation. ^nl^ shares most of the features of other estimation commands (see [4] estimate). ^predict^, however, may not be used after ^nl^ -- use ^nlpred^ instead. ^correlate,^ ^_coef^ may be used, but you may not use ^test^. ^nl^ should be viewed as work in progress: the fitting process is iterative (Gauss-Newton) and there can be convergence problems. Accurate initial parameter estimates are desirable. ^nlpred^ will calculate predicted values and residuals after ^nl^. ^nlinit^ is useful when writing nlfcns. Options ------- ^level(^#^)^ specifies the significance level, in percent, for confidence inter- vals of the coefficients; see [4] estimate. ^init(^...^)^ specifies initial values for parameters that are to be used to override the default initial values. Examples are provided below. ^lnlsq(^#^)^ fits the model defined in fcn by 'log least squares', defined as with shifted lognormal errors. In other words, ln(depvar-#) is assumed normally distributed. Sums of squares and deviance are adjusted to the same scale as depvar. ^leave^ leaves behind after estimation a set of new varibles with the same names as the estimated parameters containing the derivative of E(y) with respect to the parameter. ^eps(^#^)^ specifies the convergence criterion for successive parameter estimates and for the residual sum of squares. Default: 1e-5 (.00001). ^nolog^ suppresses the iteration log. Options, continued ------------------ ^trace^ expands the iteration log to provide more details, including values of the parameters at each step of the process. ^iterate(^#^)^ specifies the maximum number of iterations before giving up and defaults to 100. fcn_options refer to any options required allowed by nlfcn. ^resid^ tells ^nlpred^ to calculate residuals rather than predicted values. Remarks ------- ^nl^ fits an arbitrary nonlinear function to the dependent variable depvar by least squares. The specific function is specified by writing an nlfcn, des- cribed below. The values to be fitted in the fucntion are called the param- eters. Remarks, continued ------------------ The fitting process is iterative (modifed Gauss-Newton). It starts with a set of initial values for the parameters (guesses as to what the values will be and which you also supply) and finds another set of values that fit the function even better. Those are then used as a starting point and another improvement is found, and the process continues until no further improvement is possible. nlfcns ------ ^nl^ uses the function defined by nlfcn. nlfcn has two purposes: to identify the parameters of the problem and set default initial values, and to evaluate the function for a given set of parameter estimates. For instance, you have variables y and x in your data and wish to fit a negative-exponential growth curve with parameters B0 and B1: y = B0*( 1 - exp(-B1*x) ) nlfcns, continued ----------------- First, you write a program to calculate the predicted values: ^program define nlnexpgr^ ^if "`1'" == "?" {^ /* if query call ... */ ^mac def S_1 "B0 B1"^ /* identify parameters */ ^mac def B0=1^ /* and initialize */ ^mac def B1=.1^ /* them */ ^exit^ ^} ^replace `1'=$B0*(1-exp(-$B1*x))^ /* otherwise, calculate */ ^end^ /* function */ To estimate the model, you type: . ^nl nexpgr y^ ^nl^'s first argument specifies the name of the function, although you do not type the ^nl^ prefix. We type ^nexpgr^, meaning the function is ^nlnexpgr^. ^nl^'s second argument specifies the name of the dependent variable. nlfcns, continued ----------------- Notice that the initial values of the paramters were provided in the ^nlnexpgr^ program. (Press ^b^ to go back a screen.) You can, howver, override them. To estimate a model using .5 for the initial value of B0 rather than 1, you can ype: . ^nl nexpgr y, init(B0=.5)^ To also change the initial value of B1 from .1 to .2, you type: . ^nl nexpgr y, init(B0=.5, B1=.2)^ nlfcns, continued ----------------- The outline of all nlfcns is the same: ^program define nl^fcn ^if "`1'" == "?" {^ ^mac def S_1 "^...^"^ (identify parameters) (initialize parameters) ^exit^ } ^replace `1' =^ ... (calculate function) ^end^ On a query call, indicated by `1' being "?", the nlfcn is to place the names of the parameters in the global macro S_1 and initialize the parameters. Parameters are stored as macros, so if nlfcn declares that the parameters are A, B, and C (via ^mac def S_1 "A B C"^), it must then place initial values in the corresponding parameter macros A, B, and C (via ^mac def A=0^, ^mac def B=1^, etc.). After initializing the parameter macros, it is done. On a calculation call, `1' does not contain "?"; it instead contains the name nlfcns, continued ----------------- of a variable that is to be filled in with the predicted values. The current values of the parameters are stored in the macros previously declared on the query call (e.g., $A, $B, and $C). Example ------- You wish to fit the CES production functions defined by: lnq = B0 + A*ln(D*labor^^R + (1-D)*capital^^R) where the parameters to be estimated are B0, A, D, and R. ^labor^ and ^capital^ are variables in your data, as is ^lnq^. The nlfcn is: ^program define nlces^ ^if "`1'" == "?" {^ ^mac def S_1 "B0 A D R"^ ^mac def B0 = 1^ ^mac def A = -1^ ^mac def D = .5^ ^mac def R = -1^ ^exit^ ^}^ ^replace `1'=$B0 + $A*ln($D*labor^^$R + (1-$D)*capital^^R)^ ^end^ On could then estimate the model by typing: Example, continued ------------------ . ^use data^ . ^nl ces lnq^ Note that ^nlces^ provides initial values, called the default initial values. At estimation time, however, you can override the defaults by specifying the ^init()^ option, for example: . ^nl ces lnq, init(A=-.5)^ . ^nl ces lnq, init(A=-.5, D=.3, R=-2)^ Also see "More on nlfcns" below. After estimating with nl, typing . ^nl^ will redisplay the previous estimates. Example, continued ------------------ After estimating with ^nl^, typing . ^nlpred myvar^ would create myvar containing the predicted values. Typing . ^nlpred res, resid^ would create res containing the residuals. The ^nl^ command -------------- If the nonlinear model contains a constant term, ^nl^ will find it and indicate its presence by placing an asterisk (*) next to the parameter name when dis- playing results. ^nl^ determines that a parameter B is a constant term if the partial derivative f=dE(y)/dB has a coefficient of variation (s.d./mean) less than ^eps()^. Usually, f=1 for a constant. When making its calculations, ^nl^ creates the partial derivative variables for each of the parameters, giving each the same name as the corresponding param- eter. Unless you specify ^leave^, these are discarded when ^nl^ completes the estimation. Therefore, your data must not have data variables that have the same names as parameters. I recommend using uppercased names for parameters and lowercased names (as is common) for variables. ^nl^ output closely mimics ^regress^ output; see [5s] regress. The model F test, R-square, sums of squares, etc., are calculated as ^regress^ calculates them. If no constant is present, the usual caveats apply to the interpretation of the F and R-square statistics; see comments and references in Goldstein (1992). Log-normal errors ----------------- A nonlinear model with identically normally distributed errors may be written (1) y[i] = f(x[i], B) + u[i] , u[i] distributed N(0,s^^2) for i=1, ..., n. If the y[i] are thought to have a k-shifted lognormal in- stead of a normal distribution, that is, (2) ln(y[i] - k) distributed N(c[i], t^^2) and the systematic part f(x[i],B) of the original model is still thought ap- propriate, the model becomes: ln(y[i]-k) = c[i] + v[i] = ln{f(x[i],B)} + v[i], v[i] distr. N(0,t^^2) This model is estimated if ^lnlsq(^k^)^ is specified. If (2) is correct, the variance of (y[i]-k) is proportional to {f(x[i],B)-k}^^2. Probably the most common case is k=0, sometimes called `proportional errors' Log-normal errors, continued ---------------------------- since the standard error of y[i] is proportional to its expectation, f(x[i],B). Assuming the value of k is known, (2) is just another nonlinear model in B and it may be fitted as usual. However, we may wish to compare the fit of (1) with that of (2) using the residual sum of squares or the deviance D, D = -2 * ln(likelihood) from each model. To do so, we must allow for the change in scale introduced by the log transformation. Assuming, then, the y[i] to be normally distributed, Atkinson (1985, 85-87, 184), by considering the Jacobian, showed that multiplying both sides of (2) by the geometric mean of y, gm(y) = exp{ (ln(y[1]-k) + ... + ln(y[p]-k) ) / n } which is a constant for a given dataset. The residual deviance for (1) and (2) may be expressed as Log-normal errors, continued ---------------------------- (3) D(B*) = n{1 + ln(2*pi*s^^2)} where B* is the maximum-likelihood estimate (MLE) of B for each model and ns^^2 is the RSS from (1), or that from (2) multiplied by gm(y)^^2. Since (1) and (2) are models with different error structures but the same functional form, the arithmetic difference in their RSS or deviances is not easily tested for statistical significance. However, if the deviance dif- ference is `large' (>4, say) one would naturally prefer the model with the smaller deviance. Of course, the residuals for each model should be examined for departures from assumptions (nonconstant variance, non-normality, serial correlations, etc.) in the usual way. Example ------- Consider alternatively modeling (4) E(y[i]) = 1 / (C + A*exp(B*x[i])) (5) E(1/y[i]) = E(y'[i]) = C + A*exp(B*x[i]) where A and B are parameters to be estimated. We will use the data (y,x) = (.04,5), (.06,12), (.08,25), (.1,35), (.15,42), (.2,48), (.25,60), (.3,75), and (.5,120). Model C A B RSS Deviance ------------------------------------------------------------ (4) 1.781 25.74 -.03926 -.001640 -51.95 (4) with lnlsq(0) 1.799 25.45 -.04051 -.001431 -53.18 (5) 1.781 25.74 -.03926 8.197 24.70 (5) with lnlsq(0) 1.799 27.45 -.04051 3.651 17.42 There is little to choose between the two versions of the logistic (4) model, whereas for the exponential model the fit using lnlsq(0) is much better (a Example, continued ------------------ deviance difference of 7.28). The reciprocal transformation has introduced heteroscedasicity into y'[i] which is countered by the proportional errors property of the lognormal distribution implicit in ^lnlsq(0)^. The deviances are not comparable between the logistic and exponential models because the change of scale has not been allowed for, although in principle, it could be. Weights ------- Weights are specified the usual way; analytic and frequency weights are sup- ported; see [4] weights. Use of analytic weights implies that the y[i] have different variances. Model (1) may therefore be rewritten: (1a) y[i] = f(x[i],B) + u[i], u[i] dist. N(0,s^^2/w[i]) where w[i] are (positive) weights, assumed known and normalized such that their sum equals the number of observations. (The normalization is carried forth by nl; the weights you supply ned not be normalized.) The residual deviance for (1a) is: Weights, continued ------------------ (3a) D(B*) = n{1 + ln(2*pi*s^^2)} - ln(w[1]) - .. - ln(w[n]) (compare with equation 3), where ns^^2 = RSS = Sum{ w[i](y[i]-f(x[i],B))^^2 } Defining and fitting a model equivalent to (2) when weights have been speci- fied as in (1a) is not straightforward and has not been attempted. Thus, deviances using and not using the ^lnlsq()^ option may not be strictly com- parable when analytic weights (other than 0 and 1) are used. Errors ------ ^nl^ will stop with error 196 if an error occurs in your nlfcn program and it will report the error code raised by nlfcn. ^nl^ is reasonably robust to inability of nlfcn to calculate preddicted values for certain parameter values. ^nl^ assumes that predicted values can be cal- culated at the initial value of the parameters. If this is not so, an error message is issued with return code 480. Thereafter, as ^nl^ changes the parameter values, it monitors nlfcn's returned predictons for unexpected missing values. If detected, ^nl^ backs up. That is, ^nl^ finds a linear combination of the previous, known-to-be-good parameter vec- tor and the new, known-to- be-bad vector, where the function can be evaluated, and starts its iterations again. ^nl^ does require, however, that once a parameter vector is found where the predictions can be calculated, small changes to the parameter vector can be made in order to calculate numeric derivatives. If a boundry is encountered at this point, an error message is issued with return code 481. Errors, continued ----------------- When specifying ^lnlsq()^, an attempt to take logarithms of y[i]-k when y[i]<=k results in an error message with return code 482. If ^iterate()^ iterations are performed and estimates have still not converged, results are presented with a warning and the return code set to 430. General comments on fitting nonlinear models -------------------------------------------- In many cases, achieving convergence is problematic. For example, a unique maximume-likelihood (minimum-RSS) solution may not exist. A large literature exists on different algorithms that have been used, on strategies for obtaining good initial parameter values, and on tricks for parameterizing the model to make its behavior as `linear-like' as possible. Selected references are Kennedy and Gentle (1980, ch. 10) for computational matters, and Ross (1990) and Ratkowsky (1985) for all three aspects. Much of Ross's considerable experience is enshrined in the computer package MLP (Ross 1987), an invaluable resource. Ratkowsky's book is particularly clear and approachable, with useful General comments on fitting nonlinear models, continued ------------------------------------------------------- discussion on the meaning and practical implications of `intrinsic' and `param- eter-effects' nonlinearity. An excellent general text, though (in places) not for the mathematically faint-hearted, is Gallant (1987). The success of ^nl^ will be enhanced if care is paird to the form of the model fitted, along the lints of Ratkowsky and Ross. For example, Ratkowsky (1985, 49-59) anlyses three possible 3-parameter `yield-density' models for plant growth: E(y[i]) = (A + B*x[i])^^(-1/T) E(y[i]) = (A + B*x[i] + G*x[i]^^2)^^(-1) E(y[i]) = (A + B*x[i]^^P)^^(-1) All three models give similar fits. However, he shows that the second formu- lation is dramatically more `linear-like' than the other two and therefore has better convergence properties. In addition, the parameter estimates are virtually unbiased and normally distributed and the asymptotic approximation General comments on fitting nonlinear models, continued ------------------------------------------------------- to the standard errors, correlations and confidence intervals is much more accurate than for the other models. Even within a given model, the way the parameters are expressed (e.g. P^^x[i] or exp(T*x[i])) affects the degree of linear-like behaviour. My advice is that even if you cannot get a particular model to converge, don't give up; experiment with different ways of writing it or with slightly different alternative models that also fit well. More on nlfcns -------------- Note that the syntax for ^nl^ is ^nl^ fcn depvar [varlist] [...] [^,^ ... fcn_options] The syntax for an nlfcn is: nlfcn {varname | ^?^} [varlist] [^,^ fcn_options] The varlist, if specified with ^nl^, will be passed to nlfcn along with any options not intended for ^nl^. Thus, it is possible to write nlfcns that are quite general. When nlfcn is called with a ?, the varlist and options, if any, are still passed. In addition, $S_E_depv contains the identity of the dependent var- iable; $S_E_if and $S_E_in contain the ^if^ exp and ^in^ range specified on the ^nl^ command line; and $S_E_wgt and $S_E_exp contain the weight and expression. nlfcn is required to post the names of the parameters to S_1 and to provide default initial values for all the parameters. In addition, it may post up to two titles in S_2 and S_3 that will be subsequently used to title the out- More on nlfcns, continued ------------------------- put. The S_E_ macros are useful for filling in titles and for generating initial parameter estimates. When nlfcn is called without a ?, it is required to calculate the predicted values conditional on the current value of the parameters. Note that nlfcn is not required to process if exp or in range. Restriction to the estimation sample will be handled by ^nl^. Thus, at the beginning, we gave the following example for calculating an negative-exponential growth model: ^program define nlnexpgr^ ^if "`1'" == "?" {^ /* if query call ... */ ^mac def S_1 "B0 B1"^ /* identify parameters */ ^mac def B0=1^ /* and initialize */ ^mac def B1=.1^ /* them */ ^exit^ ^} ^replace `1'=$B0*(1-exp(-$B1*x))^ /* otherwise, calculate */ ^end^ /* function */ More on nlfcns, continued ------------------------- A better version would be ^program define nlnexpgr^ ^if "`1'" == "?" { ^ ^mac def S_1 "B0 B1"^ ^mac def B0=1^ ^mac def B1=.1^ ^mac def S_2 "negative-exp. growth"^ ^mac def S_3 "$S_E_depv = B0*(1-exp(-B1*`2'))"^ ^exit^ ^} ^replace `1'=$B0*(1-exp(-$B1*`2'))^ ^end^ This version would title the output and allow the independent variable to be specified on the ^nl^ command line: . ^nl nexpgr y xval^ More on nlfcns, continued ------------------------- An even more sophisticated version of ^nlexpgr^ might use S_E_depv, `2', S_E_if, and S_E_in to generate more reasonable starting values of B0 and B1. ^nlinit^ ------ ^nlinit^ is intended for use by nlfcns. Its syntax is: ^nlinit^ # parameter_list ^nlinit^ initializes each parameter in parameter list to contain #. For example: ^nlinit 0 A B C^ ^nlinit 1 D E^ Saved results ------------- ^nl^ saves in the the system macros $S_1, ..., $S_9: ^S_1^ number of observations ^S_2^ model sum of squares ^S_3^ model degrees of freedom ^S_4^ residual sum of squares ^S_5^ residual degrees of freedom ^S_6^ model F-statistic ^S_7^ R-square ^S_8^ adjusted R-square ^S_9^ residual root mean square ^S_10^ residual deviance (-2 * log likelihood) ^S_11^ [geometric mean (y-k)]^^2 if lnlsq(k), 1 otherwise ^S_12^ 0 if convergence failed, 1 otherwise Note that S_1 through S_9 correspond to the successive elements of _result() following ^regress^; see [5s] regress. The final parameter estimates are available in the parameter macros defined by nlfcn. The standard errors of the parameters are available through ^_se[^parameter^]^. ^nlpred^ and ^nlinit^ saving nothing in S_# macros or _result(). References ---------- Atkinson, A. C. 1985. ^Plots, Transformations and Regression^. Oxford: Oxford Science Publications. Danuso, F. 1991. sg1: Nonlinear regression command. ^Stata Technical Bulletin^ 1: 17-19. Gallant, A. R. 1987. ^Nonlinear Statistical Models.^ New York: John Wiley & Sons. Goldstein, R. 1992. srd7: Adjusted summary statistic for logarithmic regression. ^Stata Technical Bulletin^ 5: 17-21. Kennedy, W. J.Jr and J. E. Gentle. 1980. ^Statistical Computing^. New York: Marcel Dekker. Ratkowsky, D. A. 1983. ^Nonlinear Regression Modeling^. New York: Marcel Dekker. Ross, G. J. A. 1987. ^MLP User Manual, release 3.08^. Oxford: Numerical Algo- rithms Group. ----. 1990. ^Nonlinear Estimation^. New York: Springer-Verlag. Author ------ Patrick Royston, Royal Postgraduate Medical School, London. Also see -------- STB: sg1.2 (STB-7) Manual: [4] estimate On-line: ^help^ for ^regress^