How do I fit a regression with interval constraints in Stata?
|
Title
|
|
Fitting a regression with interval constraints
|
|
Author
|
Isabel Cañette, StataCorp
|
|
Date
|
October 2006; updated July 2011; minor revisions October 2011
|
If you are interested in fitting a linear regression with a linear
constraint, you can use the command
cnsreg. However,
sometimes you may need to fit a model for which one of the
coefficients should be positive or between zero and one. You can fit such a model
by reparameterizing the coefficients. This tutorial
explains the steps to fit this model by using the Stata command
ml. For simplicity, I explain how
to set interval constraints on a linear regression. However, this method could
be applied to other kinds of regression (e.g., probit, logit, Poisson, tobit),
provided that they can be fitted by maximum likelihood. As stated above,
if you want to set nonlinear constraints on a linear regression, you can use
nl, which would not require any programming.
This FAQ is organized as follows:
- Introduction
- Prerequisites
- Our introductory example
- Application: Setting a model where parameters are proportions
- Final remarks
1. Introduction
I will start by presenting an example on how to use
ml to fit a linear regression. The code in this
example can be modified to set interval constraints. I will explain how to
perform this modification by showing the steps to restrict one coefficient
to be positive.
Finally, I will present an example where the coefficients are restricted to
be proportions; that is, they are constrained to be positive numbers that
add up to one.
If you know (from the nature of the problem) that the constraints you
want to set are true, the following method will allow you to set these
restrictions in the model. However, if the data do not support the
validity of these constraints, the algorithm for the restricted model
probably won’t achieve convergence.
There are essentially two situations where we may want to set interval
constraints. One case is when the numerical global maximum of the likelihood
actually is where we want it to be, but we want to help the algorithm avoid
difficult spots somewhere else. This case is similar to the situation of
estimating the standard deviation in a linear regression, where the maximum
is positive, but we set the restriction to help the algorithm achieve
convergence. The other situation is when we know that the true parameter is
in a certain region, but for numerical reasons the global maximum of the
likelihood may be somewhere else. We are interested not in finding a global
maximum but in finding a maximum in a certain subset. This is the case of
the example of proportions, considered in section 5.
If you also need to set linear constraints, you can use the option
constraint of the command
ml.
2. Prerequisites
This tutorial assumes that you are familiar with the command
ml. I will review some characteristics of the
estimation by using ml here. However, if you have
never used the Stata command ml, see [R]
ml.
Technical notes explain some details on the output; you can skip them in a
first reading.
3. Our introductory example
Let’s assume that we want to reproduce the following estimation,
. sysuse auto, clear
. regress price mpg displacement
but we want to make sure that the coefficient of variable
mpg will be negative.
Because we are going to use method d0,
we will borrow the program mynormal_d0.ado from the book
Maximum
Likelihood Estimation with Stata, by Gould, Pitblado, and
Sribney (you can download this file from the book’s website).
Although the syntax for method lf is simpler, if we adapt an
lf program to fit a model with interval constraints, the good
properties of lf will be lost (see technical note below); some
preliminary simulations showed that, when adapted to our problem, method
d0 is better for convergence.
|
Technical note: Whenever the likelihood meets the linear form,
method lf is efficient. With lf, the
derivatives are computed at the equation level, and the chain rule is applied to
obtain the derivatives with respect to the parameters. This method produces
high-quality derivatives and therefore accurate estimators. When we
modify a maximum likelihood program to meet interval restrictions according
to the procedure explained below, the equations are disassembled; then, even
when using an lf program, we will force
ml to compute numerical derivatives directly with
respect to the coefficients, losing the numerical benefits of
lf.
|
3.1. Using method d0 to fit a linear regression
Let’s start by fitting the model without restrictions, using
regress.
. sysuse auto, clear
(1978 Automobile Data)
. regress price mpg displacement
Source | SS df MS Number of obs = 74
-------------+------------------------------ F( 2, 71) = 13.35
Model | 173587098 2 86793549.2 Prob > F = 0.0000
Residual | 461478298 71 6499694.33 R-squared = 0.2733
-------------+------------------------------ Adj R-squared = 0.2529
Total | 635065396 73 8699525.97 Root MSE = 2549.4
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | -121.1833 72.78844 -1.66 0.100 -266.3193 23.95276
displacement | 10.50885 4.58548 2.29 0.025 1.365658 19.65203
_cons | 6672.766 2299.72 2.90 0.005 2087.254 11258.28
------------------------------------------------------------------------------
Now let’s fit the same model by using ml:
/* begin do file */
cscript
program mynormal_d0
version 12.0
args todo b lnf
tempname lnsigma sigma
tempvar mu
mleval `mu' = `b', eq(1)
mleval `lnsigma' = `b', eq(2) scalar
quietly {
scalar `sigma' = exp(`lnsigma')
mlsum `lnf' = ln( normalden($ML_y1,`mu',`sigma') )
}
end
sysuse auto
ml model d0 mynormal_d0 (xb: price = mpg displacement) (lnsigma:), ///
diparm(lnsigma, exp label(sigma))
ml maximize
/* end do file */
Here are the results:
Number of obs = 74
Wald chi2(2) = 27.84
Log likelihood = -683.89903 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
price | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb |
mpg | -121.1833 71.29774 -1.70 0.089 -260.9243 18.55774
displacement | 10.50885 4.49157 2.34 0.019 1.70553 19.31216
_cons | 6672.766 2252.622 2.96 0.003 2257.707 11087.82
-------------+----------------------------------------------------------------
lnsigma |
_cons | 7.82294 .0821995 95.17 0.000 7.661832 7.984048
-------------+----------------------------------------------------------------
sigma | 2497.237 205.2716 2125.649 2933.784
------------------------------------------------------------------------------
|
Technical note: You will see some differences between the output of
ml and the output of
regress. First, the standard errors
are different because a different number of degrees of freedom was used to
compute the variance of the errors. The variance for the errors in
regress and ml is estimated,
respectively, as v1 and
v2, where
v1 = e'e/(N-k)
v2 = e'e/(N)
Here e is the vector of residuals,
N is the number of observations, and
k is the number of regressors, including
the constant.
Second, the confidence intervals reported by
regress are based on the t distribution, and
ml uses the Gaussian distribution to compute them.
Third, regress performs an F test, and
ml performs a
Wald test. See the following FAQ to learn how these tests relate to each
other: Why
does test sometimes produce chi-squared and other times F statistics?
|
Let’s have a closer look at the code for ml; the line
ml model d0 mynormal_d0 (xb: price = mpg displacement) (lnsigma:)
introduces two equations. The first equation,
(xb: price = mpg displacement)
indicates the following:
The second equation,
(lnsigma:)
indicates that
- the name of this equation is lnsigma, and
- there are not independent variables in this equation.
Because this is a constant-only equation, there is no need to generate a
whole variable for it in the evaluation. Evaluating it as scalar is more
efficient, as in
mleval `lnsigma' = `b', eq(2) scalar
This is essentially the same as
scalar `lnsigma' = `b'[1,4]
This parameter is used in the line
scalar double `sigma' = exp(`lnsigma')
We are not estimating the standard deviation (sigma), but its
logarithm (lnsigma). This trick allows us to express the standard
deviation as an exponent (exp(`lnsigma')) and guarantees that
sigma will be positive. We make sure that the algorithm will not
search for sigma in the negative subspace. We want to use this trick
to apply an inequality constraint to a coefficient.
3.2. Setting separate equations for the coefficients
In this step, we set up a separate equation for the coefficient that we will
restrict.
In the program mynormal_d0, the coefficient of mpg is included
in equation 1. To set a restriction on it, we need to extract
this coefficient from the equation. More precisely, we need to include this
coefficient as a scalar parameter in the model, that is
ml model d0 mynormal_b (a: ) (xb: price = displacement ) (lnsigma:)
where the constant-only equation (a: ) corresponds to the coefficient
of mpg. The question of how to include the variable itself remains;
as we cannot introduce this variable in any of the equations, we will need
to use a global macro. Look at the following code:
/* begin do file */
cscript
program mynormal_b
version 12.0
args todo b lnf
tempname a lnsigma sigma
tempvar xb mu
mleval `a' = `b', eq(1) scalar
mleval `xb' = `b', eq(2)
mleval `lnsigma' = `b', eq(3) scalar
quietly {
generate double `mu' = `xb' +`a'*$x2
scalar `sigma' = exp(`lnsigma')
mlsum `lnf' = ln( normalden($ML_y1,`mu',`sigma') )
}
end
sysuse auto
global x2 mpg
ml model d0 mynormal_b (a: ) (xb: price = displacement ) (lnsigma:), ///
diparm(lnsigma, exp label(sigma))
ml maximize
/* end do file */
Here are the estimation results:
Number of obs = 74
Wald chi2(0) = .
Log likelihood = -683.89903 Prob > chi2 = .
------------------------------------------------------------------------------
price | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a |
_cons | -121.1832 71.29769 -1.70 0.089 -260.9241 18.55768
-------------+----------------------------------------------------------------
xb |
displacement | 10.50885 4.491568 2.34 0.019 1.705535 19.31216
_cons | 6672.765 2252.621 2.96 0.003 2257.709 11087.82
-------------+----------------------------------------------------------------
lnsigma |
_cons | 7.82294 .0821995 95.17 0.000 7.661832 7.984048
-------------+----------------------------------------------------------------
sigma | 2497.237 205.2716 2125.648 2933.783
------------------------------------------------------------------------------
|
Technical note: When we use mynormal_b,
missing values are reported for the Wald chi2 test. ml,
by default, reports this statistic for all the coefficients except the
intercept in the first equation; the number of degrees of freedom is
reported as zero because the first equation has only the intercept.
To perform a Wald test on the coefficients, we can use the command
test
after the estimation:
. matrix list e(b)
e(b)[1,4]
a: xb: xb: lnsigma:
_cons displacement _cons _cons
y1 -121.18323 10.508846 6672.7651 7.8229401
. test ([a]_cons) ([xb]displacement)
( 1) [a]_cons = 0
( 2) [xb]displacement = 0
chi2( 2) = 27.84
Prob > chi2 = 0.0000
These are the values reported by ml when running
mynormal_d0.
|
3.3. Including the restrictions in the model
Now we only have to change one line in the program: we substitute
`a' by -exp(`a') in the evaluation and display the
transformed parameter after the estimation:
/* begin do file */
cscript
program mynormal_c
version 12.0
args todo b lnf
tempname a lnsigma sigma
tempvar xb mu
mleval `a' = `b', eq(1) scalar
mleval `xb' = `b', eq(2)
mleval `lnsigma' = `b', eq(3)
quietly {
generate double `mu' = `xb' -exp(`a')*$x2
scalar `sigma' = exp(`lnsigma')
mlsum `lnf' = ln( normalden($ML_y1,`mu',`sigma') )
}
end
sysuse auto
global x2 mpg
ml model d0 mynormal_c (a: ) (xb: price = displacement ) (lnsigma:), ///
diparm(lnsigma, exp label(sigma))
ml init lnsigma:_cons=7.8
ml maximize
/* end do file */
Here we used the global macro x2 to incorporate the variable
mpg; when including a variable manually (i.e., not through the
standard syntax of ml), you must check for missing values.
Here are the results:
Number of obs = 74
Wald chi2(0) = .
Log likelihood = -683.89903 Prob > chi2 = .
------------------------------------------------------------------------------
price | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a |
_cons | 4.797301 .5883437 8.15 0.000 3.644169 5.950434
-------------+----------------------------------------------------------------
xb |
displacement | 10.50886 4.491554 2.34 0.019 1.705579 19.31215
_cons | 6672.754 2252.607 2.96 0.003 2257.725 11087.78
-------------+----------------------------------------------------------------
lnsigma |
_cons | 7.82294 .0821995 95.17 0.000 7.661832 7.984048
-------------+----------------------------------------------------------------
sigma | 2497.237 205.2716 2125.648 2933.784
------------------------------------------------------------------------------
Now we can use the command
nlcom to display
the estimate for the coefficient:
. nlcom -exp([a]_cons)
_nl_1: -exp([a]_cons)
------------------------------------------------------------------------------
price | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_nl_1 | -121.1829 71.29721 -1.70 0.089 -260.9229 18.55704
------------------------------------------------------------------------------
We may want to help ml a little by providing initial values. For
example, if we believe that the variance reported by regress is
reasonable, we can include it as an initial value with ml init. Here
we used the residual MS reported by regress to set the logarithm of
the variance, that is
. display log(sqrt(6499694.33))
7.8436329
4. Application: Setting a model where parameters are
proportions
Now we will apply the previous ideas to a particular example. Let’s
assume that we want to fit the model
y = a1*x1 + a2*x2 + a3*x3 + a4
where a1, a2, and a3 are
proportions, i.e., values between zero and one that add up to one. Given
this restriction, we need to estimate only two parameters (because a1 = 1
− (a2 + a3)). We need to find a proper parameterization for our
restriction. A nice choice is the parameterization used by the command
mlogit that
fits multinomial logit models.
In this approach, instead of estimating a2 and a3, we estimate
t2 and t3, where:
a2 = exp(t2)/(1+exp(t2)+exp(t3))
a3 = exp(t3)/(1+exp(t2)+exp(t3))
a1 = 1/(1+exp(t2)+exp(t3))
Verifying that this parameterization guarantees our
restriction is straightforward.
In the following code, the function mynormal_c can
be used with the command ml to perform this
estimation; the wrapper regprop runs this program
and displays the values of interest.
/* begin do file */
cscript
set seed 12345
set obs 1000
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate ep = invnormal(uniform())
generate y = .2*x1 + .5*x2 + .3*x3 + 1 + ep
regress y x1 x2 x3
program mynormal_c
version 12.0
args todo b lnf
tempname t2 t3 a4 lnsigma sigma den a1 a2 a3
tempvar mu
mleval `t2' = `b', eq(1) scalar
mleval `t3' = `b', eq(2) scalar
mleval `a4' = `b', eq(3) scalar
mleval `lnsigma' = `b', eq(4)
scalar `den' = 1+ exp(`t2') + exp(`t3')
scalar `a1' = 1/`den'
scalar `a2' = exp(`t2')/`den'
scalar `a3' = exp(`t3')/`den'
generate double `mu' = `a1'*$x_001 + `a2'*$x_002 + `a3'*$x_003 + `a4'
scalar `sigma' = exp(`lnsigma')
mlsum `lnf' = ln( normalden($ML_y1,`mu',`sigma') )
end
program regprop
version 12.0
syntax varlist
gettoken lhs rhs: varlist
global x_001 : word 1 of `rhs'
global x_002 : word 2 of `rhs'
global x_003 : word 3 of `rhs'
local f1 func(1/(1+exp(@1)+exp(@2)))
local der1 der( -exp(@1)/(1+exp(@1)+exp(@2))^2 ///
-exp(@2)/(1+exp(@1)+exp(@2))^2 )
local f2 func(exp(@1)/(1+exp(@1)+exp(@2)))
local der2 der( exp(@1)*(1+exp(@2))/(1+exp(@1)+exp(@2))^2 ///
-exp(@1)*exp(@2)/(1+exp(@1)+exp(@2))^2 )
local f3 func(exp(@2)/(1+exp(@1)+exp(@2)))
local der3 der( -exp(@1)*exp(@2)/(1+exp(@1)+exp(@2))^2 ///
exp(@2)*(1+exp(@1))/(1+exp(@1)+exp(@2))^2 )
ml model d0 mynormal_c (t2:) (t3:) (c: `lhs' = ) (lnsigma:), ///
diparm( t2 t3, `f1' `der1' label ($x_001) ci(logit) ) ///
diparm( t2 t3, `f2' `der2' label ($x_002) ci(logit) ) ///
diparm( t2 t3, `f3' `der3' label ($x_003) ci(logit) )
ml maximize
end
regprop y x1 x2 x3
/* end do file */
Here are the results:
Number of obs = 1000
Wald chi2(0) = .
Log likelihood = -1439.6632 Prob > chi2 = .
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
t2 |
_cons | .9853398 .1686424 5.84 0.000 .6548068 1.315873
-------------+----------------------------------------------------------------
t3 |
_cons | .4539756 .1953043 2.32 0.020 .0711861 .8367651
-------------+----------------------------------------------------------------
c |
_cons | 1.03804 .0323145 32.12 0.000 .9747049 1.101376
-------------+----------------------------------------------------------------
lnsigma |
_cons | .0207247 .0223607 0.93 0.354 -.0231014 .0645508
-------------+----------------------------------------------------------------
x1 | .1903572 .0260333 .1444568 .2466378
x2 | .509914 .0264092 .4582315 .5613855
x3 | .2997288 .0262758 .2508747 .3536058
------------------------------------------------------------------------------
Here the option diparm was used to display
the actual estimates of the coefficients; besides aesthetic issues, in this
case, this option is preferred to the command nlcom
because of its flexibility. For example, the option
ci(logit) allows us to make sure that the limits
of the confidence intervals will also be between zero and one. The syntax
for the option
diparm is similar to the syntax for the
command
_diparm; you
may want to have a look at this command’s
help file.
However, the confidence intervals reported by the option
diparm are asymptotically equivalent to
confidence intervals reported by nlcom; therefore,
you can also use nlcom as in the example in section
1.3.
5. Final remarks
As you probably noticed, the tricky part of this method is finding
an appropriate parameterization for the coefficients in your restriction.
You may always want to take into account the exponential function to set
inequality restrictions, the inverse logit transformation and the hyperbolic
tangent transformation to set interval restrictions, and the multinomial
logit transformation to set simultaneous interval restrictions, as in the
previous section. For example, if you need to set the restrictions:
a>3
3<b<10
you can use the transformations:
a = exp(a1) + 3
b = 3 + 7*invlogit(b1)
Also, for simplicity, I wrote all the examples by using method
d0; however, to achieve convergence, you
may need to use methods d1 or d2, in
addition to setting good initial values.
|