FAQ: Fitting a regression with interval constraints

Home / Resources & support / FAQs / Fitting a regression with interval constraints

How do I fit a regression with interval constraints in Stata?

Title		Fitting a regression with interval constraints
Author		Isabel Canette, StataCorp

Note: The examples in this tutorial are for illustration purposes only. If you need to fit a linear regression (or a nonlinear regression with Gaussian errors) with interval constraints, use the Stata command nl as explained in How do I fit a linear regression with interval (inequality) constraints in Stata?

If you are interested in fitting a linear regression with a linear constraint, you can use the cnsreg command. However, sometimes you may need to fit a model for which one of the coefficients should be positive or between zero and one. You can fit such a model by reparameterizing the coefficients. This tutorial explains the steps to fit this model by using Stata's ml command. For simplicity, I explain how to set interval constraints on a linear regression. However, this method could be applied to other kinds of regression (for example, probit, logit, Poisson, tobit), provided that they can be fitted by maximum likelihood. As stated above, if you want to set nonlinear constraints on a linear regression, you can use nl, which would not require any programming. Starting with Stata 13, the mlexp command allows you to fit maximum likelihood models by evaluating an expression. At the end of each section, I include the syntax to fit the models using mlexp.

This FAQ is organized as follows:

Introduction
Prerequisites
Our introductory example
Application: Setting a model where parameters are proportions
Final remarks

1. Introduction

I will start by presenting an example on how to use ml to fit a linear regression. The code in this example can be modified to set interval constraints. I will explain how to perform this modification by showing the steps to restrict one coefficient to be positive.

Finally, I will present an example where the coefficients are restricted to be proportions; that is, they are constrained to be positive numbers that add up to one.

If you know (from the nature of the problem) that the constraints you want to set are true, the following method will allow you to set these restrictions in the model. However, if the data do not support the validity of these constraints, the algorithm for the restricted model probably won’t achieve convergence.

There are essentially two situations where we may want to set interval constraints. One case is when the numerical global maximum of the likelihood actually is where we want it to be, but we want to help the algorithm avoid difficult spots somewhere else. This case is similar to the situation of estimating the standard deviation in a linear regression, where the maximum is positive, but we set the restriction to help the algorithm achieve convergence. The other situation is when we know that the true parameter is in a certain region, but for numerical reasons the global maximum of the likelihood may be somewhere else. We are interested not in finding a global maximum but in finding a maximum in a certain subset. This is the case of the example of proportions, considered in section 4.

If you also need to set linear constraints, you can use the constraint option of the ml command.

2. Prerequisites

This tutorial assumes that you are familiar with the ml command. I will review some characteristics of the estimation by using ml here. However, if you have never used Stata's ml command, see [R] ml.

Technical notes explain some details on the output; you can skip them in a first reading.

3. Our introductory example

Let’s assume that we want to reproduce the following estimation,

. sysuse auto, clear
(1978 automobile data)

. regress price mpg displacement

but we want to make sure that the coefficient of variable mpg will be negative.

Because we are going to use the d0 method, we will borrow the mynormal_d0.ado program from the book Maximum Likelihood Estimation with Stata, Fourth Edition, by Gould, Pitblado, and Poi (you can download this file from the book’s website). Although the syntax for the lf method is simpler, if we adapt an lf program to fit a model with interval constraints, the good properties of lf will be lost (see technical note below); some preliminary simulations showed that, when adapted to our problem, the d0 method is better for convergence.

Technical note: Whenever the likelihood meets the linear form, the lf method is efficient. With lf, the derivatives are computed at the equation level, and the chain rule is applied to obtain the derivatives with respect to the parameters. This method produces high-quality derivatives and therefore accurate estimators. When we modify a maximum likelihood program to meet interval restrictions according to the procedure explained below, the equations are disassembled; then, even when using an lf program, we will force ml to compute numerical derivatives directly with respect to the coefficients, losing the numerical benefits of lf.

3.1. Using method d0 to fit a linear regression

Let's start by fitting the model without restrictions, using regress.

. sysuse auto, clear
(1978 automobile data)

. regress price mpg displacement


      Source         SS           df       MS     Number of obs   =        74
      F(2, 71)        =     13.35
       Model     173587098         2  86793549.2    Prob > F        =    0.0000
    Residual     461478298        71  6499694.33    R-squared       =    0.2733
      Adj R-squared   =    0.2529
       Total     635065396        73  8699525.97    Root MSE        =    2549.4




       price   Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
   
         mpg    -121.1833   72.78844    -1.66   0.100    -266.3193    23.95276
displacement     10.50885    4.58548     2.29   0.025     1.365658    19.65203
       _cons     6672.766    2299.72     2.90   0.005     2087.254    11258.28

Now let’s fit the same model by using ml:

/* begin do file */
cscript
program mynormal_d0
        version 17  
        args todo b lnf
        tempname lnsigma sigma
        tempvar mu 
        mleval `mu' = `b', eq(1)
        mleval `lnsigma' = `b', eq(2) scalar
        quietly {
                scalar `sigma' = exp(`lnsigma')
                mlsum `lnf' = ln( normalden($ML_y1,`mu',`sigma') )
        }
end

sysuse auto 
ml model d0 mynormal_d0 (xb: price = mpg displacement) (lnsigma:), ///
diparm(lnsigma, exp label(sigma))  
ml maximize
/* end do file */

Here are the results:

                                                        Number of obs =     74
                                                        Wald chi2(2)  =  27.84
Log likelihood = -683.89903                             Prob > chi2   = 0.0000



       price   Coefficient  Std. err.      z    P>|z|     [95% conf. interval]

xb            
         mpg    -121.1831   71.29769    -1.70   0.089     -260.924    18.55784
displacement     10.50884   4.491567     2.34   0.019     1.705527    19.31215
       _cons     6672.764   2252.621     2.96   0.003     2257.709    11087.82

lnsigma       
       _cons      7.82294   .0821995    95.17   0.000     7.661832    7.984048

       sigma     2497.236   205.2714                      2125.648    2933.782

Technical note: You will see some differences between the output of ml and the output of regress. First, the standard errors are different because a different number of degrees of freedom was used to compute the variance of the errors. The variance for the errors in regress and ml is estimated, respectively, as v1 and v2, where

v1 = e'e/(N-k)
v2 = e'e/(N)

Here e is the vector of residuals, N is the number of observations, and k is the number of regressors, including the constant.

Second, the confidence intervals reported by regress are based on the t distribution, and ml uses the Gaussian distribution to compute them.

Third, regress performs an F test, and ml performs a Wald test. See the following FAQ to learn how these tests relate to each other: Why does test sometimes produce chi-squared and other times F statistics?

Let’s have a closer look at the code for ml; the line

ml model d0 mynormal_d0 (xb: price = mpg displacement) (lnsigma:)

introduces two equations. The first equation,

(xb: price = mpg displacement)

indicates the following:

The name of the equation is xb; this name is displayed in the output.
The dependent variable is price; this variable is referred to as $ML_y1 in the mynormal_d0 program.
This equation has two independent variables: mpg and displacement.
Therefore, when the line
```
mleval `mu' = `b', eq(1)
```
is run in the program, the local variable `mu' is evaluated as a linear combination of these two variables. Because of the second line in mynormal_d0,
```
args todo b lnf
```
the local macro `b' will contain the matrix of parameters. Therefore, what the evaluation
```
mleval `mu' = `b', eq(1)
```
does, essentially, is
```
replace `mu' = `b'[1,1]*mpg + `b'[1,2]*displacement + `b'[1,3]
```

The second equation,

(lnsigma:)

indicates that

the name of this equation is lnsigma, and
there are not independent variables in this equation.

Because this is a constant-only equation, there is no need to generate a whole variable for it in the evaluation. Evaluating it as scalar is more efficient, as in

mleval `lnsigma' = `b', eq(2) scalar

This is essentially the same as

scalar `lnsigma' = `b'[1,4]

This parameter is used in the line

scalar double `sigma' = exp(`lnsigma')

We are not estimating the standard deviation (sigma), but its logarithm (lnsigma). This trick allows us to express the standard deviation as an exponent (exp(`lnsigma')) and guarantees that sigma will be positive. We make sure that the algorithm will not search for sigma in the negative subspace. We want to use this trick to apply an inequality constraint to a coefficient. The model above could also be fit by using the mlexp syntax:

mlexp (ln(normalden(price, {xb:mpg displacement}+{b0}, exp({lnsigma}))))

3.2. Setting separate equations for the coefficients

In this step, we set up a separate equation for the coefficient that we will restrict.

In the mynormal_d0 program, the coefficient of mpg is included in equation 1. To set a restriction on it, we need to extract this coefficient from the equation. More precisely, we need to include this coefficient as a scalar parameter in the model, that is

ml model d0 mynormal_b (a: ) (xb: price = displacement ) (lnsigma:)

where the constant-only equation (a: ) corresponds to the coefficient of mpg. The question of how to include the variable itself remains; as we cannot introduce this variable in any of the equations, we will need to use a global macro. Look at the following code:

/* begin do file */
cscript
program mynormal_b
        version 17  
        args todo b lnf
        tempname a lnsigma sigma
        tempvar xb mu 
        mleval `a' = `b', eq(1) scalar
        mleval `xb' = `b', eq(2)
        mleval `lnsigma' = `b', eq(3) scalar
        quietly {
                generate double `mu' = `xb' +`a'*$x2
                scalar `sigma' = exp(`lnsigma')
                mlsum `lnf' = ln( normalden($ML_y1,`mu',`sigma') )
        }
end

sysuse auto
global x2  mpg

ml model d0 mynormal_b (a: ) (xb: price = displacement ) (lnsigma:), ///
diparm(lnsigma, exp label(sigma))
ml maximize
/* end do file */

Here are the estimation results:

                                                            Number of obs = 74
                                                            Wald chi2(0)  =  .
Log likelihood = -683.89903                                 Prob > chi2   =  .



       price   Coefficient  Std. err.      z    P>|z|     [95% conf. interval]

a             
       _cons    -121.1833   71.29771    -1.70   0.089    -260.9242    18.55769

xb            
displacement     10.50885   4.491569     2.34   0.019     1.705532    19.31216
       _cons     6672.766   2252.621     2.96   0.003     2257.709    11087.82

lnsigma       
       _cons      7.82294   .0821995    95.17   0.000     7.661832    7.984048

       sigma     2497.237   205.2716                      2125.649    2933.784

Technical note: When we use mynormal_b, missing values are reported for the Wald chi2 test. ml, by default, reports this statistic for all the coefficients except the intercept in the first equation; the number of degrees of freedom is reported as zero because the first equation has only the intercept.

To perform a Wald test on the coefficients, we can use the command test after the estimation:

. matrix list e(b) 

e(b)[1,4]
               a:           xb:           xb:      lnsigma:
           _cons  displacement         _cons         _cons
y1    -121.18326     10.508845     6672.7659     7.8229403

. test ([a]_cons) ([xb]displacement) 

 ( 1)  [a]_cons = 0
 ( 2)  [xb]displacement = 0

           chi2(  2) =   27.84
         Prob > chi2 =    0.0000

These are the values reported by ml when running mynormal_d0.

The model above could also be fit by using the following mlexp syntax:

mlexp (ln(normalden(price, {a}*mpg+{b1}*displacement+{b0}, exp({lnsigma}))))

3.3. Including the restrictions in the model

Now we only have to change one line in the program: we substitute `a' by -exp(`a') in the evaluation and display the transformed parameter after the estimation:

/* begin do file */ 
cscript
program mynormal_c
        version 17  
        args todo b lnf
        tempname a  lnsigma sigma
        tempvar xb mu
        mleval `a' = `b', eq(1) scalar
        mleval `xb' = `b', eq(2)
        mleval `lnsigma' = `b', eq(3)
          quietly {
                generate double `mu' = `xb' -exp(`a')*$x2
                scalar `sigma' = exp(`lnsigma')
                mlsum `lnf' = ln( normalden($ML_y1,`mu',`sigma') )
        }
end


sysuse auto
global x2  mpg
ml model d0 mynormal_c (a: ) (xb: price = displacement ) (lnsigma:), ///
diparm(lnsigma, exp label(sigma))
ml init lnsigma:_cons=7.8
ml maximize
/* end do file */

Here we used the global macro x2 to incorporate the variable mpg; when including a variable manually (for example, not through the standard syntax of ml), you must check for missing values.

Here are the results:

                                                             Number of obs = 74
                                                             Wald chi2(0)  =  .
Log likelihood = -683.89903                                  Prob > chi2   =  .



       price   Coefficient  Std. err.      z    P>|z|     [95% conf. interval]

a            
       _cons    4.797299     .58835     8.15   0.000     3.644154    5.950444

xb           
displacement    10.50887   4.491572     2.34   0.019     1.705552    19.31219
       _cons    6672.748   2252.625     2.96   0.003     2257.685    11087.81

lnsigma      
       _cons     7.82294   .0821995    95.17   0.000     7.661832    7.984048

       sigma    2497.237   205.2717                      2125.649    2933.784

Now we can use the nlcom command to display the estimate for the coefficient:

. nlcom -exp([a]_cons)

       _nl_1:  -exp([a]_cons)



       price   Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
   
       _nl_1   -121.1827   71.29782    -1.70   0.089    -260.9238     18.55855

We may want to help ml a little by providing initial values. For example, if we believe that the variance reported by regress is reasonable, we can include it as an initial value with ml init. Here we used the residual MS reported by regress to set the logarithm of the variance, that is

. display log(sqrt(6499694.33))    
7.8436329

The model above could also be fit by using the following mlexp syntax:

mlexp (ln(normalden(price, -exp({a})*mpg + {b1}*displacement +{b0}, exp({lnsigma}))))

4. Application: Setting a model where parameters are proportions

Now we will apply the previous ideas to a particular example. Let’s assume that we want to fit the model

       y = a1*x1 + a2*x2 + a3*x3 + a4

where a1, a2, and a3 are proportions, for example, values between zero and one that add up to one. Given this restriction, we need to estimate only two parameters (because a1 = 1 − (a2 + a3)). We need to find a proper parameterization for our restriction. A nice choice is the parameterization used by the mlogit command that fits multinomial logit models.

In this approach, instead of estimating a2 and a3, we estimate t2 and t3, where:

       a2 = exp(t2)/(1+exp(t2)+exp(t3))
       a3 = exp(t3)/(1+exp(t2)+exp(t3))
       a1 = 1/(1+exp(t2)+exp(t3))

Verifying that this parameterization guarantees our restriction is straightforward.

In the following code, the mynormal_c function can be used with the ml command to perform this estimation; the regprop wrapper runs this program and displays the values of interest.

/* begin do file */
cscript
set seed 12345
set obs 1000
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate ep = invnormal(uniform())

generate y = .2*x1 + .5*x2 + .3*x3 + 1 + ep
regress y x1 x2 x3

program mynormal_c
	version 17  
	args todo b lnf
	tempname t2 t3 a4 lnsigma sigma den a1 a2 a3 
	tempvar mu
	mleval `t2' = `b', eq(1) scalar
	mleval `t3' = `b', eq(2) scalar
	mleval `a4' = `b', eq(3) scalar
	mleval `lnsigma' = `b', eq(4)
	scalar `den' = 1+ exp(`t2') + exp(`t3')
	scalar `a1' =  1/`den'
	scalar `a2' = exp(`t2')/`den'
	scalar `a3'  = exp(`t3')/`den'
	generate double `mu' = `a1'*$x_001 + `a2'*$x_002 + `a3'*$x_003 + `a4'
	scalar `sigma' = exp(`lnsigma')
	mlsum `lnf' = ln( normalden($ML_y1,`mu',`sigma') )
	
end


program regprop
	version 17  
	syntax varlist
	gettoken lhs rhs: varlist
	global x_001 : word 1 of `rhs'
	global x_002 : word 2 of `rhs'
	global x_003 : word 3 of `rhs'
  

        local f1 func(1/(1+exp(@1)+exp(@2)))
        local der1 der( -exp(@1)/(1+exp(@1)+exp(@2))^2 ///
              -exp(@2)/(1+exp(@1)+exp(@2))^2 ) 
        local f2  func(exp(@1)/(1+exp(@1)+exp(@2))) 
        local der2 der( exp(@1)*(1+exp(@2))/(1+exp(@1)+exp(@2))^2 ///
              -exp(@1)*exp(@2)/(1+exp(@1)+exp(@2))^2 ) 
        local f3 func(exp(@2)/(1+exp(@1)+exp(@2))) 
        local der3 der( -exp(@1)*exp(@2)/(1+exp(@1)+exp(@2))^2 /// 
              exp(@2)*(1+exp(@1))/(1+exp(@1)+exp(@2))^2 ) 
  
        ml model d0 mynormal_c (t2:) (t3:) (c: `lhs' = ) (lnsigma:), ///
        diparm( t2 t3, `f1' `der1' label ($x_001) ci(logit) ) ///
        diparm( t2 t3, `f2' `der2' label ($x_002) ci(logit) ) ///
        diparm( t2 t3, `f3' `der3' label ($x_003) ci(logit) )
        ml maximize 
end
  
regprop y x1 x2 x3
/* end do file */

Here are the results:

                                                         Number of obs = 1,000
                                                         Wald chi2(0)  =     .
Log likelihood = -1414.503                               Prob > chi2   =     .



           y   Coefficient  Std. err.      z    P>|z|     [95% conf. interval]

t2            
       _cons     .7524279   .1504271     5.00   0.000     .4575961     1.04726

t3            
       _cons     .2617819   .1745923     1.50   0.134    -.0804128    .6039765

c             
       _cons     .9913625   .0314883    31.48   0.000     .9296466    1.053078

lnsigma       
       _cons    -.0044355   .0223607    -0.20   0.843    -.0482616    .0393906

          x1     .2261733   .0259554                      .1793569    .2810251
          x2     .4799727   .0262131                      .4289861    .5313799
          x3     .2938541   .0256475                      .2461986    .3464938

Here the diparm option was used to display the actual estimates of the coefficients; besides aesthetic issues, in this case, this option is preferred to the nlcom command because of its flexibility. For example, the ci(logit) option allows us to make sure that the limits of the confidence intervals will also be between zero and one. The syntax for the diparm option is similar to the syntax for the command _diparm.

However, the confidence intervals reported by the diparm option are asymptotically equivalent to confidence intervals reported by nlcom; therefore, you can also use nlcom as in the example in section 1.3.

The model above could also be fit by using the following syntax:

local den  (1+ exp({t2}) + exp({t3}))
local a1 1/`den'
local a2  exp({t2})/`den'
local a3  exp({t3})/`den'
mlexp (ln(normalden(y, `a1'*x1 + `a2'*x2 + `a3'*x3 + {c}, exp({lnsigma}) )))

5. Final remarks

As you probably noticed, the tricky part of this method is finding an appropriate parameterization for the coefficients in your restriction. You may always want to take into account the exponential function to set inequality restrictions, the inverse logit transformation and the hyperbolic tangent transformation to set interval restrictions, and the multinomial logit transformation to set simultaneous interval restrictions, as in the previous section. For example, if you need to set the restrictions:

       a>3
       3<b<10

you can use the transformations:

       a = exp(a1) + 3
       b = 3 + 7*invlogit(b1)

Also, for simplicity, I wrote all the examples by using the d0 method; however, to achieve convergence, you may need to use the d1 or d2 methods, in addition to setting good initial values.

How do I fit a regression with interval constraints in Stata?

1. Introduction

2. Prerequisites

3. Our introductory example

3.1. Using method d0 to fit a linear regression

3.2. Setting separate equations for the coefficients

3.3. Including the restrictions in the model

4. Application: Setting a model where parameters are proportions

5. Final remarks

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies

Source	SS df MS	Number of obs = 74
		F(2, 71) = 13.35
Model	173587098 2 86793549.2	Prob > F = 0.0000
Residual	461478298 71 6499694.33	R-squared = 0.2733
		Adj R-squared = 0.2529
Total	635065396 73 8699525.97	Root MSE = 2549.4


price		Coefficient Std. err. t P>\|t\| [95% conf. interval]

mpg		-121.1833 72.78844 -1.66 0.100 -266.3193 23.95276
displacement		10.50885 4.58548 2.29 0.025 1.365658 19.65203
_cons		6672.766 2299.72 2.90 0.005 2087.254 11258.28


price		Coefficient Std. err. z P>\|z\| [95% conf. interval]

xb
mpg		-121.1831 71.29769 -1.70 0.089 -260.924 18.55784
displacement		10.50884 4.491567 2.34 0.019 1.705527 19.31215
_cons		6672.764 2252.621 2.96 0.003 2257.709 11087.82

lnsigma
_cons		7.82294 .0821995 95.17 0.000 7.661832 7.984048

sigma		2497.236 205.2714 2125.648 2933.782


price		Coefficient Std. err. z P>\|z\| [95% conf. interval]

_nl_1		-121.1827 71.29782 -1.70 0.089 -260.9238 18.55855


y		Coefficient Std. err. z P>\|z\| [95% conf. interval]

t2
_cons		.7524279 .1504271 5.00 0.000 .4575961 1.04726

t3
_cons		.2617819 .1745923 1.50 0.134 -.0804128 .6039765

c
_cons		.9913625 .0314883 31.48 0.000 .9296466 1.053078

lnsigma
_cons		-.0044355 .0223607 -0.20 0.843 -.0482616 .0393906

x1		.2261733 .0259554 .1793569 .2810251
x2		.4799727 .0262131 .4289861 .5313799
x3		.2938541 .0256475 .2461986 .3464938

Stata/MP4 Annual License (download)

How do I fit a regression with interval constraints in Stata?

1. Introduction

2. Prerequisites

3. Our introductory example

3.1. Using method d0 to fit a linear regression

3.2. Setting separate equations for the coefficients

3.3. Including the restrictions in the model

4. Application: Setting a model where parameters are proportions

5. Final remarks

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies