**[R] binreg** -- Generalized linear models: Extensions to the binomial family

__Syntax__

**binreg** *depvar* [*indepvars*] [*if*] [*in*] [*weight*] [**,** *options*]

*options* Description
-------------------------------------------------------------------------
Model
__nocons__**tant** suppress constant term
**or** use logit link and report odds ratios
**rr** use log link and report risk ratios
**hr** use log-complement link and report health ratios
**rd** use identity link and report risk differences
**n(***#*|*varname***)** use *#* or *varname* for number of trials
__exp__**osure(***varname***)** include ln(*varname*) in model with coefficient constrained to 1
__off__**set(***varname***)** include *varname* in model with coefficient constrained to 1
__const__**raints(***constraints***)** apply specified linear constraints
__col__**linear** keep collinear variables
**mu(***varname***)** use *varname* as the initial estimate for the mean of *depvar*
__ini__**t(***varname***)** synonym for **mu(***varname***)**

SE/Robust
**vce(***vcetype***)** *vcetype* may be **eim**, __r__**obust**, __cl__**uster** *clustvar*, **oim**, **opg**, __boot__**strap**, __jack__**knife**, **hac** *kernel*, **jackknife1**, or __unb__**iased**
**t(***varname***)** variable name corresponding to time
__vf__**actor(***#***)** multiply variance matrix by scalar *#*
**disp(***#***)** quasilikelihood multiplier
__sca__**le(x2**|**dev**|*#***)** set the scale parameter; default is **scale(1)**

Reporting
__l__**evel(***#***)** set confidence level; default is **level(95)**
__coef__**ficients** report nonexponentiated coefficients
__nocnsr__**eport** do not display constraints
*display_options* control columns and column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling

Maximization
**irls** use iterated, reweighted least-squares optimization; the default
**ml** use maximum likelihood optimization
*maximize_options* control the maximization process; seldom used
**fisher(***#***)** Fisher scoring steps
**search** search for good starting values

__coefl__**egend** display legend instead of statistics
-------------------------------------------------------------------------
*indepvars* may contain factor variables; see fvvarlist.
*depvar* and *indepvars* may contain time-series operators; see tsvarlist.
**bayes**, **bootstrap**, **by**, **fp**, **jackknife**, **mi estimate**, **rolling**, and **statsby**
are allowed; see prefix. For more details, see **[BAYES] bayes: binreg**.
**vce(bootstrap)**, **vce(jackknife)**, and **vce(jackknife1)** are not allowed with
the **mi estimate** prefix.
Weights are not allowed with the **bootstrap** prefix.
**aweight**s are not allowed with the **jackknife** prefix.
**fweight**s, **aweight**s, **iweight**s, and **pweight**s are allowed; see weight.
**coeflegend** does not appear in the dialog box.
See **[R] binreg postestimation** for features available after estimation.

__Menu__

**Statistics > Generalized linear models > GLM for the binomial family**

__Description__

**binreg** fits generalized linear models for the binomial family. It
estimates odds ratios, risk ratios, health ratios, and risk differences.
The available links are

Option    Implied link      Parameter
-----------------------------------------------
**or**    logit             odds ratios = exp(b)
**rr**    log               risk ratios = exp(b)
**hr**    log complement    health ratios = exp(b)
**rd**    identity          risk differences = b

Estimates of odds, risk, and health ratios are obtained by exponentiating
the appropriate coefficients. The **or** option produces the same results as
Stata's **logistic** command, and **or coefficients** yields the same results as
the **logit** command. When no link is specified, **or** is assumed.
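
The equivalence described above can be checked with a short sketch. It assumes the **lbw** dataset used in the Examples section is available through **webuse**; the model specification is only illustrative.

```stata
* Compare binreg with logistic and logit on the low-birthweight data
webuse lbw, clear

* Odds ratios: binreg with the or option versus logistic
binreg low age lwt i.race smoke ptl ht ui, or
logistic low age lwt i.race smoke ptl ht ui

* Logit-scale coefficients: binreg with or coefficients versus logit
binreg low age lwt i.race smoke ptl ht ui, or coefficients
logit low age lwt i.race smoke ptl ht ui
```

Small differences in the last displayed digits are possible because **binreg** uses IRLS by default while **logistic** and **logit** maximize the log likelihood.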

__Options__

+-------+
----+ Model +------------------------------------------------------------

**noconstant**; see **[R] estimation options**.

**or** requests the logit link and results in odds ratios if **coefficients** is
not specified.

**rr** requests the log link and results in risk ratios if **coefficients** is
not specified.

**hr** requests the log-complement link and results in health ratios if
**coefficients** is not specified.

**rd** requests the identity link and results in risk differences.

**n(***#*|*varname***)** specifies either a constant integer to use as the
denominator for the binomial family or a variable that holds the
denominator for each observation.

**exposure(***varname***)**, **offset(***varname***)**, **constraints(***constraints***)**, **collinear**;
see **[R] estimation options**. **constraints(***constraints***)** and **collinear**
are not allowed with **irls**.

**mu(***varname***)** specifies *varname* containing an initial estimate for the mean
of *depvar*. This option can be useful if you encounter convergence
difficulties. **init(***varname***)** is a synonym.
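
A minimal sketch of supplying starting values follows. The variable name **mu0** and the shrinkage formula are hypothetical, and the sketch assumes that the initial mean is given on the scale of *depvar* (a count out of **n()** trials), not as a proportion.

```stata
webuse binreg, clear

* Hypothetical starting values on the count scale of depvar:
* shrink each observed proportion toward 0.5, then multiply by the trials
generate double mu0 = n_women * (n_lbw_babies + 0.5) / (n_women + 1)

* Pass the starting values to the risk-ratio (log link) model
binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) rr mu(mu0)
```

Keeping the implied proportions strictly between 0 and 1 avoids undefined values of the log and log-complement links at the first iteration.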

+-----------+
----+ SE/Robust +--------------------------------------------------------

**vce(***vcetype***)** specifies the type of standard error reported, which
includes types that are robust to some kinds of misspecification
(**robust**), that allow for intragroup correlation (**cluster** *clustvar*),
that are derived from asymptotic theory (**oim**, **opg**), and that use
bootstrap or jackknife methods (**bootstrap**, **jackknife**); see
**[R]** *vce_option*.

**vce(eim)**, the default, uses the expected information matrix for the
variance estimator.

**binreg** also allows the following:

**vce(hac** *kernel* [*#*]**)** specifies that a heteroskedasticity- and
autocorrelation-consistent (HAC) variance estimate be used. HAC
refers to the general form for combining weighted matrices to
form the variance estimate. There are three kernels built into
**binreg**. *kernel* is a user-written program or one of

__nw__**est** | __ga__**llant** | __an__**derson**

If the lag *#* is not specified, N - 2 is assumed.

**vce(jackknife1)** specifies that the one-step jackknife estimate of
variance be used.

**vce(unbiased)** specifies that the unbiased sandwich estimate of
variance be used.

**t(***varname***)** specifies the variable name corresponding to time; see **[TS]**
**tsset**. **binreg** does not always need to know **t()**, though it does if
**vce(hac** ... **)** is specified. Then you can either specify the time
variable with **t()**, or you can **tsset** your data before calling **binreg**.
When the time variable is required, **binreg** assumes that the
observations are spaced equally over time.
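
The two ways of identifying the time variable can be sketched as follows. The data are simulated, so the variable names **time**, **x**, and **y**, the seed, and the lag of 4 are all assumptions made for illustration.

```stata
* Simulate an equally spaced series with a binary outcome (hypothetical data)
clear
set seed 12345
set obs 200
generate time = _n
generate x = rnormal()
generate byte y = runiform() < invlogit(-0.5 + 0.8*x)

* Option 1: name the time variable directly with t()
binreg y x, or vce(hac nwest 4) t(time)

* Option 2: declare the time variable first, then omit t()
tsset time
binreg y x, or vce(hac nwest 4)
```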

**vfactor(***#***)** specifies a scalar by which to multiply the resulting variance
matrix. This option allows users to match output with other
packages, which may apply degrees of freedom or other small-sample
corrections to estimates of variance.
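
As a hedged illustration, the sketch below applies an N/(N - k) small-sample factor. The choice of factor and the local macro name **adj** are assumptions; use whatever correction the package you are matching applies.

```stata
webuse lbw, clear

* First fit: recover the sample size and the residual degrees of freedom
quietly binreg low age lwt smoke, or
local adj = e(N) / e(df)

* Refit with the variance matrix multiplied by the chosen factor
binreg low age lwt smoke, or vfactor(`adj')

* The factor actually applied is stored in e(vf)
display e(vf)
```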

**disp(***#***)** multiplies the variance of *depvar* by *#* and divides the deviance
by *#*. The resulting distributions are members of the quasilikelihood
family.

**scale(x2**|**dev**|*#***)** overrides the default scale parameter. This option is
allowed only with Hessian (information matrix) variance estimates.

By default, **scale(1)** is assumed for discrete distributions (binomial,
Poisson, and negative binomial), and **scale(x2)** is assumed for
continuous distributions (Gaussian, gamma, and inverse Gaussian).

**scale(x2)** specifies that the scale parameter be set to the Pearson
chi-squared (or generalized chi-squared) statistic divided by the
residual degrees of freedom, which was recommended by McCullagh and
Nelder (1989) as a good general choice for continuous distributions.

**scale(dev)** sets the scale parameter to the deviance divided by the
residual degrees of freedom. This option provides an alternative to
**scale(x2)** for continuous distributions and overdispersed or
underdispersed discrete distributions.

**scale(***#***)** sets the scale parameter to *#*.
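
The following sketch contrasts the default scale with the two data-driven choices on the grouped data from the Examples section. The point estimates should be unaffected; only the standard errors are rescaled.

```stata
webuse binreg, clear

* Default for the binomial family: scale parameter fixed at 1
binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) rr

* Scale set to Pearson chi-squared / residual degrees of freedom
binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) rr scale(x2)

* Scale set to deviance / residual degrees of freedom
binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) rr scale(dev)
```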

+-----------+
----+ Reporting +--------------------------------------------------------

**level(***#***)**; see **[R] estimation options**.

**coefficients** displays the nonexponentiated coefficients and corresponding
standard errors and confidence intervals. This option has no effect
when the **rd** option is specified, because it always presents the
nonexponentiated coefficients.
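
A short sketch of the relationship between the two displays, using **smoke** from the **lbw** dataset as an illustrative covariate: the stored coefficient is on the link scale in both cases, and the reported odds ratio is its exponential.

```stata
webuse lbw, clear

* Default display for the or option: odds ratios
binreg low age lwt smoke, or

* _b[] holds the nonexponentiated coefficient, so exponentiate by hand
display "odds ratio for smoke = " exp(_b[smoke])

* Same model, displayed on the coefficient (log-odds) scale
binreg low age lwt smoke, or coefficients
display "log odds ratio for smoke = " _b[smoke]
```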

**nocnsreport**; see **[R] estimation options**.

*display_options*: **noci**, __nopv__**alues**, __noomit__**ted**, **vsquish**, __noempty__**cells**,
__base__**levels**, __allbase__**levels**, __nofvlab__**el**, **fvwrap(***#***)**, **fvwrapon(***style***)**,
**cformat(%***fmt***)**, **pformat(%***fmt***)**, **sformat(%***fmt***)**, and **nolstretch**; see
**[R] estimation options**.

+--------------+
----+ Maximization +-----------------------------------------------------

**irls** requests iterated, reweighted least-squares (IRLS) optimization of
the deviance instead of Newton-Raphson optimization of the log
likelihood. This option is the default.

**ml** requests that optimization be carried out by using Stata's **ml** command.

*maximize_options*: __tech__**nique(***algorithm_spec***)**, [__no__]__lo__**g**, __tr__**ace**, __grad__**ient**,
**showstep**, __hess__**ian**, __showtol__**erance**, __dif__**ficult**, __iter__**ate(***#***)**,
__tol__**erance(***#***)**, __ltol__**erance(***#***)**, __nrtol__**erance(***#***)**, __nonrtol__**erance**, and
**from(***init_specs***)**; see **[R] maximize**. These options are seldom used.

Setting the optimization method to **ml**, with **technique()** set to
something other than BHHH, changes the *vcetype* to **vce(oim)**.
Specifying **technique(bhhh)** changes *vcetype* to **vce(opg)**.
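
The change in *vcetype* can be inspected through **e(vce)**, as in this sketch. The model is illustrative and the displayed values are expected to follow the rule stated above.

```stata
webuse lbw, clear

* ml with the default technique: variance type becomes oim
binreg low age lwt smoke, or ml
display "`e(vce)'"

* ml with BHHH: variance type becomes opg
binreg low age lwt smoke, or ml technique(bhhh)
display "`e(vce)'"
```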

**fisher(***#***)** specifies the number of Newton-Raphson steps that should use
the Fisher scoring Hessian or expected information matrix (EIM)
before switching to the observed information matrix (OIM). This
option is available only if **ml** is specified and is useful only for
Newton-Raphson optimization.

**search** specifies that the command search for good starting values. This
option is available only if **ml** is specified and is useful only for
Newton-Raphson optimization.
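
A minimal sketch combining **fisher()** and **search** with **ml** is shown below. The choice of two Fisher scoring steps is arbitrary and used only for illustration.

```stata
webuse binreg, clear

* Search for starting values, take two Fisher scoring steps,
* then continue with Newton-Raphson on the observed information matrix
binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) rr ml fisher(2) search

* Convergence flag and iteration count stored by the ml fit
display e(converged)
display e(ic)
```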

The following option is available with **binreg** but is not shown in the
dialog box:

**coeflegend**; see **[R] estimation options**.

__Examples__

---------------------------------------------------------------------------
Setup
**. webuse lbw**

Report odds ratios
**. binreg low age lwt i.race smoke ptl ht ui, or**

---------------------------------------------------------------------------
Setup
**. webuse binreg**

Report risk ratios
**. binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) rr**

Obtain nonexponentiated coefficients
**. binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) rr coeff**

Report risk differences
**. binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) rd**

Report health ratios
**. binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) hr**
---------------------------------------------------------------------------

__Stored results__

**binreg, irls** stores the following in **e()**:

Scalars
**e(N)** number of observations
**e(k)** number of parameters
**e(k_eq_model)** number of equations in overall model test
**e(df_m)** model degrees of freedom
**e(df)** residual degrees of freedom
**e(phi)** model scale parameter
**e(disp)** dispersion parameter
**e(bic)** model BIC
**e(N_clust)** number of clusters
**e(deviance)** deviance
**e(deviance_s)** scaled deviance
**e(deviance_p)** Pearson deviance
**e(deviance_ps)** scaled Pearson deviance
**e(dispers)** dispersion
**e(dispers_s)** scaled dispersion
**e(dispers_p)** Pearson dispersion
**e(dispers_ps)** scaled Pearson dispersion
**e(vf)** factor set by **vfactor()**, **1** if not set
**e(rank)** rank of **e(V)**
**e(rc)** return code

Macros
**e(cmd)** **binreg**
**e(cmdline)** command as typed
**e(depvar)** name of dependent variable
**e(eform)** **eform()** option implied by **or**, **rr**, **hr**, or **rd**
**e(varfunc)** program to calculate variance function
**e(varfunct)** variance title
**e(varfuncf)** variance function
**e(link)** program to calculate link function
**e(linkt)** link title
**e(linkf)** link function
**e(m)** number of binomial trials
**e(wtype)** weight type
**e(wexp)** weight expression
**e(title)** title in estimation output
**e(title_fl)** family-link title
**e(clustvar)** name of cluster variable
**e(offset)** linear offset variable
**e(cons)** **noconstant** or not set
**e(hac_kernel)** HAC kernel
**e(hac_lag)** HAC lag
**e(vce)** *vcetype* specified in **vce()**
**e(vcetype)** title used to label Std. Err.
**e(opt)** type of optimization
**e(opt1)** optimization title, line 1
**e(opt2)** optimization title, line 2
**e(properties)** **b V**
**e(predict)** program used to implement **predict**
**e(marginsok)** predictions allowed by **margins**
**e(marginsnotok)** predictions disallowed by **margins**
**e(asbalanced)** factor variables **fvset** as **asbalanced**
**e(asobserved)** factor variables **fvset** as **asobserved**

Matrices
**e(b)** coefficient vector
**e(V)** variance-covariance matrix of the estimators
**e(V_modelbased)** model-based variance

Functions
**e(sample)** marks estimation sample

**binreg, ml** stores the following in **e()**:

Scalars
**e(N)** number of observations
**e(k)** number of parameters
**e(k_eq)** number of equations in **e(b)**
**e(k_eq_model)** number of equations in overall model test
**e(k_dv)** number of dependent variables
**e(df_m)** model degrees of freedom
**e(df)** residual degrees of freedom
**e(phi)** model scale parameter
**e(aic)** model AIC, if **ml**
**e(bic)** model BIC
**e(ll)** log likelihood, if **ml**
**e(N_clust)** number of clusters
**e(chi2)** chi-squared
**e(p)** p-value for model test
**e(deviance)** deviance
**e(deviance_s)** scaled deviance
**e(deviance_p)** Pearson deviance
**e(deviance_ps)** scaled Pearson deviance
**e(dispers)** dispersion
**e(dispers_s)** scaled dispersion
**e(dispers_p)** Pearson dispersion
**e(dispers_ps)** scaled Pearson dispersion
**e(vf)** factor set by **vfactor()**, **1** if not set
**e(rank)** rank of **e(V)**
**e(ic)** number of iterations
**e(rc)** return code
**e(converged)** **1** if converged, **0** otherwise

Macros
**e(cmd)** **binreg**
**e(cmdline)** command as typed
**e(depvar)** name of dependent variable
**e(eform)** **eform()** option implied by **or**, **rr**, **hr**, or **rd**
**e(varfunc)** program to calculate variance function
**e(varfunct)** variance title
**e(varfuncf)** variance function
**e(link)** program to calculate link function
**e(linkt)** link title
**e(linkf)** link function
**e(m)** number of binomial trials
**e(wtype)** weight type
**e(wexp)** weight expression
**e(title)** title in estimation output
**e(title_fl)** family-link title
**e(clustvar)** name of cluster variable
**e(offset)** linear offset variable
**e(cons)** **noconstant** or not set
**e(hac_kernel)** HAC kernel
**e(hac_lag)** HAC lag
**e(chi2type)** **Wald**; type of model chi-squared test
**e(vce)** *vcetype* specified in **vce()**
**e(vcetype)** title used to label Std. Err.
**e(opt)** type of optimization
**e(opt1)** optimization title, line 1
**e(which)** **max** or **min**; whether optimizer is to perform maximization or minimization
**e(ml_method)** type of **ml** method
**e(user)** name of likelihood-evaluator program
**e(technique)** maximization technique
**e(properties)** **b V**
**e(predict)** program used to implement **predict**
**e(marginsok)** predictions allowed by **margins**
**e(marginsnotok)** predictions disallowed by **margins**
**e(asbalanced)** factor variables **fvset** as **asbalanced**
**e(asobserved)** factor variables **fvset** as **asobserved**

Matrices
**e(b)** coefficient vector
**e(Cns)** constraints matrix
**e(ilog)** iteration log (up to 20 iterations)
**e(gradient)** gradient vector
**e(V)** variance-covariance matrix of the estimators
**e(V_modelbased)** model-based variance

Functions
**e(sample)** marks estimation sample
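
The stored results listed above can be inspected after any fit. The sketch below uses an illustrative model on the **lbw** dataset and reads a few of the documented entries.

```stata
webuse lbw, clear
binreg low age lwt smoke, or

* List everything stored in e()
ereturn list

* Deviance divided by the residual degrees of freedom
display "deviance / residual df = " e(deviance) / e(df)

* Coefficient vector (nonexponentiated) and the size of the estimation sample
matrix list e(b)
count if e(sample)
```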

__Reference__

McCullagh, P., and J. A. Nelder. 1989. *Generalized Linear Models*. 2nd ed.
London: Chapman & Hall/CRC.