## In the spotlight: Bayesian IRT–4PL model

Item response theory (IRT) is often used for modeling the relationship
between the latent abilities of a group of subjects and the
examination items used for measuring their abilities. Stata 14
introduced a suite of commands for fitting IRT models using maximum
likelihood; see, for example, "In
the spotlight: **irt**" by Rafal Raciborski in *Stata
News*, Volume 30 Number 3 and the [IRT] *Item Response Theory Reference
Manual* for more details. One-parameter logistic (1PL), 2PL,
and 3PL IRT models are commonly used to model binary responses. Models
beyond 3PL, such as 4PL and 5PL models, have not been as widely used.
One of the reasons is the difficulty in estimating the additional
parameters introduced by these models using maximum likelihood. In
recent years, these models have been reconsidered within the Bayesian
framework (Loken and Rulison 2010; Fox 2010; Kim and Bolt 2007). In
this article, we demonstrate how to fit a Bayesian 4PL model using
**bayesmh**.

### Data

We will use the abridged version of the mathematics and science data
from De Boeck and Wilson (2004), which contains 800 student responses,
**y**, to 9 test questions (items) intended to measure mathematical
ability. To fit IRT models using **bayesmh**, the data must be in
long form with items, **item**, recorded as multiple observations
per subject, **id**.
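
If the responses were instead stored in wide form, with one binary variable per item (the variable names **y1**-**y9** below are illustrative, not necessarily those in the dataset), the required long form could be obtained with **reshape**:

```stata
* Illustrative: convert wide-form responses y1-y9 to long form,
* creating one observation per subject-item pair (id, item).
reshape long y, i(id) j(item)
```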

### Model

We consider the following 4PL model,

*P*(*Y _{ij}* = 1) = *c* + (*d* − *c*) InvLogit{*a _{i}*(*θ _{j}* − *b _{i}*)},  *c* < *d* < 1

where *i* = 1, 2, ..., 9 and *j* = 1, 2, ..., 800. The 4PL
model extends the 3PL model by adding an upper asymptote parameter
*d* ≠ 1. The *d* parameter can be viewed as an upper limit on the
probability of a correct response to the *i*th item: the probability
of a correct answer by a subject with very high ability can thus be
no greater than *d*. *a _{i}* and *b _{i}* are item-specific
discrimination and difficulty parameters. Here we consider a common
guessing parameter *c* and a common upper asymptote parameter *d*,
but they can also be item specific.

**InvLogit()** is the inverse-logit function. The latent abilities
*θ _{j}* are assumed to be normally distributed:

*θ _{j}* ∼ *N*(0, 1)

A Bayesian formulation also requires prior specifications for all other model parameters. This is an important step in Bayesian modeling and must be considered carefully. For illustration, we consider the following priors.

Discrimination parameters *a _{i}* are assumed to be positive and
are often modeled on the log scale. Because we have no prior
knowledge about the discrimination and difficulty parameters, we
assume that the prior distributions of ln(*a _{i}*) and *b _{i}*
have support on the whole real line and are symmetric. A normal
prior distribution is thus a natural choice. To control the impact
of the prior on these parameters, we consider a hierarchical
Bayesian model specification and introduce hyperparameters to model
the means and variances of the normal prior distributions.

ln(*a _{i}*) ∼ *N*(*μ _{a}*, *σ _{a}*^{2})

*b _{i}* ∼ *N*(*μ _{b}*, *σ _{b}*^{2})
We use informative priors for the guessing parameter *c* and the
upper asymptote parameter *d*. We assume that the prior mean of
*c* is about 0.1 and use an inverse-gamma prior with shape 10 and
scale 1 for *c*. We restrict *d* to the (0.8,1) range and
assign it a Uniform(0.8,1) prior.

*c* ∼ InvGamma(10,1)

*d* ∼ Uniform(0.8,1)

The mean hyperparameters, *μ _{a}* and *μ _{b}*, and variance
hyperparameters, *σ _{a}*^{2} and *σ _{b}*^{2}, require informative
prior specifications. We assume that the means are centered at 0
with a variance of 0.1. To lower the variability of the
ln(*a _{i}*) and *b _{i}* parameters, we use an inverse-gamma prior
with shape 10 and scale 1 for the variance parameters so that their
prior means are about 0.1.

*μ _{a}*, *μ _{b}* ∼ *N*(0, 0.1)

*σ _{a}*^{2}, *σ _{b}*^{2} ∼ InvGamma(10, 1)

### Using bayesmh

We specify the model above using **bayesmh** as follows:
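A sketch of such a specification, assembled from the model and priors above, is shown below. This is not necessarily the authors' exact command: the parameter names (**{subj:}**, **{discrim:}**, **{diff:}**, **{c}**, **{d}**, and the hyperparameter names), the blocking choices, and the initial value are illustrative.

```stata
* A sketch, not the authors' exact command. {subj:}, {discrim:},
* and {diff:} are random effects declared via redefine().
bayesmh y, likelihood(dbernoulli({c} + ({d}-{c})*               ///
        invlogit({discrim:}*({subj:}-{diff:}))))                ///
    redefine(subj:i.id) redefine(discrim:i.item)                ///
    redefine(diff:i.item)                                       ///
    prior({subj:i.id},      normal(0, 1))                       ///
    prior({discrim:i.item}, lognormal({mua}, {vara}))           ///
    prior({diff:i.item},    normal({mub}, {varb}))              ///
    prior({c}, igamma(10, 1)) prior({d}, uniform(0.8, 1))       ///
    prior({mua} {mub},   normal(0, 0.1))                        ///
    prior({vara} {varb}, igamma(10, 1))                         ///
    block({c} {d}) block({mua} {mub}) block({vara} {varb})      ///
    initial({d} 0.9)                                            ///
    exclude({subj:i.id}) burnin(5000)
```

The lognormal prior on **{discrim:i.item}** is equivalent to the normal prior on ln(*a _{i}*) stated above.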

The first two lines model the probability of success of a Bernoulli
outcome as a nonlinear function of model parameters. Subject-specific
parameters **{subj:}** and item-specific parameters
**{discrim:}** and **{diff:}** are included as "random effects"
in the model by using the corresponding **redefine()** options
(available in Stata 14.1) for computational efficiency. The priors for
model parameters are specified in the corresponding **prior()**
options. We place model parameters in separate blocks to improve the
simulation efficiency and provide more sensible initial values for
some of the parameters. Here we treat the abilities **{subj:i.id}**
as nuisance parameters and exclude them from the final results. We use
a longer burn-in period, **burnin(5000)**, to allow for longer
adaptation of the MCMC sampler, which is needed given the large number
of parameters in the model.

**bayesmh** produces the following results:

The upper asymptote parameter **d** is estimated to be 0.96 with a
95% credible interval of [0.94, 0.99]. The estimate is fairly close to
one, so a simpler 3PL model may be sufficient for these data.

### Comparing models

More formally, we can compare deviance information criteria (DIC) of
the 4PL and the 3PL (with *d* = 1) models.
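One way to carry out this comparison is to store each model's estimation results after fitting it (the model names **m3pl** and **m4pl** are illustrative; each model must be fit with **bayesmh**'s **saving()** option so that its simulation results are retained):

```stata
* After fitting each model with bayesmh ..., saving(...), store it:
estimates store m3pl    // 3PL model (d constrained to 1)
estimates store m4pl    // 4PL model

* Report DIC (and log marginal-likelihood) for both models:
bayesstats ic m3pl m4pl
```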

The DIC of the 3PL model (not shown here) is 8049.4. The 4PL model has a lower DIC value, 8032.7, which suggests that the 4PL model provides a better fit. However, we should not rely solely on the DIC values to make our final model selection. A practitioner may still prefer the simpler 3PL model given that the upper asymptote estimate is close to one.

For more examples of Bayesian binary IRT models and details about
model specifications, see our blog entry "Bayesian binary item
response theory models using **bayesmh**."

—Nikolay Balov

Senior Statistician and Software Developer, StataCorp

—Yulia Marchenko

Executive Director of Statistics, StataCorp

### References

De Boeck, P., and M. Wilson, ed. 2004. *Explanatory Item Response
Models: A Generalized Linear and Nonlinear Approach.* New York:
Springer.

Fox, J.-P. 2010. *Bayesian Item Response Modeling: Theory and
Applications.* New York: Springer.

Kim, J.-S., and D. M. Bolt. 2007. Estimating item response theory
models using Markov chain Monte Carlo methods. *Educational
Measurement: Issues and Practice* 26: 38–51.

Loken, E., and K. L. Rulison. 2010. Estimation of a four-parameter
item response theory model. *British Journal of Mathematical and
Statistical Psychology* 63: 509–525.