Title | The Rasch model in Stata | |

Author | Jeroen Weesie, Department of Sociology/ICS, Utrecht University |

Below we show how to fit a Rasch model using conditional maximum
likelihood in Stata. The **gsem** command can also be used to fit a Rasch
model using maximum likelihood, see [SEM] example 28g. A related model,
the one parameter logistic item response theory model can be fit using
**irt 1pl** see [IRT] irt 1pl.

The Rasch model is one of the dominant models for binary items (e.g., success/failure on math problems) in psychometrics. Let y(ij) be the binary response, where i=1,...,n, n = #subjects, j=1, ..., m, m = #items. (Sometimes “unbalanced designs” are used in which subjects respond to only a subset of the items. The description below applies without modification.) The Rasch model can be written as a logit-linear model:

logit Pr(y(ij)=1 | eta(i)) = eta(i) − theta(j)

Here eta(i) can be interpreted as a person’s ability parameter and theta(j) as an item-difficulty parameter. Moreover, it is assumed that, conditional on eta(i), the y(i*) are independent (“local independence”). Actually, Rasch, a Danish statistician, gave an axiomatic derivation of the model in the 1960s, in which, next to local independence, the main characterizing properties were that the { y(+j) , y(i+) } form a sufficient statistic for { eta, theta }.

**Technical note:**
When treating the etas and thetas as parameters (fixed effects), it has
long been known that the maximum likelihood estimators are
inconsistent in the standard asymptotic setup (n→infinity, m fixed).
This may become somewhat intuitive if we observe that with each
extra subject we have m extra observations but also one extra eta
parameter to estimate. Thus, intuitively, the number of observations
and hence, the amount of information per parameter, stay almost
fixed with increasing n.

A famous example of inconsistent maximum likelihood estimators with a
similar structure is the following. Let y(ij) ~
Normal(mu(i),v^{2}), i=1, ..., n, j=1, 2. The mu(i) parameters
and v^{2} are to be estimated. The maximum likelihood
estimator for v^{2} is

sum_{{i,j}}(y(ij) − y(i.))^{2}/ (2n)

with expected value and also with probability limit
(1/2)v^{2}, not v^{2}.

In the 1980s, Andersen showed the conditional maximum likelihood (CML)
estimator of theta(j), where conditioning occurs on the subject’s
scores y(i+), is efficient and is actually asymptotically normal distributed,
with all the nice properties of ml-estimators in regular situations. For
instance, a conditional likelihood-ratio test has properties analogous to
those of the standard likelihood-ratio test. While other estimators have been
investigated, the CML estimator is the one most widely used for the
fixed-effects case—the etas are treated as parameters (Fischer and
Molenaar 1995). This model can be fit using the **clogit**
command as shown below.

It may come as only a little surprise that psychometricians also studied a
Gaussian random-effects estimator for the thetas. Starting in Stata 14, a
mathematically equivalent model can be fit using
**irt 1pl**. Starting in Stata 13, a Rasch model can be fit using **gsem**; see **[SEM] example 28g**. Prior to Stata 13, a Rasch model could be fit by the random-effects panel estimator, computed by the
**xtlogit, re**
command, as shown below.

Consider the most probable case; you have a dataset in which the cases
refer to subjects, and the responses on m items are stored in m dichotomous
variables item_1, ..., item_m. You can obtain CLM estimates of the theta
parameters of the Rasch model using the conditional logit fixed-effects
estimator in
**clogit**
(**xtlogit, fe**
is equivalent to typing **clogit**).
This command requires all responses be stored in separate observations, while a
“group” variable is used to identify the observations that belong to the same
subject. This can be accomplished with the
**reshape** command.
Finally, you can describe the Rasch model as a “clogit model”
with m covariates x(ijk), k=1, ..., m, so that for all i, x(ijk) = −1
if j=k, and 0 otherwise. The regression coefficient of x(..k) is just
theta(k).

In a conditional logit model, effects of variables that are constant within groups (subjects) are not identified. Since the x(..k) variables are a complete classification, i.e., they sum to −1 in each record, the thetas are not identified. If all x’s are included in the model, Stata deals with this problem automatically by removing one of the x variables. You can also withdraw one of the x variables from the set of predictor variables. The associated theta parameter could be interpreted as being fixed at 0.

The outline for estimating the CML estimates of the theta parameters of the Rasch model follows.

(1) If necessary, transform the data into long format.

(2) Create explanatory variables for the thetas.

(3) Estimate the theta parameters.

Consider data on 120 subjects who responded to 10 math problems, coded as 1 (correct) and 0 (incorrect). We want to know whether the 10 math problems involve a one-dimensional scale so that the items can be ordered with respect to difficulty, irrespective of the subjects’ abilities.

storage display value |

variable name type format label variable label |

math1 byte %9.0g Correct math item 1 |

math2 byte %9.0g Correct math item 2 |

math3 byte %9.0g Correct math item 3 |

math4 byte %9.0g Correct math item 4 |

math5 byte %9.0g Correct math item 5 |

math6 byte %9.0g Correct math item 6 |

math7 byte %9.0g Correct math item 7 |

math8 byte %9.0g Correct math item 8 |

math9 byte %9.0g Correct math item 9 |

math10 byte %9.0g Correct math item 10 |

subj_id int %9.0g |

First, to get some feeling for the data, we describe the items

Variable | Obs Mean Std. Dev. Min Max | |

math1 | 120 .8083333 .3952626 0 1 | |

math2 | 120 .775 .4193332 0 1 | |

math3 | 120 .6833333 .4671266 0 1 | |

math4 | 120 .55 .4995797 0 1 | |

math5 | 120 .5333333 .5009794 0 1 | |

math6 | 120 .45 .4995797 0 1 | |

math7 | 120 .425 .4964157 0 1 | |

math8 | 120 .3 .460179 0 1 | |

math9 | 120 .2583333 .4395535 0 1 | |

math10 | 120 .1666667 .3742406 0 1 |

They are already labeled so that the proportion of students that answer the item correctly decreases. The subjects are aptly described via the total number of correct answers.

score | Freq. Percent Cum. | |

0 | 3 2.50 2.50 | |

1 | 5 4.17 6.67 | |

2 | 4 3.33 10.00 | |

3 | 16 13.33 23.33 | |

4 | 21 17.50 40.83 | |

5 | 24 20.00 60.83 | |

6 | 22 18.33 79.17 | |

7 | 13 10.83 90.00 | |

8 | 4 3.33 93.33 | |

9 | 6 5.00 98.33 | |

10 | 2 1.67 100.00 | |

Total | 120 100.00 |

To fit the Rasch model, we first have to reshape the data.

Data wide -> long |

Number of obs. 120 -> 1200 |

Number of variables 12 -> 4 |

j variable (10 values) -> item |

xij variables: |

math1 math2 ... math10 -> math |

We now have to generate the predictor variables for the thetas:

To illustrate what the data now look like, we
**list** the cases.

* set format to compress the output of **list**
**. format math item Th* %4.0f
*** sort within subj_id on the identifier item of the math problem.
**. sort subj_id item
*** invoke **list** with options that improve readability.
**. list math item Th* if sub==1, nodisplay noobs nolabel **

**. list math item Th* if sub==2, nodisplay noobs nolabel**

math item Th1 Th2 Th3 Th4 Th5 Th6 Th7 Th8 Th9 Th10 | ||

1 1 -1 0 0 0 0 0 0 0 0 0 | ||

1 2 0 -1 0 0 0 0 0 0 0 0 | ||

1 3 0 0 -1 0 0 0 0 0 0 0 | ||

0 4 0 0 0 -1 0 0 0 0 0 0 | ||

0 5 0 0 0 0 -1 0 0 0 0 0 | ||

0 6 0 0 0 0 0 -1 0 0 0 0 | ||

0 7 0 0 0 0 0 0 -1 0 0 0 | ||

0 8 0 0 0 0 0 0 0 -1 0 0 | ||

0 9 0 0 0 0 0 0 0 0 -1 0 | ||

0 10 0 0 0 0 0 0 0 0 0 -1 | ||

math item Th1 Th2 Th3 Th4 Th5 Th6 Th7 Th8 Th9 Th10 | ||

1 1 -1 0 0 0 0 0 0 0 0 0 | ||

1 2 0 -1 0 0 0 0 0 0 0 0 | ||

0 3 0 0 -1 0 0 0 0 0 0 0 | ||

0 4 0 0 0 -1 0 0 0 0 0 0 | ||

1 5 0 0 0 0 -1 0 0 0 0 0 | ||

1 6 0 0 0 0 0 -1 0 0 0 0 | ||

1 7 0 0 0 0 0 0 -1 0 0 0 | ||

0 8 0 0 0 0 0 0 0 -1 0 0 | ||

0 9 0 0 0 0 0 0 0 0 -1 0 | ||

0 10 0 0 0 0 0 0 0 0 0 -1 | ||

Thus, we see that the subject with **subj_id**=1 has 10 records
associated with his response: the score for each item is in variable
**math**. The remaining 10 variables, **Th1**–**Th10**,
which are included in the output, are the independent variables. The
**Th***#*s are nearly everywhere 0. We see that the response on the
first item (**item**>==1) is modeled only with the predictor variable
**Th1**, the response on the second item (**item**>==2) with the
second predictor **Th2**, etc.

We are now ready to request the CML estimates of the theta parameters of the
Rasch model using the
**clogit** command:

math | Coef. Std. Err. z P>|z| [95% Conf. Interval] | |

Th2 | .241266 .3481052 0.69 0.488 -.4410077 .9235397 | |

Th3 | .7921615 .3318989 2.39 0.017 .1416515 1.442672 | |

Th4 | 1.450772 .3241392 4.48 0.000 .8154703 2.086073 | |

Th5 | 1.528206 .3239802 4.72 0.000 .8932166 2.163195 | |

Th6 | 1.913438 .3253857 5.88 0.000 1.275694 2.551183 | |

Th7 | 2.030632 .3265377 6.22 0.000 1.390629 2.670634 | |

Th8 | 2.662249 .3389274 7.85 0.000 1.997964 3.326535 | |

Th9 | 2.904665 .3467128 8.38 0.000 2.225121 3.58421 | |

Th10 | 3.55384 .3771324 9.42 0.000 2.814674 4.293006 | |

We conclude that the first item (**Th1**) with associated Theta(1)=0 is
the easiest item, to which the most subjects responded correctly. Item 10 is
the most difficult item, to which the fewest subjects responded correctly.
**clogit** has dropped five subjects from the analysis. These subjects
responded either to all items correctly or to all items incorrectly; in a
conditional likelihood, these subjects carry no information about the
difficulty of the items.

Andersen has suggested a specification test, namely that the difficulty
parameters are the same for the “good” subjects and the
“poor” subjects, distinguished via their total score. While
Andersen subsequently employed a (conditional) likelihood-ratio test, we
will proceed by illustrating the
**hausman** command. The
**hausman** test can be used to compare the theta estimates obtained from
the full sample with the theta estimates obtained from the
“good” or “bad” students.

---- Coefficients ---- | ||

(b) (B) (b-B) sqrt(diag(V_b-V_B)) | ||

LESS RASCH Difference S.E. | ||

Th2 | .1507434 .241266 -.0905226 .172672 | |

Th3 | 1.024031 .7921615 .2318693 .1606061 | |

Th4 | 1.358351 1.450772 -.0924202 .1760475 | |

Th5 | 1.414312 1.528206 -.1138945 .1773457 | |

Th6 | 1.822396 1.913438 -.0910427 .1907797 | |

Th7 | 2.014378 2.030632 -.016254 .2016512 | |

Th8 | 3.414959 2.662249 .7527093 .3793093 | |

Th9 | 3.102924 2.904665 .1982584 .3122054 | |

Th10 | 3.414959 3.55384 -.1388815 .341348 | |

We conclude that the difficulty of items does not appear to be different for “poor” subjects. Similarly,

---- Coefficients ---- | ||

(b) (B) (b-B) sqrt(diag(V_b-V_B)) | ||

LESS RASCH Difference S.E. | ||

Th2 | .146233 .241266 -.0950331 .4148311 | |

Th3 | .5045323 .7921615 -.2876292 .3871818 | |

Th4 | 1.307562 1.450772 -.1432097 .3401682 | |

Th5 | 1.241154 1.528206 -.2870522 .343086 | |

Th6 | 1.907459 1.913438 -.0059791 .3258093 | |

Th7 | 1.73504 2.030632 -.2955911 .3263267 | |

Th8 | 2.250694 2.662249 -.4115551 .3131327 | |

Th9 | 2.610758 2.904665 -.2939078 .3127668 | |

Th10 | 3.488208 3.55384 -.0656323 .3271359 | |

and again we find no significant differences in the item difficulties.

As mentioned above, we can also obtain the estimates for the thetas if we assume that
the etas are random effects. The command **xtlogit, re** requires the same data
organization as **clogit**.

math | Coef. Std. Err. z P>|z| [95% Conf. Interval] | |

Th1 | -1.66906 .2640861 -6.32 0.000 -2.18666 -1.151461 | |

Th2 | -1.43923 .2510219 -5.73 0.000 -1.931224 -.9472359 | |

Th3 | -.9007114 .2289035 -3.93 0.000 -1.349354 -.4520687 | |

Th4 | -.2356067 .2161719 -1.09 0.276 -.6592959 .1880825 | |

Th5 | -.1566429 .2156576 -0.73 0.468 -.579324 .2660382 | |

Th6 | .236555 .2161708 1.09 0.274 -.1871319 .6602419 | |

Th7 | .355942 .2173421 1.64 0.101 -.0700406 .7819247 | |

Th8 | .9920495 .2318228 4.28 0.000 .5376852 1.446414 | |

Th9 | 1.231195 .2410623 5.11 0.000 .7587211 1.703668 | |

Th10 | 1.860508 .2766865 6.72 0.000 1.318212 2.402803 | |

/lnsig2u | -.1718483 .252817 -.6673605 .3236639 | |

sigma_u | .9176639 .1160005 .7162828 1.175663 | |

rho | .2038025 .0410239 .1349121 .2958407 | |

Here the order of difficulties of the items is not affected by moving from a fixed-effects to a random-effects estimator for the thetas.

It is important to carry out specification tests of the model to analyze whether the model fits the data. In the literature, such tests have been suggested and are indeed commonly applied by psychometricians. Some of these tests, as proposed by Andersen, are (conditional) likelihood-ratio tests for the hypotheses that the thetas are the same for groups of subjects formed by an exogenous variable (Do men and women behave the same way?) and by distinguishing groups via the total score: are the difficulty parameters the same for the “smart” subjects (those for whom y(i+) is relatively high) as those for the “dumb” subjects (those with a low total score for y(i+))? CLM’s condition is on y(i+), so it is OK to distinguish cases using y(i+). Above we have applied Hausman tests for this purpose.

Another purpose of a Rasch analysis is to estimate the subject parameter eta. In the fixed effects approach, the etas are commonly estimated by maximum likelihood conditional on the CLM theta-estimates. For the random-effects case, the etas are commonly estimated by posterior means.

- Fischer, H. G. and I. W. Molenaar. 1995.
*Rasch Models. Foundations, Recent Developments and Applications.*New York: Springer.