How can I do logistic regression or multinomial logistic regression with grouped data?
| Title | Logistic regression with grouped data |
| Author | William Sribney, StataCorp |
| Date | June 1998; updated January 2000; minor revisions May 2005 |
The best way to do this is to first put your data in “long” form and then use frequency weights (fweights) with the logistic, logit, or mlogit command.
For logistic regression, one can also use the blogit command (documented in the [R] glogit entry of the manual), but it is better to use logistic or logit with frequency weights for several reasons.
This is easiest to explain with an example. First, consider the following binary-outcome data:
. list

     +-------------------------+
     | cases   total   x1   x2 |
     |-------------------------|
  1. |    23     123    0    0 |
  2. |    12     234    0    1 |
  3. |    56     248    1    0 |
  4. |    81     390    1    1 |
     +-------------------------+
To use logistic and logit with fweights, the data need to be in the long form:
. list

     +-------------------+
     |   w   y   x1   x2 |
     |-------------------|
  1. | 100   0    0    0 |
  2. |  23   1    0    0 |
  3. | 222   0    0    1 |
  4. |  12   1    0    1 |
  5. | 192   0    1    0 |
     |-------------------|
  6. |  56   1    1    0 |
  7. | 309   0    1    1 |
  8. |  81   1    1    1 |
     +-------------------+
You can then run commands such as
. logistic y x1 x2 [fw=w]
To use blogit with the original data, you issue the command
. blogit cases total x1 x2
This command fits the same model as the logistic command with the rearranged data; blogit reports coefficients, as logit does, whereas logistic reports the equivalent odds ratios.
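One way to see why the two approaches agree is to compare the log likelihoods they maximize. The notation below is introduced here for illustration and is not part of the original FAQ: i indexes the rows of the grouped data, p_i is the fitted probability for row i, and w_{1i} and w_{0i} are the counts of y = 1 and y = 0 observations for that row in the long form.

```latex
\begin{align*}
p_i &= \frac{\exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})}
            {1 + \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})}, \\
\ell_{\text{grouped}}(\beta)
  &= \sum_i \bigl[ \text{cases}_i \log p_i
     + (\text{total}_i - \text{cases}_i) \log (1 - p_i) \bigr] + \text{const}, \\
\ell_{\text{fweight}}(\beta)
  &= \sum_i \bigl[ w_{1i} \log p_i + w_{0i} \log (1 - p_i) \bigr].
\end{align*}
```

Because w_{1i} = cases_i and w_{0i} = total_i - cases_i, the two expressions differ at most by a constant that does not involve the coefficients, so they are maximized by the same estimates.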
However, logistic has the advantage that one can run other commands afterward, such as estat gof. The epitab family of commands (see [ST] epitab) also wants data in this long form.
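As a minimal sketch of that workflow, written as a do-file fragment and assuming the data are already in the long form listed above (variables w, y, x1, and x2):

```stata
* fit the model on the long-form data, with w as frequency weights
logistic y x1 x2 [fw=w]

* goodness-of-fit test for the model just fit
estat gof
```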
As a general rule, Stata wants data in this long form, so it is best to
transform to this long form right away and then work with Stata.
To do the transformation to long form, use the reshape command. Here is how you do it for this example:
. gen w0 = total - cases        /* w0 = counts of controls */
. rename cases w1               /* w1 = counts of cases */
. gen id = _n                   /* reshape needs a group id variable */
. reshape long w, i(id) j(y)
The categories (i.e., the suffixes of w) will appear in the variable
y. The frequency weights will be given in the new variable w.
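Putting the steps together, the following do-file sketch enters the four grouped records listed earlier, fits blogit on them, reshapes to long form, and then fits logit with frequency weights so that the two sets of coefficients can be compared. Entering the data with input, and using logit rather than logistic so that coefficients are shown instead of odds ratios, are choices made only for this illustration.

```stata
clear
input cases total x1 x2
23 123 0 0
12 234 0 1
56 248 1 0
81 390 1 1
end

* grouped-data fit on the original layout
blogit cases total x1 x2

* reshape to long form, following the steps above
gen w0 = total - cases
rename cases w1
gen id = _n
reshape long w, i(id) j(y)

* inspect the result: one row per covariate pattern and outcome
list w y x1 x2

* same model on the long-form data; compare coefficients with blogit
logit y x1 x2 [fw=w]
```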
Then one can do
. logistic y x1 x2 [fw=w]
. mlogit y <covariates> [fw=w]
etc.
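For a multinomial outcome, the same recipe should carry over with more than two count variables. The sketch below assumes hypothetical grouped data with one row per covariate pattern, in which w1, w2, and w3 hold the counts for outcome categories 1, 2, and 3; these variable names are illustrative and do not come from the example above.

```stata
* hypothetical grouped data: counts w1 w2 w3 and covariates x1 x2
gen id = _n
reshape long w, i(id) j(y)

* y now takes the values 1, 2, 3, and w holds the matching counts
mlogit y x1 x2 [fw=w]
```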