
### How can I do logistic regression or multinomial logistic regression with grouped data?

Title: Logistic regression with grouped data

Author: William Sribney, StataCorp

Date: June 1998; updated January 2000; minor revisions May 2005

The best way to do this is to first put your data in “long” form and then use frequency weights (fweights) with the logistic, logit, or mlogit command.

For logistic regression, one can also use the blogit command (documented in the [R] glogit entry of the manual), but it is better to use logistic or logit with frequency weights for several reasons.

It is easier to explain with an example. First, consider the following binary-outcome data:

    . list

           cases   total   x1   x2
      1.      23     123    0    0
      2.      12     234    0    1
      3.      56     248    1    0
      4.      81     390    1    1



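To use logistic or logit with fweights, the data need to be in long form, with one row for each combination of covariate pattern and outcome. Here y is the outcome (1 = case, 0 = control), and w is the number of subjects in that row: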

    . list

             w   y   x1   x2
      1.   100   0    0    0
      2.    23   1    0    0
      3.   222   0    0    1
      4.    12   1    0    1
      5.   192   0    1    0
      6.    56   1    1    0
      7.   309   0    1    1
      8.    81   1    1    1



You can then run commands such as

    . logistic y x1 x2 [fw=w]


To use blogit with the original data, you issue the command

    . blogit cases total x1 x2


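This command fits the same model as the logistic command with the rearranged data. By default, blogit reports coefficients (as logit does), whereas logistic reports odds ratios; specify the or option with blogit to display odds ratios instead.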
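The equivalence can be checked directly. The following is a minimal sketch, assuming the four-row grouped table above is re-entered with input and the long-form rows are typed in by hand. The matrix names b_grouped and b_long are illustrative only, and logit is used in place of logistic so that both commands report coefficients:

```stata
clear
input cases total x1 x2
23 123 0 0
12 234 0 1
56 248 1 0
81 390 1 1
end

* grouped-data fit; save the coefficient vector
blogit cases total x1 x2
matrix b_grouped = e(b)

* the same data in long form: w = frequency, y = outcome
clear
input w y x1 x2
100 0 0 0
 23 1 0 0
222 0 0 1
 12 1 0 1
192 0 1 0
 56 1 1 0
309 0 1 1
 81 1 1 1
end

* frequency-weighted fit; save the coefficient vector
logit y x1 x2 [fw=w]
matrix b_long = e(b)

* largest relative difference between the two coefficient vectors
display mreldif(b_grouped, b_long)
```

If the two fits agree, the value displayed by mreldif() should be close to zero.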

However, logistic has the advantage that you can run postestimation commands such as estat gof afterward. The epitab family of commands (see [ST] epitab) also expects data in this long form.

As a general rule, Stata commands expect data in this long form, so it is best to convert the data right away and work with the long form from then on.

To do the transformation to long form, use the reshape command.

Here is how you do it for this example:

    . gen w0 = total - cases       /* w0 = counts of controls */
    . rename cases w1              /* w1 = counts of cases */
    . gen id = _n                  /* reshape needs a group id variable */
    . reshape long w, i(id) j(y)


The categories (i.e., the suffixes of w) will appear in the variable y. The frequency weights will be given in the new variable w.
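To see the whole transformation in one place, here is a sketch written as a do-file, assuming the grouped table from the start of this FAQ is entered with input. The two assert lines are added here only as sanity checks and are not part of the original recipe:

```stata
clear
input cases total x1 x2
23 123 0 0
12 234 0 1
56 248 1 0
81 390 1 1
end

gen w0 = total - cases        /* w0 = counts of controls */
rename cases w1               /* w1 = counts of cases */
gen id = _n                   /* one id per covariate pattern */
reshape long w, i(id) j(y)    /* suffixes 0 and 1 become values of y */

* sanity checks: 4 patterns x 2 outcomes = 8 rows,
* and the weights add up to the 995 subjects in the grouped table
assert _N == 8
quietly summarize w
assert r(sum) == 995

list id y w x1 x2, sepby(id)
```

The variable total is carried along unchanged within each id; it is no longer needed once w has been created.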

Then one can do

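    . logistic y x1 x2 [fw=w]
    . mlogit y <covariates> [fw=w]

For mlogit, the same steps apply when there are more than two outcome categories: create one count variable per category (w0, w1, w2, and so on), and reshape turns each suffix into a value of y.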
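For the multinomial case, the following is a minimal sketch with three outcome categories. The counts are made up for illustration and do not come from this FAQ; w0, w1, and w2 are assumed to hold the number of subjects in outcome categories 0, 1, and 2 for each covariate pattern:

```stata
clear
input w0 w1 w2 x1 x2
40 35 25 0 0
30 45 25 0 1
20 40 40 1 0
15 30 55 1 1
end

gen id = _n                   /* one id per covariate pattern */
reshape long w, i(id) j(y)    /* y takes the values 0, 1, 2 */

* frequency-weighted multinomial logit on the long data
mlogit y x1 x2 [fw=w]
```

After reshape, the data have one row per covariate pattern and outcome category (12 rows here), exactly as in the binary case.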