
How can I do logistic regression or multinomial logistic regression with grouped data?

Title   Logistic regression with grouped data
Author William Sribney, StataCorp
Date June 1998; updated January 2000; minor revisions May 2005

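The best way to do this is to first put your data in “long” form, with one observation for each combination of covariate pattern and outcome category, and then use frequency weights (fweights) giving the number of subjects in each of those cells with the logistic, logit, or mlogit command.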

For logistic regression, one can also use the blogit command (documented in the [R] glogit entry of the manual), but using logistic or logit with frequency weights is generally preferable, for the reasons given below.

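It is easier to explain with an example. First, consider the following binary-outcome data, in which cases is the number of subjects with a positive outcome and total is the number of subjects for each combination of x1 and x2: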

 . list

      +-------------------------+
      | cases   total   x1   x2 |
      |-------------------------|
   1. |    23     123    0    0 |
   2. |    12     234    0    1 |
   3. |    56     248    1    0 |
   4. |    81     390    1    1 |
      +-------------------------+
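If you want to reproduce this example, the grouped data can be typed in with input. This is a minimal sketch; the variable names and values are exactly those in the listing, and the final assert is an optional sanity check that no group has more cases than subjects.

```stata
* Enter the grouped binary-outcome data: one observation per covariate pattern
clear
input cases total x1 x2
23 123 0 0
12 234 0 1
56 248 1 0
81 390 1 1
end

* Sanity check: the number of cases cannot exceed the group size
assert cases <= total
list
```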

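To use logistic and logit with fweights, the data need to be in long form. Here y is the 0/1 outcome and w is the number of subjects with that outcome in each covariate pattern; for example, the first pattern (x1 = 0, x2 = 0) contributes 123 - 23 = 100 observations with y = 0 and 23 with y = 1: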

 . list

      +-------------------+
      |   w   y   x1   x2 |
      |-------------------|
   1. | 100   0    0    0 |
   2. |  23   1    0    0 |
   3. | 222   0    0    1 |
   4. |  12   1    0    1 |
   5. | 192   0    1    0 |
      |-------------------|
   6. |  56   1    1    0 |
   7. | 309   0    1    1 |
   8. |  81   1    1    1 |
      +-------------------+

You can then run commands such as

 . logistic y x1 x2 [fw=w]
 

To use blogit with the original data, you issue the command

 . blogit cases total x1 x2

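This command gives the same estimates as logit run on the rearranged data. Note that blogit reports coefficients (log odds) by default; specify its or option to get the odds ratios that logistic reports.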
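To check this equivalence yourself, the following sketch fits blogit on the grouped data, saves two of its coefficients in scalars (the names b1_grouped and b2_grouped are arbitrary choices for this illustration), applies the reshape steps given later in this FAQ, and then compares the logit estimates using Stata's reldif() function. The displayed relative differences should be essentially zero.

```stata
* Grouped data from the example
clear
input cases total x1 x2
23 123 0 0
12 234 0 1
56 248 1 0
81 390 1 1
end

* Fit on the grouped data; blogit reports coefficients (log odds)
blogit cases total x1 x2
scalar b1_grouped = _b[x1]
scalar b2_grouped = _b[x2]

* Rearrange to long form, using the same steps as in this FAQ
gen w0 = total - cases
rename cases w1
gen id = _n
reshape long w, i(id) j(y)

* logit with frequency weights should reproduce the blogit coefficients
logit y x1 x2 [fw=w]
display reldif(_b[x1], scalar(b1_grouped))
display reldif(_b[x2], scalar(b2_grouped))
```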

However, logistic has the advantage that postestimation commands such as estat gof can be run afterward. The epitab family of commands (see [ST] epitab) also expects data in this long form.
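As an illustration, here is a minimal sketch that types in the long-form data exactly as listed earlier, fits the model with frequency weights, runs estat gof, and then applies cc, one of the epitab commands, to the same data, treating y as the case indicator and x1 as a 0/1 exposure. The choice of x1 as the exposure is only for illustration.

```stata
* Long-form data from the listing: one row per (covariate pattern, outcome)
clear
input w y x1 x2
100 0 0 0
 23 1 0 0
222 0 0 1
 12 1 0 1
192 0 1 0
 56 1 1 0
309 0 1 1
 81 1 1 1
end

* Fit with frequency weights, then a postestimation goodness-of-fit test
logistic y x1 x2 [fw=w]
estat gof

* An epitab command on the same long-form data
cc y x1 [fw=w]
```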

As a general rule, Stata expects data in this long form, so it is best to transform your data to long form right away and do all subsequent work with the long-form data.

To do the transformation to long form, use the reshape command.

Here is how you do it for this example:

 . gen w0 = total - cases       /* w0 = counts of controls */
 . rename cases w1              /* w1 = counts of cases */
 . gen id = _n                  /* reshape needs a group id variable */
 . reshape long w, i(id) j(y)

The outcome categories (the suffixes 0 and 1 of w0 and w1) become the values of the new variable y, and the frequency weights are stored in the new variable w.
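A quick way to check the reshape is to confirm that each group now contributes two observations and that the frequency weights add back up to the original number of subjects. A minimal sketch, starting from the grouped data listed at the beginning of this FAQ; the scalar name n_subjects is an arbitrary choice for this illustration.

```stata
clear
input cases total x1 x2
23 123 0 0
12 234 0 1
56 248 1 0
81 390 1 1
end

* Total number of subjects before reshaping
quietly summarize total
scalar n_subjects = r(sum)

gen w0 = total - cases
rename cases w1
gen id = _n
reshape long w, i(id) j(y)

* Each group now has one row for y = 0 and one for y = 1
sort id y
list id y w x1 x2, sepby(id)

* The frequency weights must sum to the original number of subjects
quietly summarize w
assert r(sum) == scalar(n_subjects)
```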

Then one can run logistic, mlogit, or any other estimation command that allows fweights, for example

 . logistic y x1 x2 [fw=w]
 . mlogit y x1 x2 [fw=w]
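The same approach carries over to multinomial outcomes: store the count for each outcome category in a variable w followed by the category number, reshape, and fit mlogit with frequency weights. In the following sketch, the variables n0, n1, and n2 and all of the counts are hypothetical, made up purely for illustration; they are not data from this FAQ. The group form of rename requires Stata 12 or later.

```stata
* HYPOTHETICAL grouped data, made up for illustration only:
* n0, n1, n2 = number of subjects in outcome category 0, 1, 2
clear
input n0 n1 n2 x1 x2
60 25 15 0 0
70 20 10 0 1
45 35 20 1 0
40 30 30 1 1
end

* reshape needs a common stub (w) with the category number as suffix
rename (n0 n1 n2) (w0 w1 w2)
gen id = _n
reshape long w, i(id) j(y)

* y now takes the values 0, 1, 2 and w holds the counts
mlogit y x1 x2 [fw=w], baseoutcome(0)
```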
 
© Copyright 1996–2013 StataCorp LP