Title: Logistic regression with aggregated data
Author: William Sribney, StataCorp
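One way to fit a logistic regression to aggregated data (counts of successes and trials for each covariate pattern) is to first rearrange your data so that you can use frequency weights (fweights) with the logistic, logit, or mlogit command.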
For binary outcomes, you can also use glm with family(binomial varnameN) and link(logit), where varnameN is a variable holding the total number of trials for each observation. Rearranging the data for use with frequency weights, however, also handles the more general case of multinomial outcomes.
It is easier to explain with an example. First, consider the following binary-outcome data:
| | cases | total | x1 | x2 |
|---|---|---|---|---|
| 1. | 23 | 123 | 0 | 0 |
| 2. | 12 | 234 | 0 | 1 |
| 3. | 56 | 248 | 1 | 0 |
| 4. | 81 | 390 | 1 | 1 |
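If you want to follow along, the aggregated data above can be typed in with Stata's input command. This is a minimal sketch for a do-file; the variable names are the ones used in the listing.

```stata
* Enter the aggregated example data: one row per covariate pattern
clear
input cases total x1 x2
23 123 0 0
12 234 0 1
56 248 1 0
81 390 1 1
end

list, sep(0)
```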
To use logistic or logit with fweights, the data must be rearranged so that there is one observation for each combination of covariate pattern and response category, with the count for that cell stored in a weight variable:
. list , sep(0)
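| | w | y | x1 | x2 |
|---|---|---|---|---|
| 1. | 100 | 0 | 0 | 0 |
| 2. | 23 | 1 | 0 | 0 |
| 3. | 222 | 0 | 0 | 1 |
| 4. | 12 | 1 | 0 | 1 |
| 5. | 192 | 0 | 1 | 0 |
| 6. | 56 | 1 | 1 | 0 |
| 7. | 309 | 0 | 1 | 1 |
| 8. | 81 | 1 | 1 | 1 |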
You can then run commands such as
. logit y x1 x2 [fw=w]
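A frequency weight of w means the observation stands for w identical subjects. The sketch below checks this on the rearranged data by fitting the weighted model, then replicating each row w times with expand and refitting without weights; the two coefficient vectors should agree, and both fits should use 995 subjects (the sum of total). The matrix and scalar names are illustrative only.

```stata
* Rearranged data: one row per covariate pattern and response category
clear
input w y x1 x2
100 0 0 0
 23 1 0 0
222 0 0 1
 12 1 0 1
192 0 1 0
 56 1 1 0
309 0 1 1
 81 1 1 1
end

* Weighted fit on the 8 grouped rows
logit y x1 x2 [fw=w]
matrix b_fw = e(b)
scalar N_fw = e(N)

* Replicate each row w times (one row per subject), then refit unweighted
expand w
logit y x1 x2
matrix b_ind = e(b)

display "N with fweights = " N_fw ", N after expand = " e(N)
display "max relative difference = " mreldif(b_fw, b_ind)
```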
We could fit the same model using the glm command:
. glm cases x1 x2, family(binomial total) link(logit)
This glm specification gives the same coefficient estimates as the logit command run on the rearranged data. However, logit and logistic have the advantage that postestimation commands such as estat gof can be run afterward.
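The claim that both routes agree can be checked directly. This sketch fits glm on the aggregated data, rearranges the data as described in this FAQ, fits logistic with frequency weights, and compares the two coefficient vectors with Stata's mreldif() matrix function (logistic stores coefficients, not odds ratios, in e(b)). It then runs estat gof, which is available after logistic. The matrix names b_glm and b_logit are illustrative.

```stata
clear
input cases total x1 x2
23 123 0 0
12 234 0 1
56 248 1 0
81 390 1 1
end

* Binomial GLM on the aggregated data; total holds the number of trials
glm cases x1 x2, family(binomial total) link(logit)
matrix b_glm = e(b)

* Rearrange to one row per covariate pattern and response category
gen w0 = total - cases
rename cases w1
gen id = _n
reshape long w, i(id) j(y)

* Logistic regression with frequency weights on the rearranged data
logistic y x1 x2 [fw=w]
matrix b_logit = e(b)

* Goodness-of-fit test after logistic
estat gof

* Largest relative difference between the two coefficient vectors
display mreldif(b_glm, b_logit)
```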
To rearrange the data, you can use the reshape command.
Here is how you do it for this example:
. gen w0 = total - cases      /* w0 = counts of failures */
. rename cases w1             /* w1 = counts of successes */
. gen id = _n                 /* reshape needs a group id variable */
. reshape long w, i(id) j(y)
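As a sanity check, the reshape steps can be run on the example data in a do-file and the result compared with the rearranged listing. The sketch below asserts that each covariate pattern ends up with exactly two rows (y = 0 and y = 1) whose weights add back up to the original number of trials.

```stata
clear
input cases total x1 x2
23 123 0 0
12 234 0 1
56 248 1 0
81 390 1 1
end

gen w0 = total - cases        // counts of failures
rename cases w1               // counts of successes
gen id = _n                   // group identifier required by reshape
reshape long w, i(id) j(y)

* Two rows per covariate pattern, weights summing to the trials
sort id y
by id: assert _N == 2
by id: assert w[1] + w[2] == total

list w y x1 x2, sep(0)
```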
The response categories (the suffixes 0 and 1 of w0 and w1) are stored in the new variable y, and the corresponding counts, which serve as the frequency weights, are stored in the new variable w.
Then one can run, for example,
. logit y x1 x2 [fw=w]
. mlogit y <covariates> [fw=w]
and so on.
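The same rearrangement extends to outcomes with more than two categories: store one count variable per category with a common stub, reshape long on that stub, and pass the counts to mlogit as frequency weights. In the sketch below, the counts n1 to n3 and the covariate x are hypothetical, made up purely to illustrate the mechanics.

```stata
* HYPOTHETICAL counts for a three-category outcome, for illustration only
clear
input n1 n2 n3 x
30 20 10 0
15 25 35 1
end

gen id = _n
reshape long n, i(id) j(y)      // y = 1, 2, 3; n = count in that category

* Multinomial logit with the category counts as frequency weights
mlogit y x [fw=n], baseoutcome(1)
```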