st: data structure for clogit

From   Matt Roberts
Subject   st: data structure for clogit
Date   Fri, 3 Aug 2012 10:20:13 +0100

My Stata programming skills are limited so I am hoping that someone
can help me with this. I am conducting some research on government
coalitions and I need to create a dataset that contains every possible
government that could potentially form after each election. I also
need variables that contain information on the total number of seats
each potential government would hold in parliament and the ideological
range of each government. I have all of the raw data that would be
needed to create this dataset; I'm just unsure how to do it in Stata.
I could probably do it in Excel but my dataset will exceed the
permissible size of an Excel worksheet. Here is a simple example of
what I want to achieve:

I have the following data for a formation opportunity - the formation
opportunity will be the ID variable for the clogit model:

Party  Seats  Ideol
P1        40      7
P2        20      2
P3        10     11

Where 'Party' is the name of the individual parties in a parliament,
'Seats' is the share of seats each party has, and 'Ideol' is the
ideological position of each party. For the clogit model that I need
to use, the data must be reconfigured to look like this:

ID   party_comb    tot_seats   Ideo_dist
1    P1            40          0
1    P1, P2        60          5
1    P1, P3        50          4
1    P1, P2, P3    70          9
1    P2            20          0
1    P2, P3        30          9
1    P3            10          0

In the above, 'party_comb' contains every possible government that
could form, 'tot_seats' is the total number of seats that each
government would hold - this comes from adding the relevant values
from the raw data outlined above. 'Ideo_dist' is the distance between
the largest and smallest values of the 'Ideol' variable for the
relevant parties; so for the combination 'P1, P2' this is 7-2=5.
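
In case it helps to pin down the logic, here is a rough sketch of the
transformation I have in mind. It is written in Python rather than
Stata, purely to illustrate the rules described above; the function
name potential_governments and the dictionary layout are made up for
this example and are not part of any existing tool. It enumerates
every non-empty combination of parties for one formation opportunity,
sums the seats, and takes the largest minus the smallest 'Ideol' value.

```python
from itertools import combinations

# Raw data for one formation opportunity, copied from the example above:
# party name -> (seats, ideological position)
parties = {"P1": (40, 7), "P2": (20, 2), "P3": (10, 11)}


def potential_governments(parties, opportunity_id):
    """Return one row (a dict) per non-empty combination of parties."""
    rows = []
    names = list(parties)
    # Combinations of size 1, 2, ..., up to all parties together.
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            seats = [parties[p][0] for p in combo]
            ideol = [parties[p][1] for p in combo]
            rows.append({
                "ID": opportunity_id,
                "party_comb": ", ".join(combo),
                "tot_seats": sum(seats),
                # Distance between the largest and smallest position.
                "Ideo_dist": max(ideol) - min(ideol),
            })
    return rows


rows = potential_governments(parties, 1)
for row in rows:
    print(row["ID"], row["party_comb"], row["tot_seats"],
          row["Ideo_dist"], sep="\t")
```

With n parties this produces 2^n - 1 rows, so three parties give seven
rows and eleven parties give 2,047. The rows come out ordered by the
size of the combination rather than by party, which should not matter
for the model.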

This is not a problem to hand code with only a few cases but I have
quite a lot of cases (200+) and in some the number of potential
governments will exceed 2,000. If it makes it any easier, the values
for the 'party_comb' variable could be divided into separate cells
across the row so that each cell represents one party.

Thanks for any help that you can provide with this.
