Re: st: Modelling extremely rare events (binary)

From: Markus Eberhardt
Subject: Re: st: Modelling extremely rare events (binary)
Date: Tue, 14 Jun 2011 10:37:25 +0100

```
Thanks, Abhimanyu.
This seems quite relevant, although I'll keep Maarten's caution in mind.
Markus Eberhardt
ESRC Post-doctoral Research Fellow, Centre for the Study of African
Economies, Department of Economics, University of Oxford
Stipendiary Lecturer, St Catherine's College, Oxford

email: markus.eberhardt@economics.ox.ac.uk
mail: Centre for the Study of African Economies, Department of
Economics, Manor Rd, Oxford OX1 3UQ, England

On 14 June 2011 10:28, Abhimanyu Arora <abhimanyu.arora1987@gmail.com> wrote:
> Perhaps you could have a look at Gary King's -relogit-?
> On Tue, Jun 14, 2011 at 10:05 AM, Markus Eberhardt
> <markus.eberhardt@economics.ox.ac.uk> wrote:
>> Hello everybody
>>
>> I have an empirical problem where for a very large dataset (panel,
>> around 20,000 panel members with over 60,000 observations) I have two
>> binary outcome variables A and B. The occurrence of either is
>> extremely rare: only about 1.5% and 0.1% of observations for A and B
>> respectively. I am for the time being treating this as a pooled panel,
>> so not accounting for any fixed effects at the panel member level. My
>> empirical model is made up of continuous and binary variables. In the
>> logit and probit I am estimating A and B separately, for biprobit
>> jointly, and for mlogit I have four categories (0, A occurs, B occurs,
>> both occur). Ideally the analysis should account for the jointness of
>> the decision, as in the biprobit and mlogit approaches.
>>
>> Here are my questions:
>> (1) DOES THIS AT ALL MAKE SENSE? Having estimated logit, probit,
>> bivariate probit and multinomial logit I am concerned about the
>> viability of what I am doing to this data: given the minute share of
>> actual events occurring (1s, rather than 0s) is it at all possible
>> that a logit-type model could tell me anything meaningful? So far I am
>> getting interpretable empirical results, but it was put to me that
>> these were entirely unreliable (or even spurious) given the extreme
>> rarity of the event. Note that there are strong priors (from the
>> descriptive analysis) that a certain characteristic (binary) drives
>> the outcomes, so I imagine that a fixed effect and/or an interaction
>> of this binary characteristic with other (continuous) RHS variables
>> may provide an intuitive 'fit', but I am unsure whether this is
>> empirically satisfied.
>> (2) USEFUL DIAGNOSTICS? My diagnostics for the model(s) are hampered
>> by the fact that it's difficult to get a handle on what constitutes a
>> substantial deviation of the predicted from the observed outcomes.
>> Apart from -fitstat- type diagnostics, are there any other things I
>> could do to choose between rival models and/or to convince myself that
>> what I'm doing is at all meaningful in this challenging empirical
>> case?
>> (3) ALTERNATIVE EMPIRICAL MODELS? Are there any other empirical
>> specifications that are better suited to fit this data? I tried to
>> search for extremely rare events such as earthquakes, but couldn't get
>> much out of it.
>> (4) PANEL ELEMENT? Possibly a bridge too far, but is there any way
>> to let the panel element of the data have a bearing on the
>> empirics?
```
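A minimal Python sketch of two steps touched on in the questions above, offered under stated assumptions rather than as the poster's method. The first function codes the two binary outcomes A and B into the four mlogit categories, assuming "A occurs" means A without B. The second builds a Hosmer-Lemeshow-style table that compares observed events with the sum of predicted probabilities within bins of predicted risk. That table is one way to gauge what counts as a "substantial deviation" when events are this rare. The function names `four_category` and `observed_vs_expected` are hypothetical. The predicted probabilities are assumed to come from whichever model was fitted (e.g. -predict- after -logit- in Stata). In Stata itself, -estat gof, group(10) table- after -logit- gives a comparable table.

```python
from typing import List, Sequence, Tuple


def four_category(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Combine two 0/1 outcomes into one categorical outcome for a
    multinomial model: 0 = neither, 1 = A only, 2 = B only, 3 = both.
    (Hypothetical helper; the A-only / B-only reading is an assumption.)"""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    categories = []
    for ai, bi in zip(a, b):
        if ai not in (0, 1) or bi not in (0, 1):
            raise ValueError("outcomes must be coded 0/1")
        categories.append(ai + 2 * bi)  # A contributes 1, B contributes 2
    return categories


def observed_vs_expected(
    y: Sequence[int], p: Sequence[float], groups: int = 10
) -> List[Tuple[int, int, float]]:
    """Sort observations by predicted probability p, split them into
    `groups` bins of (nearly) equal size, and return one row per bin:
    (number of observations, observed events, expected events = sum of p)."""
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    if groups < 1:
        raise ValueError("groups must be at least 1")
    order = sorted(range(len(p)), key=lambda i: p[i])  # lowest risk first
    n = len(order)
    table = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        table.append((len(idx), observed, expected))
    return table


if __name__ == "__main__":
    # Toy data only, not taken from the dataset described in the thread.
    print(four_category([0, 1, 0, 1], [0, 0, 1, 1]))  # [0, 1, 2, 3]
    y = [0] * 8 + [1, 1]
    p = [0.01] * 8 + [0.4, 0.6]
    for row in observed_vs_expected(y, p, groups=2):
        print(row)
```

For a logit that includes a constant, total expected events equal total observed events in the estimation sample by construction. The informative part of such a table is therefore how the events are spread across the bins. With 0.1% of roughly 60,000 observations, outcome B has only on the order of 60 events. Bins at the low end of predicted risk will then hold very few events, so using fewer groups may be more sensible.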