
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Modelling extremely rare events (binary)

From   Maarten Buis <>
Subject   Re: st: Modelling extremely rare events (binary)
Date   Tue, 14 Jun 2011 10:50:38 +0200

On Tue, Jun 14, 2011 at 10:05 AM, Markus Eberhardt wrote:
> I have an empirical problem where for a very large dataset (panel,
> around 20,000 panel members with over 60,000 observations) I have two
> binary outcome variables A and B. The occurrence of either is
> extremely rare: only about 1.5% and 0.1% of observations for A and B
> respectively.

I would say that the main problem is that you lose a lot of
statistical power: with roughly 60,000 observations, rates of 1.5% and
0.1% correspond to only about 900 and 60 occurrences, respectively.
It is mainly these observations that carry the information on why the
events occur. Especially in the latter case I would not use more than
one, or at most two, explanatory variables, and I would forget about
combining the events. The necessary information is simply not present
in your data, so no amount of modeling can get it out.
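To make the point about power concrete, here is a minimal sketch in Python (the thread itself contains no code; Python is used only for illustration). It assumes exactly 60,000 observations and uses the delta-method standard error of an estimated log-odds for a binary outcome in an intercept-only logistic model, sqrt(1/events + 1/non-events). It ignores the panel structure, so it is not a power calculation for the actual model. The helper names `se_log_odds` and `summarize` are hypothetical.

```python
import math

# Figures from the question; 60,000 is used as a round number for
# "over 60,000 observations" (an assumption).
N_OBS = 60_000
RATES = {"A": 0.015, "B": 0.001}  # share of observations with each event


def se_log_odds(events, non_events):
    """Approximate standard error of an estimated log-odds for a binary
    outcome (intercept-only logistic model, delta method):
    sqrt(1/events + 1/non_events)."""
    return math.sqrt(1.0 / events + 1.0 / non_events)


def summarize(n_obs, rates):
    """For each outcome: expected number of events, the baseline SE, the SE
    with twice as many non-events, and the SE with twice as many events."""
    out = {}
    for name, rate in rates.items():
        events = round(n_obs * rate)
        non_events = n_obs - events
        out[name] = (
            events,
            se_log_odds(events, non_events),
            se_log_odds(events, 2 * non_events),  # add only non-events
            se_log_odds(2 * events, non_events),  # add only events
        )
    return out


results = summarize(N_OBS, RATES)
for name, (events, se, se_more_non, se_more_ev) in results.items():
    print(f"{name}: {events} events, SE {se:.3f}, "
          f"2x non-events {se_more_non:.3f}, 2x events {se_more_ev:.3f}")
```

Doubling the number of non-events barely moves the standard error, while doubling the number of events shrinks it by roughly a factor of 1/sqrt(2). Roughly the same holds for slope coefficients, whose precision is limited mainly by the smaller of the two outcome groups.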

Research on rare events (e.g. rare diseases) typically does not use a
random sample from the population, precisely because of this problem.
To observe a useful number of events, researchers would need to collect
such a huge dataset that it would be too expensive or too impractical.
Instead they tend to run what is often called a case-control study:
they collect all (or many) of the cases and compare them with a sample
of non-cases (controls). Unfortunately, this does not help you, as your
data have apparently already been collected.
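The case-control idea can also be sketched in Python. Everything here is hypothetical: the population counts are made up, and `odds_ratio`, `table` and `case_control_sample` are illustrative helpers, not functions from Stata or any library. The sketch keeps every case, draws a fixed number of controls per case, and shows why the design works for odds ratios: scaling both control counts by a common sampling fraction leaves the odds ratio unchanged. The baseline risk (the intercept of a logistic model) can only be recovered if that sampling fraction is known.

```python
import random
from fractions import Fraction


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table: a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls (ints or Fractions)."""
    return (Fraction(a) * d) / (Fraction(b) * c)


def table(rows):
    """Count a, b, c, d (as in odds_ratio) from (outcome, exposure) pairs."""
    a = sum(1 for y, x in rows if y == 1 and x == 1)
    b = sum(1 for y, x in rows if y == 1 and x == 0)
    c = sum(1 for y, x in rows if y == 0 and x == 1)
    d = sum(1 for y, x in rows if y == 0 and x == 0)
    return a, b, c, d


def case_control_sample(rows, controls_per_case, rng):
    """Keep every case and draw controls_per_case non-cases per case at random."""
    cases = [r for r in rows if r[0] == 1]
    controls = [r for r in rows if r[0] == 0]
    k = min(len(controls), controls_per_case * len(cases))
    return cases + rng.sample(controls, k)


# Hypothetical population: 60,000 rows with 60 events (0.1%).
# Exposed: 20,000 rows with 40 events; unexposed: 40,000 rows with 20 events.
population = ([(1, 1)] * 40 + [(0, 1)] * 19_960
              + [(1, 0)] * 20 + [(0, 0)] * 39_980)
a, b, c, d = table(population)
full_or = odds_ratio(a, b, c, d)

# Expected counts when controls are sampled at a common fraction f
# (4 controls per case): the odds ratio is exactly unchanged.
f = Fraction(4 * (a + b), c + d)
expected_or = odds_ratio(a, b, c * f, d * f)

# One random case-control sample: its odds ratio varies around full_or.
sample = case_control_sample(population, controls_per_case=4,
                             rng=random.Random(2011))
sample_or = odds_ratio(*table(sample))
print(float(full_or), float(expected_or), float(sample_or))
```

The design only pays off when the controls still have to be collected; with a dataset that already exists, discarding non-cases saves nothing, which is the point made above.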

I hope that this is not too depressing,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

© Copyright 1996–2018 StataCorp LLC