



Re: st: Modelling extremely rare events (binary)


From   Markus Eberhardt <markus.eberhardt@economics.ox.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Modelling extremely rare events (binary)
Date   Tue, 14 Jun 2011 09:57:25 +0100

Thanks for the comments, Maarten! In particular, the suggestion not to
use a large number of covariates is, I think, very useful. Furthermore,
your comments confirm what I had been concerned about, namely that a
'mechanistic' approach using 'standard' methods would indeed be very
misleading. The regression analysis we're aiming for in this paper is
just one part of the story, with a lot of emphasis on descriptives, and
what you wrote really reinforced that view.

We are (credibly) able to focus on a subsample where the figures are
not quite as stark (5% and 1%), but again your remarks will make sure
we don't get carried away with the (regression) empirics.

Thanks again.
m

On 14 June 2011 09:50, Maarten Buis <maartenlbuis@gmail.com> wrote:
> On Tue, Jun 14, 2011 at 10:05 AM, Markus Eberhardt wrote:
>> I have an empirical problem where for a very large dataset (panel,
>> around 20,000 panel members with over 60,000 observations) I have two
>> binary outcome variables A and B. The occurrence of either is
>> extremely rare: only about 1.5% and 0.1% of observations for A and B
>> respectively.
>
> I would say that the main problem is that you lose a lot of
> statistical power: you have respectively about 300 and 20 occurrences.
> Those are the observations in your dataset that contain information on
> whether these events occurred. Especially in the latter case I would
> not use more than one, or at most two, explanatory variables, and I
> would forget about combining the events. The necessary information just is not
> present in your data, so no amount of modeling can get it out.
>
> Research on rare events (e.g. rare diseases) typically does not use a
> random sample from the population, exactly because of this problem.
> Researchers would need to collect such a huge dataset to get a useful
> number of events that doing so is just too expensive or too
> impractical. Instead they tend to do what is often called a
> case-control study. Unfortunately, this does not help you, as your
> data was apparently already collected.
>
> I hope that this is not too depressing,
> Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
>
> http://www.maartenbuis.nl
> --------------------------
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
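
The event-count arithmetic in the quoted reply can be checked with a short sketch. The rates (1.5% and 0.1%) and the sizes (about 20,000 panel members, over 60,000 observations) come from the thread; everything else is an assumption for illustration. The quoted figures of about 300 and 20 match the rates applied to 20,000, whereas the same rates applied to 60,000 observations would give about 900 and 60. The covariate cap uses the often-cited heuristic of roughly 10 events per explanatory variable, which is not stated in the thread, although it agrees with the "one, or at most two" advice for 20 events. The function names are hypothetical.

```python
# Rough event-count arithmetic for rare binary outcomes.
# Rates and sizes are taken from the thread; the 10-events-per-variable
# threshold is an ASSUMED rule of thumb, not something the thread states.

RATES = {"A": 0.015, "B": 0.001}           # share of cases with the event
BASES = {"panel members": 20_000,          # approximate sizes from the thread
         "observations": 60_000}


def expected_events(n_units: int, rate: float) -> int:
    """Approximate number of events for a given base size and event rate."""
    return round(n_units * rate)


def max_covariates(n_events: int, events_per_variable: int = 10) -> int:
    """Heuristic cap on explanatory variables (assumed rule of thumb)."""
    return n_events // events_per_variable


def summarize():
    """Return (base, outcome, events, covariate cap) for every combination."""
    rows = []
    for base_name, n_units in BASES.items():
        for outcome, rate in RATES.items():
            events = expected_events(n_units, rate)
            rows.append((base_name, outcome, events, max_covariates(events)))
    return rows


if __name__ == "__main__":
    for base_name, outcome, events, cap in summarize():
        print(f"{outcome} on {base_name}: ~{events} events, "
              f"at most ~{cap} covariates under the assumed heuristic")
```

Whichever base is the right one, outcome B leaves only a few dozen events, so the point of the reply stands: the number of events, not the number of observations, limits how much a model can estimate.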


