


Re: st: Biprobit and clustering standard errors

From   Maarten Buis
Subject   Re: st: Biprobit and clustering standard errors
Date   Thu, 8 Sep 2011 09:45:23 +0200

On Wed, Sep 7, 2011 at 6:33 PM, Lina C wrote:
> Thank you. I have around 90 clusters. The thing is that biprobit uses
> the whole sum of "X" of both probits to compute the covariance matrix.

A model uses information from the data to compute each coefficient; it
does not matter whether or not these coefficients are associated with
the same variable. So it is the number of coefficients that counts,
not the number of unique variables. With -biprobit- you are estimating
both probits simultaneously. This is great, as it allows you to study
and/or control for how these two processes interact, but the price you
pay for that is that you are estimating more coefficients in one
model.
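
To make the counting concrete, here is a minimal sketch in Python (not
Stata). It assumes each equation has its own constant and that the
model estimates one extra parameter for the correlation between the
two equations; the function name and the variable counts are made up
for illustration.

```python
def biprobit_coefficient_count(n_vars_eq1, n_vars_eq2, constants=True):
    """Count the parameters a bivariate probit has to estimate.

    Assumption: one constant per equation (if constants=True) plus one
    parameter for the cross-equation correlation.
    """
    n_constants = 2 if constants else 0
    n_correlation = 1
    return n_vars_eq1 + n_vars_eq2 + n_constants + n_correlation


# Hypothetical model: 5 variables in each equation.
base = biprobit_coefficient_count(5, 5)
# Adding one more variable to both equations.
extended = biprobit_coefficient_count(6, 6)

print(base, extended, extended - base)
```

The same variable entering both equations costs two coefficients, which
is why the count grows faster than the list of unique variables.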

I would be very suspicious of such models with more than 45 variables
in each equation. My rule of thumb is that, as an absolute minimum, I
require 10 observations per coefficient, that is, 20 observations if
I want to add a variable to both probits. In relatively complicated
models like -biprobit- I would only start to get some confidence in
the results if I had 100 observations per coefficient.

The number of observations gets a bit more complicated when the
observations are clustered. The fact that you want clustered standard
errors means that you believe that the observations within the same
cluster are not independent bits of information. So if you know
something about one unit in a cluster, you also have some information
about the other units in that cluster, and collecting information from
another unit in that cluster will not add the same amount of
information as collecting information from a unit in another cluster.
So when using clustered standard errors I look at both the number of
observations (which is too optimistic) and the number of clusters
(which is too pessimistic) to determine when I feel confident about
the model. For a model like this and with this number of clusters I
would probably not use more
than 5 variables. To quote John Tukey (1986, pp. 74-75): "The
combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data."
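
The two bounds from the rule of thumb can be sketched as a small
calculation, again in Python rather than Stata. The 90 clusters come
from this thread; the 2000 observations are a made-up number, since
the actual sample size was not mentioned, and the function name is
only for illustration.

```python
def max_coefficients(n_observations, n_clusters, per_coefficient=10):
    """Return (pessimistic, optimistic) bounds on the number of
    coefficients under an 'N units per coefficient' rule of thumb.

    The pessimistic bound treats each cluster as one unit of
    information; the optimistic bound treats every observation as
    independent.
    """
    pessimistic = n_clusters // per_coefficient
    optimistic = n_observations // per_coefficient
    return pessimistic, optimistic


n_clusters = 90        # from the thread
n_observations = 2000  # hypothetical, not from the thread

# Absolute minimum: 10 units per coefficient.
print(max_coefficients(n_observations, n_clusters, per_coefficient=10))
# More comfortable for a complicated model: 100 units per coefficient.
print(max_coefficients(n_observations, n_clusters, per_coefficient=100))
```

Where between the two bounds the truth lies depends on how strongly
the observations within a cluster resemble one another, which this
sketch does not try to estimate.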

Hope this helps,

John Tukey (1986), "Sunset salvo". The American Statistician 40(1): 72--76.

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
