Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Biprobit and clustering standard errors

| Field | Value |
|---|---|
| From | Maarten Buis |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: Biprobit and clustering standard errors |
| Date | Thu, 8 Sep 2011 09:45:23 +0200 |

```On Wed, Sep 7, 2011 at 6:33 PM, Lina C wrote:
> Thank you. I have around 90 clusters. The thing is that biprobit uses
> the whole sum of "X" of both probits to compute the covariance matrix.

A model uses information from the data to compute each coefficient, and
it does not matter whether or not these coefficients are associated
with the same variable. So it is the number of coefficients that
counts, not the number of unique variables. With -biprobit- you are
estimating both probits simultaneously. This is great, as it allows you
to study and/or control for how these two processes interact, but the
price you pay is that you are estimating more coefficients in one
model.

I would be very suspicious of such models with more than 45 variables
in each equation. My rule of thumb is that, as an absolute minimum, I
require 10 observations per coefficient, that is, 20 observations if
I want to add a variable to both probits. In relatively complicated
models like -biprobit- I would only start to get some confidence in
the results if I had 100 observations per coefficient.

Counting observations gets a bit more complicated when the
observations are clustered. The fact that you want clustered standard
errors means that you believe that the observations within the same
cluster are not independent bits of information: if you know
something about one unit in a cluster, you also have some information
about the other units in that cluster. So collecting information from
another unit in that cluster will not add the same amount of
information as collecting information from a unit in another cluster.
So when using clustered standard errors I look at both the number of
observations (which is too optimistic) and the number of clusters
(which is too pessimistic) to determine when I feel confident about
the model. For a model like this and with this number of clusters I
would probably not use more than 5 variables. To quote John Tukey
(1986, pp. 74-75): "The combination of some data and an aching desire
for an answer does not ensure that a reasonable answer can be
extracted from a given body of data."

Hope this helps,
Maarten

John Tukey (1986), "Sunset salvo". The American Statistician 40(1): 72--76.

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
```
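The rule of thumb in the reply above can be turned into a small arithmetic check. What follows is a minimal sketch in Python and is not part of the original post: the function names are hypothetical, and it counts only slope coefficients (one per variable per equation), ignoring the constants and the correlation parameter that -biprobit- also estimates. The figure of 90 clusters comes from the quoted question; the threshold of 10 units per coefficient is the "absolute minimum" stated in the reply.

```python
def coefficients_used(vars_per_equation: int, equations: int = 2) -> int:
    """Number of slope coefficients when each variable enters every equation.

    Assumption (not from the post): constants and the correlation
    parameter are left out of the count.
    """
    return vars_per_equation * equations


def units_per_coefficient(n_units: int, vars_per_equation: int,
                          equations: int = 2) -> float:
    """Independent units (observations or clusters) available per coefficient."""
    return n_units / coefficients_used(vars_per_equation, equations)


def max_vars_per_equation(n_units: int, units_per_coef: int = 10,
                          equations: int = 2) -> int:
    """Largest number of variables per equation that keeps at least
    `units_per_coef` units for every slope coefficient."""
    return n_units // (units_per_coef * equations)


if __name__ == "__main__":
    n_clusters = 90  # from the quoted question

    # Pessimistic bound: treat each cluster as a single unit of information.
    print(max_vars_per_equation(n_clusters))       # 90 // 20 = 4
    print(units_per_coefficient(n_clusters, 45))   # 90 / 90 = 1.0
    print(units_per_coefficient(n_clusters, 5))    # 90 / 10 = 9.0
```

Using the cluster count as the pessimistic measure, 90 clusters allow about 4 variables per equation at 10 units per coefficient, which is in the same neighbourhood as the "not more than 5" suggested in the reply. With 45 variables in each equation there would be only one cluster per slope coefficient. The same functions can be called with the number of observations instead of the number of clusters to get the optimistic bound.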