Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Maarten Buis <maartenlbuis@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Biprobit and clustering standard errors

Date: Thu, 8 Sep 2011 09:45:23 +0200

On Wed, Sep 7, 2011 at 6:33 PM, Lina C wrote:

> Thank you. I have around 90 clusters. The thing is that biprobit uses
> the whole sum of "X" of both probits to comput the covariance matriz.

A model uses information from the data to compute each coefficient, and it does not matter whether or not these coefficients are associated with the same variable. So it is the number of coefficients that counts, not the number of unique variables. With -biprobit- you are estimating both probits simultaneously. This is great, as it allows you to study and/or control for how these two processes interact, but the price you pay is that you are estimating more coefficients in one model. I would be very suspicious of such models with more than 45 variables in each equation.

My rule of thumb is that, as an absolute minimum, I require 10 observations per coefficient, that is, 20 observations if I want to add a variable to both probits. In relatively complicated models like -biprobit- I would only start to get some confidence in the results if I had 100 observations per coefficient.

Counting observations gets a bit more complicated when the observations are clustered. The fact that you want clustered standard errors means that you believe that the observations within the same cluster are not independent bits of information. So if you know something about one unit in a cluster, you also have some information about the other units in that cluster. Collecting information from another unit in that cluster will therefore not add the same amount of information as collecting information from a unit in another cluster. So when using clustered standard errors I look at both the number of observations (which is too optimistic) and the number of clusters (which is too pessimistic) to determine when I feel confident about the model. For a model like this and with this number of clusters I would probably not use more than 5 variables.

To quote John Tukey (1986, pp. 74-75): "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."

Hope this helps,
Maarten

John Tukey (1986), "Sunset salvo". The American Statistician 40(1): 72-76.

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
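The arithmetic behind the rule of thumb in the message above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, the count ignores the constants and the correlation parameter that -biprobit- also estimates, and the 10-per-coefficient threshold is the stated absolute minimum, not a formal criterion. Applying the threshold to clusters instead of observations gives the "too pessimistic" bound described above.

```python
def max_coefficients(n_units, units_per_coefficient=10):
    """Number of coefficients allowed when each one needs
    `units_per_coefficient` units (observations or clusters)."""
    return n_units // units_per_coefficient


def max_variables_per_equation(n_units, units_per_coefficient=10, n_equations=2):
    """A variable added to every equation costs one coefficient per equation,
    so divide the coefficient budget by the number of equations.
    Constants and the correlation parameter are ignored (an assumption
    of this sketch)."""
    return max_coefficients(n_units, units_per_coefficient) // n_equations


# Pessimistic bound: treat each of the roughly 90 clusters as one unit.
n_clusters = 90
print(max_coefficients(n_clusters))             # 9 coefficients in total
print(max_variables_per_equation(n_clusters))   # 4 variables per probit
```

With about 90 clusters this pessimistic count allows roughly 4 variables per equation, which is in the same range as the "not more than 5 variables" judgment in the message. The optimistic bound would use the number of observations instead, which the thread does not report.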

**References:**

- **st: Biprobit and clustering standard errors**, *From:* Lina C <linacs81@gmail.com>
- **Re: st: Biprobit and clustering standard errors**, *From:* Stas Kolenikov <skolenik@gmail.com>
- **Re: st: Biprobit and clustering standard errors**, *From:* Lina C <linacs81@gmail.com>
