Statalist



Re: st: How low can the percentage of uncensored cases be in heckprob?


From   Steven Samuels <sjhsamuels@earthlink.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How low can the percentage of uncensored cases be in heckprob?
Date   Tue, 11 Nov 2008 16:17:48 -0500

I am not familiar with -heckprob-, but I doubt that the *percent* of uncensored observations matters much.

-heckprob- fits two probit models. I know of results related to Margaret's question only for logit models. For a single logistic regression model, the relevant sample size is the smaller of the number of events and the number of non-events. Peduzzi et al. (1996) found that the ratio of this number to the number of predictors should be at least 10:1 to avoid bias from over-fitting.
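The events-per-variable idea can be written as a small helper. This is a minimal Python sketch, not Stata code and not anything taken from Peduzzi et al.: the function names `events_per_variable` and `heckprob_epv` and all the counts are hypothetical. Applying a logistic-regression rule of thumb separately to each of the two probit equations in -heckprob- (the selection equation on all observations, the outcome equation on the uncensored ones only) is an assumption, not an established result. No cutoff is hard-coded; compare the numbers with whatever ratio you decide to use.

```python
def events_per_variable(n_events, n_obs, n_predictors):
    """Smaller of events and non-events, divided by the number of predictors."""
    if not 0 <= n_events <= n_obs or n_predictors < 1:
        raise ValueError("need 0 <= n_events <= n_obs and n_predictors >= 1")
    return min(n_events, n_obs - n_events) / n_predictors


def heckprob_epv(n_obs, n_uncensored, n_outcome_ones, k_select, k_outcome):
    """Hypothetical helper: EPV for each probit equation of a selection model.

    Selection equation: all n_obs cases, "event" = case is uncensored.
    Outcome equation: only the uncensored cases, "event" = outcome equals 1.
    """
    return {
        "selection": events_per_variable(n_uncensored, n_obs, k_select),
        "outcome": events_per_variable(n_outcome_ones, n_uncensored, k_outcome),
    }


if __name__ == "__main__":
    # Hypothetical data: 1000 cases, 19% (190) uncensored, 60 of those with y = 1.
    epv = heckprob_epv(1000, 190, 60, k_select=4, k_outcome=3)
    for equation, value in epv.items():
        print(f"{equation}: EPV = {value:.1f}")
    # selection: EPV = 47.5
    # outcome: EPV = 20.0
```

With few uncensored cases, the outcome equation is usually the one to watch: its limiting count comes from a subset of the uncensored cases, so it shrinks much faster than the selection equation's.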

-Steve

Refs:
Peduzzi PN, Concato J, Holford TR, Feinstein AR. (1995) The importance of events per independent variable in multivariable analysis, II: accuracy and precision of regression estimates. J Clin Epidemiol; 48: 1503–10.

Peduzzi PN, Concato J, Kemper E, Holford TR, Feinstein AR. (1996) A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol; 49: 1373–9.

Babyak M. (2004) What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med; 66: 411–21. Full text:
http://www.psychosomaticmedicine.org/cgi/content-nw/full/66/3/411/




On Nov 11, 2008, at 12:15 PM, Maarten Buis wrote:

--- "Tyler, Margaret C D" <margaret-tyler@uiowa.edu> wrote:
In the example in the Stata reference -H heckprob, there are 95 total
and 59 uncensored observations, so 62% are uncensored. In my own
situation I have only about 19% uncensored. Is it still appropriate
to use heckprob for my analysis? I have run the equations and gotten
what seem to be valid results. rho is non-significant.

You are obviously pushing your luck with that many censored cases. It
is no longer very popular to make statements like "you need at least N
observations or p% uncensored cases for technique t to be appropriate"
(whatever appropriate may mean), so I don't think you will get the
answer you are looking for. However, what you can do is run some
simulations and see how well (or badly) your estimator behaves with a
small number of uncensored cases. At this past summer's North American
Stata Users' Group meeting I gave a talk on using Stata for this type
of simulation; you can get the materials from:
http://ideas.repec.org/p/boc/nsug08/14.html
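The kind of simulation suggested here can be sketched as a small Monte Carlo experiment. This sketch is in Python rather than Stata and does not reproduce -heckprob- itself, which fits the selection and outcome probits jointly with a correlation rho between their errors. As a simplifying assumption, it fits only the outcome probit (by Newton-Raphson) to the selected cases, in data generated from a selection model in which about 19% of cases are uncensored. The sample size, coefficients, the selection variable z, and all function names are hypothetical choices; rho must lie strictly between -1 and 1. With rho = 0 the replications show how much the slope estimate varies, and how often the fit breaks down, when only a few dozen cases are uncensored.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)
INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    # Standard normal CDF via erfc, which stays accurate far into the lower tail.
    return 0.5 * math.erfc(-z / SQRT2)


def norm_pdf(z):
    return INV_SQRT_2PI * math.exp(-0.5 * z * z)


def probit_fit(xs, ys, max_iter=50, tol=1e-8):
    """Probit of y on (1, x) by Newton-Raphson; returns (b0, b1), or None on failure."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            q = 2 * y - 1                  # +1 if y == 1, -1 if y == 0
            xb = b0 + b1 * x
            cdf = norm_cdf(q * xb)
            if cdf <= 0.0:
                return None                # numerical breakdown (near-separation)
            lam = q * norm_pdf(q * xb) / cdf
            w = lam * (lam + xb)           # weight in minus the Hessian
            g0 += lam
            g1 += lam * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            return None                    # information matrix (near) singular
        # Newton step: (b0, b1) += (-H)^{-1} g, solved for the 2x2 case.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            return b0, b1
    return None                            # no convergence, e.g. separation


def simulate(n=300, share_selected=0.19, b0=0.0, b1=1.0, rho=0.0, reps=500, seed=1):
    """Monte Carlo of the outcome probit fitted only to the selected (uncensored) cases."""
    rng = random.Random(seed)
    # Selection: s* = a0 + 0.5*z + u with z, u ~ N(0, 1), so Var(s* - a0) = 1.25;
    # a0 is chosen so that P(s* > 0) = share_selected.
    a0 = math.sqrt(1.25) * statistics.NormalDist().inv_cdf(share_selected)
    slopes, sizes, failures = [], [], 0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            x, z, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
            e = rho * u + math.sqrt(1.0 - rho * rho) * rng.gauss(0, 1)  # corr(u, e) = rho
            if a0 + 0.5 * z + u > 0:       # outcome observed only when selected
                xs.append(x)
                ys.append(1 if b0 + b1 * x + e > 0 else 0)
        sizes.append(len(xs))
        fit = probit_fit(xs, ys) if len(xs) > 2 else None
        if fit is None:
            failures += 1
        else:
            slopes.append(fit[1])
    return statistics.mean(slopes), statistics.stdev(slopes), failures, statistics.mean(sizes)


if __name__ == "__main__":
    mean_b1, sd_b1, failed, avg_n = simulate()
    print(f"avg uncensored n = {avg_n:.1f}; mean b1-hat = {mean_b1:.3f} "
          f"(true 1.0), sd = {sd_b1:.3f}, failed fits = {failed}")
```

Changing n, share_selected, or rho and rerunning shows how the spread of the estimates, and the drift of this naive estimator away from the true b1, respond to the design. The materials linked above show how to set up experiments of this kind in Stata.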

Hope this helps,
Maarten


-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/





© Copyright 1996–2014 StataCorp LP