
From: "Henrik Andersson" <henrik.andersson@vti.se>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: ml interval data with point mass at zero
Date: Thu, 8 May 2008 17:44:23 +0200

Hi,

I have estimated a survival function for interval data. The log-likelihood function can be written as

    LnL = YY*ln[F(bid2Y)] + NN*ln[F(bid2N)]
          + YN*ln[F(bid2Y)-F(bid1)] + NY*ln[F(bid1)-F(bid2N)]

where YY, NN, YN, and NY are indicator variables coded as one if my variable of interest falls within that interval, and zero otherwise, and F(.) denotes my CDF. My model works fine and is defined as follows.

** Specify log-likelihood function **
capture program drop double_cv
program double_cv
    version 9.2
    args lnf bid xb
    qui replace `lnf' = ln(norm($ML_y2*`bid'+`xb')) if $ML_y4 == 1
    qui replace `lnf' = ln(norm(-($ML_y3*`bid'+`xb'))) if $ML_y5 == 1
    qui replace `lnf' = ln(norm(-($ML_y2*`bid'+`xb')) - ///
        norm(-($ML_y1*`bid'+`xb'))) if $ML_y6 == 1
    qui replace `lnf' = ln(norm(-($ML_y1*`bid'+`xb')) - ///
        norm(-($ML_y3*`bid'+`xb'))) if $ML_y7 == 1
end

** Estimate model **
ml model lf double_cv_norm_rev_spike1 (bid: bid1 bid2Y bid2N = ) (xb: YY NN YN NY = var1)
ml search
ml maximize
****************************************

The problem with the kind of data that I have is that there is often a point mass, a, at zero. If x is my variable, my CDF can then be written as

    G(x,a) = a                if x = 0
    G(x,a) = a + (1-a)*F(x)   if x > 0

The log-likelihood for this mixture model (as it is often called) can be written as

    LnL = YY*ln[(1-a)F(bid2Y)] + NN*ln[a+(1-a)F(bid2N)]
          + YN*ln[(1-a){F(bid2Y)-F(bid1)}] + NY*ln[(1-a){F(bid1)-F(bid2N)}]

or, by rearranging,

    LnL = YY*ln[F(bid2Y)] + NN*ln[a+(1-a)F(bid2N)]
          + YN*ln[F(bid2Y)-F(bid1)] + NY*ln[F(bid1)-F(bid2N)]
          + (YY+YN+NY)*ln(1-a)

It has been suggested that a = exp(b)/[1+exp(b)], where b is the parameter to be estimated. This logistic form ensures that a lies strictly between 0 and 1.
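As a sanity check on the rearrangement, here is a minimal Python sketch (not Stata, and not part of my estimation code). It assumes F is the standard normal CDF, and the function names, the value b = -1, and the three index values are made up purely for illustration. It evaluates both forms of the per-observation log-likelihood for each of the four response categories.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, standing in for F (an assumption)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def spike_prob(b):
    """Logistic transform a = exp(b)/(1+exp(b)), so that 0 < a < 1."""
    return 1.0 / (1.0 + math.exp(-b))

def loglik_mixture(cat, f1, f2y, f2n, a):
    """First form: (1-a) appears inside each logarithm."""
    if cat == "YY":
        return math.log((1 - a) * f2y)
    if cat == "NN":
        return math.log(a + (1 - a) * f2n)
    if cat == "YN":
        return math.log((1 - a) * (f2y - f1))
    if cat == "NY":
        return math.log((1 - a) * (f1 - f2n))
    raise ValueError("unknown category: " + cat)

def loglik_rearranged(cat, f1, f2y, f2n, a):
    """Second form: ln(1-a) is a separate additive term for YY, YN, NY."""
    inside = {
        "YY": f2y,
        "NN": a + (1 - a) * f2n,
        "YN": f2y - f1,
        "NY": f1 - f2n,
    }
    ll = math.log(inside[cat])
    if cat != "NN":
        ll += math.log(1 - a)
    return ll

# Hypothetical values: b = -1 and F evaluated at three made-up points.
a = spike_prob(-1.0)
f2n, f1, f2y = norm_cdf(-0.5), norm_cdf(0.0), norm_cdf(0.5)

for cat in ("YY", "NN", "YN", "NY"):
    direct = loglik_mixture(cat, f1, f2y, f2n, a)
    rearranged = loglik_rearranged(cat, f1, f2y, f2n, a)
    print(cat, round(direct, 6), round(rearranged, 6))
```

The two columns printed for each category should agree, since the second form only moves ln(1-a) outside the logarithm.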
Based on this assumption I tried to estimate the following model:

** Specify log-likelihood function **
capture program drop double_cv_spike
program double_cv_spike
    version 9.2
    args lnf bid xb p
    qui replace `lnf' = ln(norm($ML_y2*`bid'+`xb')) if $ML_y4 == 1
    qui replace `lnf' = ln(invlogit(`p')+(1-invlogit(`p'))*norm(-($ML_y3*`bid'+`xb'))) if $ML_y5 == 1
    qui replace `lnf' = ln((norm(-($ML_y2*`bid'+`xb')) - ///
        norm(-($ML_y1*`bid'+`xb')))) if $ML_y6 == 1
    qui replace `lnf' = ln((norm(-($ML_y1*`bid'+`xb')) - ///
        norm(-($ML_y3*`bid'+`xb')))) if $ML_y7 == 1
    qui replace `lnf' = ln(1-invlogit(`p')) if ($ML_y4 == 1 | $ML_y6 == 1 | $ML_y7 == 1)
end

** Estimate model **
ml model lf double_cv_spike (bid: bid1 bid2Y bid2N = ) (xb: YY NN YN NY = var1) (p: one = )
ml search
ml maximize
**************************************************************

Hence, I have added (1-a) to my second `lnf' statement and added a fifth `lnf' statement with ln(1-a), which is applied whenever NN is not one (i.e. when $ML_y5 is not 1). The third equation, p, contains a variable equal to the constant 1, which is included to estimate the constant b.

My maximization never converges. I get the following messages:

    numerical derivatives are approximate
    flat or discontinuous region encountered
    Iteration 1: log likelihood = 0

Does anyone know if my programming is wrong, or perhaps a better way to specify the log-likelihood function?

Best,
Henrik

