# st: ml interval data with point mass at zero

 From: "Henrik Andersson"  Subject: st: ml interval data with point mass at zero  Date: Thu, 8 May 2008 17:44:23 +0200

```
Hi,

I have estimated a survival function for interval data. The
log-likelihood function can be written as

LnL = YY*ln[F(bid2Y)] + NN*ln[F(bid2N)] + YN*ln[F(bid2Y)-F(bid1)]
      + NY*ln[F(bid1)-F(bid2N)]

where YY, NN, YN, and NY are indicator variables coded as one if my
variable of interest falls within that interval, and zero otherwise.
F(.) are my CDFs. My model works fine and is defined as follows.

** Specify log-likelihood function **

capture program drop double_cv
program double_cv
version 9.2
args lnf bid xb
qui replace `lnf' = ln(norm($ML_y2*`bid'+`xb')) if $ML_y4 == 1
qui replace `lnf' = ln(norm(-($ML_y3*`bid'+`xb'))) if $ML_y5 == 1
qui replace `lnf' = ln(norm(-($ML_y2*`bid'+`xb')) - ///
    norm(-($ML_y1*`bid'+`xb'))) if $ML_y6 == 1
qui replace `lnf' = ln(norm(-($ML_y1*`bid'+`xb')) - ///
    norm(-($ML_y3*`bid'+`xb'))) if $ML_y7 == 1
end

** Estimate model **

ml model lf double_cv (bid:  bid1 bid2Y bid2N = ) ///
    (xb: YY NN YN NY = var1)
ml search
ml maximize

****************************************

The problem with the kind of data that I have is that there is often a
point mass, a, at zero. If x is my variable, my CDF can then be written as

G(x,a)=a if x=0
G(x,a)=a+(1-a)F(x) if x>0

The log-likelihood for this mixture model (as it is often referred to)
can be written as

LnL = YY*ln[(1-a)F(bid2Y)] + NN*ln[a+(1-a)F(bid2N)]
      + YN*ln[(1-a){F(bid2Y)-F(bid1)}] + NY*ln[(1-a){F(bid1)-F(bid2N)}]

or by rearranging

LnL = YY*ln[F(bid2Y)] + NN*ln[a+(1-a)F(bid2N)] + YN*ln[F(bid2Y)-F(bid1)]
      + NY*ln[F(bid1)-F(bid2N)] + (YY+YN+NY)*ln(1-a)

It has been suggested that a=exp(b)/[1+exp(b)], where b is the parameter
that we need to estimate. This logistic form ensures that a lies strictly
between 0 and 1. Based on this assumption I tried to estimate the
following model,

** Specify log-likelihood function **

capture program drop double_cv_spike
program double_cv_spike
version 9.2
args lnf bid xb p
qui replace `lnf' = ln(norm($ML_y2*`bid'+`xb')) if $ML_y4 == 1
qui replace `lnf' = ln(invlogit(`p')+(1-invlogit(`p'))* ///
    norm(-($ML_y3*`bid'+`xb'))) if $ML_y5 == 1
qui replace `lnf' = ln((norm(-($ML_y2*`bid'+`xb')) - ///
    norm(-($ML_y1*`bid'+`xb')))) if $ML_y6 == 1
qui replace `lnf' = ln((norm(-($ML_y1*`bid'+`xb')) - ///
    norm(-($ML_y3*`bid'+`xb')))) if $ML_y7 == 1
qui replace `lnf' = ln(1-invlogit(`p')) ///
    if ($ML_y4 == 1 | $ML_y6 == 1 | $ML_y7 == 1)
end

** Estimate model **

ml model lf double_cv_spike (bid:  bid1 bid2Y bid2N = ) ///
    (xb: YY NN YN NY = var1) (p: one = )
ml search
ml maximize

**************************************************************

Hence, I have added (1-a) to my second `lnf' statement and added a fifth
`lnf' statement with ln(1-a), which is chosen as long as NN is not one
(i.e. when not $ML_y5 == 1). The third equation, p, consists of a
variable equal to the constant 1, which is included to estimate the
constant b. My maximization never converges. I get the messages

numerical derivatives are approximate
flat or discontinuous region encountered
Iteration 1:   log likelihood =          0

Does anyone know if my programming is wrong, or perhaps know of a better
way to specify the log-likelihood function?

Bests,

Henrik

```
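
One possible reading of the symptom, offered as a sketch and not as a confirmed diagnosis: in `double_cv_spike` the fifth `replace` assigns ln(1-a) to the YY, YN and NY observations, which overwrites the F-terms set by the earlier statements instead of adding to them as the rearranged LnL requires. With the F-terms gone for those observations, the objective can be pushed toward 0 (a close to 0 and F(bid2N) close to 1), which would be consistent with the reported "log likelihood = 0" and the flat-region message. The evaluator below keeps the poster's data layout (ML_y1 = bid1, ML_y2 = bid2Y, ML_y3 = bid2N, ML_y4 to ML_y7 = YY, NN, YN, NY) and changes only that last step; the program name `double_cv_spike_add` and the temporary variable `a` are hypothetical, and the code has not been run against the poster's data.

```stata
* Sketch only: assumes the same variables and equation order as in the post.
capture program drop double_cv_spike_add
program double_cv_spike_add
    version 9.2
    args lnf bid xb p
    * Point mass a = exp(b)/[1+exp(b)], computed once per call
    tempvar a
    quietly generate double `a' = invlogit(`p')
    * F-terms for each response pattern, as in the original evaluator
    quietly replace `lnf' = ln(norm($ML_y2*`bid'+`xb')) if $ML_y4 == 1
    quietly replace `lnf' = ln(`a' + (1-`a')*norm(-($ML_y3*`bid'+`xb'))) ///
        if $ML_y5 == 1
    quietly replace `lnf' = ln(norm(-($ML_y2*`bid'+`xb')) - ///
        norm(-($ML_y1*`bid'+`xb'))) if $ML_y6 == 1
    quietly replace `lnf' = ln(norm(-($ML_y1*`bid'+`xb')) - ///
        norm(-($ML_y3*`bid'+`xb'))) if $ML_y7 == 1
    * Add ln(1-a) to the value already stored, do not overwrite it
    quietly replace `lnf' = `lnf' + ln(1-`a') ///
        if $ML_y4 == 1 | $ML_y6 == 1 | $ML_y7 == 1
end

* Same model statement as in the post, pointing at the sketch above
ml model lf double_cv_spike_add (bid:  bid1 bid2Y bid2N = ) ///
    (xb: YY NN YN NY = var1) (p: one = )
ml search
ml maximize
```

Separately, the YY branch in both evaluators is ln(norm(bid2Y*bid + xb)), which equals ln[1 - F(bid2Y)] when F(x) = norm(-(x*bid + xb)) as in the other branches, so the YY terms in the written formulas are presumably shorthand for that.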