# RE: st: Penalized MLE

 From "Verkuilen, Jay" <[email protected]> To <[email protected]> Subject RE: st: Penalized MLE Date Sat, 26 Jan 2008 14:33:58 -0500

```--- SR Millis <[email protected]> wrote:
> Is it possible to do penalized maximum likelihood
> estimation in Stata? A search of the Archives failed
> to turn up anything on this topic.

--- Maarten Buis wrote:

>-findit penalized- mentions gam.

Yeah, GAM would use a penalized likelihood function because the penalty
would be there to make the spline functions sufficiently smooth.
Penalized estimation is, therefore, commonly employed to avoid certain

VERY roughly, the basic idea can be thought of in a quasi-Bayesian
fashion by employing informative priors to avoid regions of the
parameter space that are viewed as a priori impossible. Here goes:

Posterior = Likelihood X Prior / (Integration Constant).

Then take the log:

log(Posterior) = log(Likelihood) + log(Prior) - log(Integration
Constant)

If you think of the Prior as a penalty term that tells the estimation
which parts of the parameter space it should avoid, that's what penalized
likelihood is doing. Since maximizing the penalized likelihood is, in a
sense, finding the posterior mode, and the Integration Constant doesn't
depend on the parameters, we blow it off and thus want to find

PenMLE = arg max {log(Likelihood) + log(Prior)}

I haven't used Stata's -ml- command (need to buy that book) but if it lets
you put in the log-likelihood, the penalty terms can simply be tacked on
to the end.

The trick is, of course, finding the right penalties, which is why the
Bayesian approach is so useful. For instance (quoting Andrew Gelman from
a talk I was at the other day), if you have properly standardized
independent variables, logistic regression coefficients should never be
larger than 5 in magnitude. Thus a reasonable prior to impose would be
one that is essentially flat in (-5,5) and rapidly decreasing outside
that. A Cauchy distribution does this pretty easily. This is essentially
a penalized likelihood approach that avoids the commonly found problem
of separation in logistic regression.

The informative priors approach is quite commonly employed in
psychometric applications. For instance, the commonly used
three-parameter logistic (3PL) item response model is very difficult to
estimate without going Bayesian and using informative priors on the
pseudo-guessing parameter and the slope parameter. Since generating good
informative priors for these parameters is not especially difficult, it's
a way to handle the problem they cause.

Jay
--
J. Verkuilen
Assistant Professor of Educational Psychology
City University of New York-Graduate Center
365 Fifth Ave.
New York, NY 10016
Email: [email protected]
Office: (212) 817-8286
FAX: (212) 817-1516
Cell: (217) 390-4609

```
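To make the "tack the penalty onto the log-likelihood" idea concrete, here is a minimal sketch in plain Python rather than Stata. Everything in it is an assumption made for illustration: the six toy data points, the no-intercept logistic model, the Cauchy scale of 2.5, and the function names. It is not anything from the thread. The data are perfectly separated, so the unpenalized maximizer just runs to the edge of whatever search range you give it, while the maximizer of log(Likelihood) + log(Prior) stays finite.

```python
import math

# Hypothetical toy data: y is perfectly separated by the sign of x,
# so the unpenalized MLE of the slope does not exist (it drifts to infinity).
X = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
Y = [0, 0, 0, 1, 1, 1]

def log_lik(b):
    """Log-likelihood of a no-intercept logistic model, P(y=1|x) = 1/(1+exp(-b*x))."""
    total = 0.0
    for x, y in zip(X, Y):
        z = b * x if y == 1 else -b * x  # log P(y|x) = -log(1 + exp(-z))
        # numerically stable evaluation of -log(1 + exp(-z))
        if z >= 0:
            total += -math.log1p(math.exp(-z))
        else:
            total += z - math.log1p(math.exp(z))
    return total

def log_prior(b, scale=2.5):
    """Log Cauchy(0, scale) density up to an additive constant (scale is an assumption)."""
    return -math.log1p((b / scale) ** 2)

def pen_log_lik(b):
    """Penalized log-likelihood: log(Likelihood) + log(Prior)."""
    return log_lik(b) + log_prior(b)

def argmax_grid(f, lo=-20.0, hi=20.0, steps=40001):
    """Crude grid search; adequate for a one-parameter illustration only."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=f)

b_mle = argmax_grid(log_lik)      # lands on the upper edge of the search range
b_pen = argmax_grid(pen_log_lik)  # finite interior maximum
print("unpenalized:", b_mle, "penalized:", b_pen)
```

The grid search is only there to keep the sketch dependency-free; with more than one parameter you would hand `pen_log_lik` to a proper optimizer. Widening the search range moves the unpenalized answer along with the boundary, while the penalized one stays put.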