# st: Re: st: Need Kullback–Leiber divergence measure

From: Tirthankar Chakravarty
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: st: Need Kullback–Leiber divergence measure
Date: Sat, 8 May 2010 15:21:34 +0530

```

Although the user-written -multgof- (Jeroen Weesie, SSC) will do this
for you, it is pretty easy to do it yourself, following the excellent
article at
http://www.stata-journal.com/sjpdf.html?articlenum=pr0024
I am assuming you want to find the divergence between the two column
distributions of a two-way tabulation:
*********************************************
clear*
sysuse auto, clear
tabulate rep78 foreign, matcell(newmat)
mata
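// column proportions: each column of cell counts divided by its total
vP1 = st_matrix("newmat")[.,1] :/ sum(st_matrix("newmat")[.,1])
vP2 = st_matrix("newmat")[.,2] :/ sum(st_matrix("newmat")[.,2])
// Kullback-Leibler divergence of P1 from P2
dKLdiv = sum(vP1:*log(vP1:/vP2))
// Kullback-Leibler symmetric divergence (average of both directions)
dKLSdiv = 0.5*(dKLdiv + sum(vP2:*log(vP2:/vP1)))
// Jensen-Shannon divergence: average of the KL divergences of P1 and P2
// from their mixture vM
vM = 0.5*(vP1+vP2)
dJSdiv = 0.5*(sum(vP1:*log(vP1:/vM)) + sum(vP2:*log(vP2:/vM)))
// display the three measures side by side
dKLdiv, dKLSdiv, dJSdiv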
end
*********************************************
See, however, this discussion on MATLAB Central about handling zero
probabilities in discrete-valued distributions:
http://www.mathworks.com/matlabcentral/fileexchange/13089

Note that -multgof- will refuse to handle the zero-frequency case for you:
*********************************************
svmatf , mat(newmat) fil(newmat.dta)
use newmat, clear
multgof c1 c2, kl
*********************************************
This uses -svmatf-, by Jan Brogger (SSC).

T

2010/5/8 Michael C. Morrison <Morrimic@niacc.edu>:
> I've searched Stata (with no success) for "KullbackLeiber divergence" also
> known as the information number, discrimination function, and “distance.”
>
> It's used to measure the divergence between two distributions.
>
> Any help would be appreciated.
>
> Mike
>

```
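
For reference, the three measures named in the Mata code can be written out as below. This is a sketch of the usual textbook definitions, not something taken from the post itself: the symbols p_i, q_i and M are notation assumed here, with p_i and q_i standing for the column proportions held in vP1 and vP2.

```latex
% Assumed notation: p_i, q_i are the proportions of the two columns
% (vP1 and vP2 in the Mata code); M is their equal-weight mixture.
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i p_i \log\frac{p_i}{q_i}

D_{\mathrm{sym}}(P, Q) = \tfrac{1}{2}\left[ D_{\mathrm{KL}}(P \,\|\, Q) + D_{\mathrm{KL}}(Q \,\|\, P) \right]

D_{\mathrm{JS}}(P, Q) = \tfrac{1}{2}\, D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2}\, D_{\mathrm{KL}}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q)
```

Under the usual convention, a term with p_i = 0 contributes zero, while a cell with p_i > 0 and q_i = 0 makes the Kullback-Leibler divergence infinite; that is the zero-probability problem mentioned in the post. The Jensen-Shannon form avoids it, because the mixture is positive wherever either proportion is. It is conventionally the average of the two divergences from the mixture, so a sum computed without the 1/2 weights equals twice this quantity.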