
st: Re: st: Need Kullback–Leibler divergence measure


From   Tirthankar Chakravarty <tirthankar.chakravarty@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: st: Need Kullback–Leibler divergence measure
Date   Sat, 8 May 2010 15:21:34 +0530

<>

Although the user-written -multgof- (Jeroen Weesie, SSC) will compute this
for you, it is easy to do it yourself in Mata, following William Gould's
excellent advice here:
http://www.stata-journal.com/sjpdf.html?articlenum=pr0024
I assume you want the divergence between the two column distributions
of a two-way tabulation (here, the distribution of rep78 among domestic
cars versus among foreign cars):
*********************************************
clear*
sysuse auto, clear
tabulate rep78 foreign, matcell(newmat)
mata
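// column proportions: distribution of rep78 within domestic (column 1)
// and within foreign (column 2) cars
vP1 = st_matrix("newmat")[.,1] :/
		sum(st_matrix("newmat")[.,1])
vP2 = st_matrix("newmat")[.,2] :/
		sum(st_matrix("newmat")[.,2])
// Kullback-Leibler divergence
dKLdiv = sum(vP1:*log(vP1:/vP2))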
// Kullback-Leibler symmetric divergence
dKLSdiv = 0.5*(dKLdiv+ sum(vP2:*log(vP2:/vP1)))
// Jensen-Shannon divergence: the average KL divergence of vP1 and vP2
// from their midpoint 0.5*(vP1+vP2)
dJSdiv = 0.5*sum(vP1:*log(vP1:/(0.5*(vP1+vP2)))) +
		0.5*sum(vP2:*log(vP2:/(0.5*(vP1+vP2))))
dKLdiv, dKLSdiv, dJSdiv
end
*********************************************
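For reference, writing p = vP1 and q = vP2, the first two quantities above are defined as below. D_KL is not symmetric in p and q, which is why a symmetrized version is also computed; Mata's log() is the natural logarithm, so the results are in nats.

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \sum_i p_i \log\frac{p_i}{q_i},
\qquad
D_{\mathrm{KLS}}(p, q) = \tfrac{1}{2}\bigl[ D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p) \bigr]
```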
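See, however, this discussion in the MATLAB newsgroup (and the related
File Exchange submission) about handling zero probabilities in
discrete-valued distributions:
http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/d921b346db0ef427/296b3ab1a09e62a3
http://www.mathworks.com/matlabcentral/fileexchange/13089
This matters for the example above: no foreign car in the auto data has
rep78 equal to 1 or 2, so vP2 contains zeros and the true dKLdiv (and
hence dKLSdiv) is infinite. Because Mata returns missing for x/0 and for
log(0), and sum() treats missing values as 0, the code above silently
drops those terms and reports a finite number instead. dJSdiv is not
affected, since the midpoint is positive wherever either distribution is.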

Note that -multgof- refuses to handle this case (zero probabilities)
for you:
*********************************************
svmatf , mat(newmat) fil(newmat.dta)
use newmat, clear
multgof c1 c2, kl
*********************************************
This uses -svmatf- by Jan Brogger (SSC) to save the matrix as a dataset.
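If you would rather have the zero cases flagged explicitly than dropped silently, here is a minimal sketch. kldiv() and jsdiv() are hypothetical helper names (they are not part of Stata or -multgof-). kldiv() uses the usual convention that terms with p_i = 0 contribute 0, and returns missing when some p_i > 0 has q_i = 0, i.e. when the divergence is infinite. jsdiv() uses the conventional factor 1/2 (the average KL divergence of each distribution from their midpoint), so it is always finite.

```stata
*********************************************
sysuse auto, clear
tabulate rep78 foreign, matcell(newmat)
* drop earlier definitions so the block can be rerun
capture mata mata drop kldiv() jsdiv()
mata:
// Hypothetical helper: KL divergence D(p||q) = sum p_i*log(p_i/q_i).
// Terms with p_i == 0 contribute 0 by convention; returns missing (.)
// when some p_i > 0 has q_i == 0 (infinite divergence).
real scalar kldiv(real colvector p, real colvector q)
{
	real colvector pk, qk

	if (any((q :== 0) :& (p :> 0))) return(.)
	pk = select(p, p :> 0)      // keep only categories with p_i > 0
	qk = select(q, p :> 0)
	return(sum(pk :* log(pk :/ qk)))
}

// Hypothetical helper: Jensen-Shannon divergence, the average KL
// divergence of p and q from their midpoint m; always finite because
// m_i > 0 wherever p_i > 0 or q_i > 0.
real scalar jsdiv(real colvector p, real colvector q)
{
	real colvector m

	m = 0.5 :* (p + q)
	return(0.5*kldiv(p, m) + 0.5*kldiv(q, m))
}

N   = st_matrix("newmat")
vP1 = N[., 1] :/ sum(N[., 1])
vP2 = N[., 2] :/ sum(N[., 2])
kldiv(vP1, vP2)    // missing: no foreign car has rep78 == 1 or 2
kldiv(vP2, vP1)    // finite: every domestic category is nonempty
jsdiv(vP1, vP2)
end
*********************************************
```

With natural logarithms and the 1/2 weighting, jsdiv() always lies between 0 and log(2), which makes it a convenient bounded alternative when the supports of the two distributions differ.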

T

2010/5/8 Michael C. Morrison <Morrimic@niacc.edu>:
> I've searched Stata (with no success) for "Kullback-Leibler divergence", also
> known as the information number, discrimination function, and “distance.”
>
> It's used to measure the divergence between two distributions.
>
> Any help would be appreciated.
>
> Mike
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
To every ω-consistent recursive class κ of formulae there correspond
recursive class signs r, such that neither v Gen r nor Neg(v Gen r)
belongs to Flg(κ) (where v is the free variable of r).



© Copyright 1996–2014 StataCorp LP