<vaisey@unc.edu> asked,
> I am trying to implement the following formula for estimating the
> (standardized) informational entropy of a RxC contingency table:
> sum(p*ln(p))+ln(M) where p (subscript ij omitted) is the proportion of
> cases in each cell and M is the total number of cells. So far, I've
> only managed to accomplish the following:
>
> . tab var1 var2, matcell(Cell)
> . matrix Proportions = Cell/r(N)
There have already been lots of answers to this question. I particularly
like the answer by Nick Cox <n.j.cox@durham.ac.uk> because he used Mata,
but Nick packaged his solution into an ado-file with a Mata subroutine,
which disguised how easy the solution is.
The entire solution to the problem is
. tab va1 var2, matcell(Cell)
. mata:
: M = st_numscalar("r(N)")
: X = st_matrix("Cell")
: P = X/M
: sum(P:*ln(P))+ln(M)
: end
You can just type the above interactively into Stata. It will work.
In Steve's original question, he wrote
> For some (probably good) reason, while you can easily multiply matrix
> elements by a scalar, you can't do something like:
>
> . matrix LnP = ln(Proportions)
That's true, you can't do that in Stata's *OLD* matrix language, but you
can do that in Stata's *NEW* matrix language, Mata.
In the code above, I just did that in-line,
: sum(P:*ln(P))+ln(M)
but I just as well could have typed:
: LnP = ln(P)
: sum(P:*LnP)+ln(M)
In fact, that is exactly what I did when I tried this problem, and I typed
even more so that I could see intermediate results and make sure things were
going well. Here's the actual log:
==============================================================================
. tab var1 var2, matcell(Cell)
Repair |
Record | var2
1978 | 1 2 3 | Total
-----------+---------------------------------+----------
1 | 5 4 1 | 10
2 | 8 15 7 | 30
3 | 2 10 6 | 18
4 | 3 6 2 | 11
-----------+---------------------------------+----------
Total | 18 35 16 | 69
. mata:
------------------------------------------------- mata (type end to exit) ----------
: M = st_numscalar("r(N)")
: M
69
: X = st_matrix("Cell")
: X
1 2 3
+----------------+
1 | 5 4 1 |
2 | 8 15 7 |
3 | 2 10 6 |
4 | 3 6 2 |
+----------------+
: P = X/M
: P
1 2 3
+-------------------------------------------+
1 | .0724637681 .0579710145 .0144927536 |
2 | .115942029 .2173913043 .1014492754 |
3 | .0289855072 .1449275362 .0869565217 |
4 | .0434782609 .0869565217 .0289855072 |
+-------------------------------------------+
: lnP = ln(P)
: lnP
1 2 3
+----------------------------------------------+
1 | -2.624668592 -2.847812143 -4.234106505 |
2 | -2.154664963 -1.526056303 -2.288196356 |
3 | -3.540959324 -1.931521412 -2.442347035 |
4 | -3.135494216 -2.442347035 -3.540959324 |
+----------------------------------------------+
: sum(P:*lnP)+ln(M)
1.957469762
: end
==============================================================================
By the way, the strange ":*" in -sum(P:*lnP)+ln(M)- is Mata's elementwise
multiplication operator. P*lnP would be matrix multiplication.
-- Bill
would@stata.com
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/