That's not quite my point.
In one case, the standardization is additive
and in the other it is multiplicative.
Either should leave the two measures
perfectly correlated, as a correlation is invariant
(up to sign) under a linear transformation, such as conversion
from Fahrenheit to Celsius or vice versa. I trust the
failure to observe a correlation of 1
is down to numerical fuzz.
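
As a quick check of the invariance claim, here is a minimal sketch in Python rather than Stata. The numbers are made up purely for illustration, and `pearson` is a small helper written for this example, not a library routine.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: temperatures in Fahrenheit and some other measure.
fahrenheit = [32.0, 50.0, 68.0, 86.0, 104.0]
other = [1.2, 2.9, 3.1, 4.8, 5.0]

# Linear transformation: Celsius = (Fahrenheit - 32) * 5 / 9.
celsius = [(f - 32.0) * 5.0 / 9.0 for f in fahrenheit]

r_f = pearson(fahrenheit, other)     # correlation using Fahrenheit
r_c = pearson(celsius, other)        # correlation using Celsius
r_fc = pearson(fahrenheit, celsius)  # the two scales against each other

print(r_f, r_c)  # equal up to floating-point rounding
print(r_fc)      # 1 up to floating-point rounding
```

Any departure from exact equality here is rounding error of the order of machine precision.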
This still leaves the question of whether
both standardizations are correct in different
senses, or whether they are the same, but
just appear superficially different.
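
To make the comparison concrete, here is a hedged Python sketch (again, not Stata). It assumes the additive form is SUM p ln p + ln #cells, that is ln K - H, and the multiplicative form is H / ln K, as in the quoted message below. It also assumes every table has the same number of cells K. The proportions are invented for illustration.

```python
import math

def entropy(p):
    """H = -SUM p ln p, taking 0 ln 0 as 0."""
    return -sum(q * math.log(q) for q in p if q > 0)

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up cell proportions; each table is flattened to K = 6 cells.
tables = [
    [0.30, 0.20, 0.10, 0.10, 0.20, 0.10],
    [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.15, 0.15, 0.10, 0.10],
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],
]
K = 6  # assumption: identical for every table

H = [entropy(p) for p in tables]
additive = [math.log(K) - h for h in H]        # SUM p ln p + ln K
multiplicative = [h / math.log(K) for h in H]  # H / ln K

r = pearson(additive, multiplicative)
print(r)  # -1 up to rounding
```

Under these assumptions both forms are linear in H, so the correlation is perfect in magnitude. The sign is negative only because ln K - H measures departure from maximum entropy.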
Otherwise put, suppose you are always
right on something, but my numbers are always
twice as big as yours. Our results
are perfectly correlated but that doesn't
affect the fact that I am biased and
therefore wrong.
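
The same point in numbers, as a small Python sketch with hypothetical values (the `yours` figures stand in for the "always right" answers):

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

yours = [1.0, 2.5, 4.0, 5.5, 7.0]  # hypothetical correct values
mine = [2.0 * v for v in yours]    # always twice as big

r = pearson(yours, mine)
mean_error = sum(m - y for m, y in zip(mine, yours)) / len(yours)

print(r)           # 1 up to floating-point rounding
print(mean_error)  # 4.0, so my numbers are badly biased
```

A correlation of 1 says nothing about agreement; the mean error does.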
Nick
n.j.cox@durham.ac.uk
Steve Vaisey
> I compared the standardizations (yours and Martin's) on a matrix of 11
> variables (for a total of 55 non-redundant, 2 variable comparisons).
> They are highly correlated (~.99) so it's probably a matter of taste at
> one level. Or am I missing something?
>
> Thanks again for your help on this.
>
> Steve
>
> >
> >Date: Thu, 30 Mar 2006 18:34:16 +0100
> >From: "Nick Cox" <n.j.cox@durham.ac.uk>
> >Subject: RE: st: manipulating matrix elements
> >
> >Thanks for the references. I found the second reference
> >once I had worked out that the volume number is 107.
> >
> >Martin does, as you say, use the quantity
> >
> > SUM p ln p + ln #cells
> >
> >In fields I know a bit about, it is more common
> >to use
> >
> > - SUM p ln p = H
> >
> >as a basic quantity. This is what is used in
> >my program -ineq- on SSC, for example.
> >
> >Also, if this H is based on K categories, it can vary
> >between 0 and ln K, so a simple scaling is H / ln K.
> >(In the limiting case of a single category with p = 1,
> >you have to trap the 0 / 0 calculation.) There is
> >no assumption or approximation in this.
> >
> >I am not clear that this is what you are doing, but no
> >matter.
> >
> >Looking at my little program, it is easy to generalise
> >it so that it can take one variable or two. This is me
> >modifying the program so it does things I sometimes
> >want to do, no more.
> >
> >*! 1.0.0 NJC 30 March 2006
> >program myentropy, rclass
> > version 9
> > syntax varlist(min=1 max=2) [if] [in] [fweight aweight]
> >
> > marksample touse
> > qui count if `touse'
> > if r(N) == 0 error 2000
> >
> > tempname matname
> > tab `varlist' [`weight' `exp'] if `touse', matcell(`matname')
> > mat `matname' = `matname' / r(N)
> > mata: subroutine("`matname'")
> > di
> > di as txt "entropy " as res %7.4f r(entropy)
> > di as txt "scaled [0,1] " as res %7.4f r(scaled)
> > return scalar entropy = r(entropy)
> > return scalar scaled = r(scaled)
> >end
> >
> >mata:
> >void subroutine(string scalar matname)
> >{
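> > real matrix X
> > real scalar H, scaled
> > X = st_matrix(matname)
> > // sum() treats missing as 0, so empty cells (0 * ln 0) drop out
> > H = -sum(X :* ln(X))
> > // trap 0 / 0 when there is a single category with p = 1
> > scaled = H == 0 ? 0 : H / ln(rows(X) * cols(X))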
> > st_numscalar("r(entropy)", H)
> > st_numscalar("r(scaled)", scaled)
> >}
> >end
> >
> >Nick
> >n.j.cox@durham.ac.uk
> >
> >Steve Vaisey
> ====================================================
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>