Statalist The Stata Listserver



RE: st: manipulating matrix elements


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: manipulating matrix elements
Date   Sun, 2 Apr 2006 17:24:49 +0100

That's not quite my point. 

In one case, the standardization is additive 
and in the other it is multiplicative. 
Either should leave the two measures 
perfectly correlated, as a correlation is invariant 
under a linear transformation, such as conversion
from Fahrenheit to Celsius or vice versa. I trust the 
failure to observe a correlation of 1
is down to numerical fuzz. 
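
A minimal sketch of that invariance, assuming nothing beyond official Stata. The temperatures are made-up illustration values, not data from this thread.

```stata
* Sketch only: the temperatures below are made-up illustration values
clear
input celsius
-5
0
12.5
20
37
end
* Fahrenheit is a linear transformation of Celsius
generate fahrenheit = 32 + 1.8 * celsius
correlate celsius fahrenheit
* r(rho) should equal 1 up to floating-point precision
display %12.10f r(rho)
```

Any shortfall from 1 here comes only from storage and rounding, which is the kind of numerical fuzz meant above.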

This still leaves the question of whether
both standardizations are correct in different 
senses, or whether they are the same, but 
just appear superficially different. 
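
One way to look at that question is to compute both quantities for a single table. The Mata sketch below uses a made-up probability vector p over K = 4 cells; the names additive and multiplicative are labels chosen here for illustration, not terms from either source.

```stata
mata:
// Sketch only: p is a made-up probability vector over K = 4 cells
p = (0.4, 0.3, 0.2, 0.1)
K = cols(p)
H = -sum(p :* ln(p))          // entropy, between 0 and ln K
additive = ln(K) - H          // SUM p ln p + ln #cells
multiplicative = H / ln(K)    // H scaled to [0, 1]
// for fixed K, additive equals ln K * (1 - multiplicative)
additive, ln(K) * (1 - multiplicative)
end
```

For fixed K the two are linearly related, but with a negative slope: one is 0 where the other is 1. If K differs between comparisons, the relationship need not be exactly linear.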
 
Otherwise put, suppose you are always 
right on something, but my numbers are always 
twice as big as yours. Our results
are perfectly correlated but that doesn't 
affect the fact that I am biased and
therefore wrong. 
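
The same point as a small sketch, with made-up values in which "mine" is always exactly twice "yours":

```stata
* Sketch only: made-up values; mine is always twice yours
clear
input yours
1
2
3
5
8
end
generate mine = 2 * yours
* the correlation is perfect ...
correlate yours mine
* ... but the difference is not zero, so mine is biased
generate diff = mine - yours
summarize diff
```

The correlation says nothing about agreement; the mean of diff is what shows the bias.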

Nick 
n.j.cox@durham.ac.uk 

Steve Vaisey
 
> I compared the standardizations (yours and Martin's) on a matrix of
> 11 variables (for a total of 55 non-redundant, two-variable
> comparisons). They are highly correlated (~.99), so it's probably a
> matter of taste at one level.  Or am I missing something?
> 
> Thanks again for your help on this.
> 
> Steve
> 
> >
> >Date: Thu, 30 Mar 2006 18:34:16 +0100
> >From: "Nick Cox" <n.j.cox@durham.ac.uk>
> >Subject: RE: st: manipulating matrix elements
> >
> >Thanks for the references. I found the second reference 
> >once I had worked out that the volume number is 107. 
> >
> >Martin does, as you say, use the quantity 
> >
> >	SUM p ln p + ln #cells 
> >
> >In fields I know a bit about, it is more common
> >to use 
> >
> >	- SUM p ln p = H 
> >
> >as a basic quantity. This is what is used in 
> >my program -ineq- on SSC, for example. 
> >
> >Also, if this H is based on K categories, it can vary 
> >between 0 and ln K, so a simple scaling is H / ln K. 
> >(In the limiting case of a single category with p = 1, 
> >you have to trap the 0 / 0 calculation.) There is 
> >no assumption or approximation in this. 
> >
> >I am not clear that this is what you are doing, but no 
> >matter. 
> >
> >Looking at my little program, it is easy to generalise 
> >it so that it can take one variable or two. This is me
> >modifying the program so it does things I sometimes 
> >want to do, no more. 
> >
> >*! 1.0.0 NJC 30 March 2006
> >program myentropy, rclass
> >	version 9  
> >	syntax varlist(min=1 max=2) [if] [in] [fweight aweight] 
> >	
> >	marksample touse 
> >	qui count if `touse' 
> >	if r(N) == 0 error 2000 
> >
> >	tempname matname
> >	tab `varlist' [`weight' `exp'] if `touse', matcell(`matname')
> >	mat `matname' = `matname' / r(N)
> >	mata: subroutine("`matname'")
> >	di 
> >	di as txt "entropy      " as res %7.4f r(entropy) 
> >	di as txt "scaled [0,1] " as res %7.4f r(scaled) 
> >	return scalar entropy = r(entropy) 
> >	return scalar scaled = r(scaled) 
> >end 	
> >
> >mata:
> >void subroutine(string scalar matname)
> >{
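> >	real matrix 	X
> >	real scalar	H, scaled
> >	// X holds the cell proportions computed from -tabulate-
> >	X = st_matrix(matname)
> >	// entropy; sum() treats missing terms from empty cells as 0
> >	H = -sum(X :* ln(X)) 
> >	// scale by ln(#cells), trapping the 0 / 0 case
> >	scaled = H == 0 ? 0 : H / ln(rows(X) * cols(X)) 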
> >	st_numscalar("r(entropy)", H)
> >	st_numscalar("r(scaled)", scaled)
> >}
> >end
> >
> >Nick 
> >n.j.cox@durham.ac.uk 
> >
> >Steve Vaisey
> > 
> >  
> >
> ====================================================
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
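
The bounds on H stated in the quoted message (0 at one extreme, ln K at the other) and the 0 / 0 trap can be checked with a short Mata sketch. The function scaled_entropy() is a hypothetical helper written for this illustration; it is not part of the quoted program, although it uses the same formula.

```stata
mata:
// Sketch only: scaled_entropy() is a hypothetical helper for illustration
real scalar scaled_entropy(real matrix p)
{
	real scalar H, K
	K = rows(p) * cols(p)
	H = -sum(p :* ln(p))            // sum() counts missing terms as 0
	return(H == 0 ? 0 : H / ln(K))  // trap 0 / 0 when H is 0
}
scaled_entropy((0.25, 0.25, 0.25, 0.25))   // uniform over K = 4 cells
scaled_entropy((1))                         // single category with p = 1
end
```

The uniform case should return 1 up to floating-point precision, and the single-category case returns 0 because the trap avoids dividing by ln 1 = 0.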



