
st: Re: "logistic scores"

From	Marcello Pagano <[email protected]>
To	[email protected]
Subject	st: Re: "logistic scores"
Date	Sun, 18 Mar 2007 17:02:17 -0400

I am resending this because it was a mess.
m.p.

Nick Cox wrote:

-----Original Message-----
From: Nick Cox
Sent: 18 March 2007 18:25
To: '[email protected]'
Subject: "logistic scores"

My questions come at the end.
It's a habit of mine to revisit my favourite books. Looking again at
Mosteller, F. and Tukey, J.W. 1977. Data analysis and regression. Reading, MA: Addison-Wesley. Chs 5F, 5H, 11F, 11G.
I found a very Tukeyish way of mapping the frequencies of a set of ordered categories (grades) to numerical scores. Each category is treated as a slice of a standard logistic distribution, and what is returned is the centre of gravity of that slice. The recipe is first to calculate, for each grade, the cumulative probability p of falling below that grade and the cumulative probability P of falling at or below it. Then, defining

phi(p) = p ln p + (1 - p) ln (1 - p),

the score for each grade is

(phi(P) - phi(p)) / (P - p).

(I've not re-created the derivation for myself.)
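For what it is worth, here is a short sketch of why the recipe gives a centre of gravity, assuming the standard logistic distribution function F(x) = 1 / (1 + exp(-x)). If X = F^{-1}(U) with U uniform on (0, 1), the mean of X over the slice where p <= U <= P is the average of the quantile function F^{-1}(u) = ln(u / (1 - u)) over (p, P). The function phi turns out to be an antiderivative of that quantile function:

```latex
\phi(u) = u\ln u + (1-u)\ln(1-u),
\qquad
\phi'(u) = \ln u + 1 - \ln(1-u) - 1 = \ln\frac{u}{1-u} = F^{-1}(u)

\text{score} = \frac{1}{P-p}\int_{p}^{P} F^{-1}(u)\,du
             = \frac{\phi(P) - \phi(p)}{P - p}
```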
I call these "logistic scores".
The logistic is justified by Mosteller and Tukey as convenient to work with, and as giving results similar to Gaussian and Cauchy alternatives anyway. Computational ease is naturally less compelling in 2007 than it was in 1977, but simple and useful still wins every time in the absence of better alternatives.
This kind of thing goes nicely in Mata and here
is a function to do it:
mata:
// NJC 16 March 2007
// cf. Mosteller, F. and Tukey, J.W. 1977. Data analysis and regression.
// Reading, MA: Addison-Wesley. Chs 5F, 5H, 11F, 11G.
real logistic_scores(real colvector freq)
{
        real colvector P, p, zero, z
        real scalar k, i

        // P: cumulative probability at or below each grade
        k = rows(freq)
        P = freq
        for (i = 2; i <= k; i++) P[i] = P[i - 1] + P[i]
        P = P / P[k]

        // phi is never positive, so rowmin() against zero returns phi
        // where it is defined and 0 where 0 * ln(0) gives missing
        zero = J(k, 1, 0)
        z = rowmin((zero, P :* ln(P) + (1 :- P) :* ln(1 :- P)))
        // p: cumulative probability below each grade
        p = 0 \ P[1..k-1]
        z = z - rowmin((zero, p :* ln(p) + (1 :- p) :* ln(1 :- p)))
        return(z :/ (P - p))
}
end

A detail that requires care is handling terms like p ln p when p is zero (or, equally, (1 - p) ln (1 - p) when p is one), as ln 0 is undefined and Mata returns missing for it. It is natural mathematically to regard the overall product as zero, but you have to spell that out to Mata. The ? : construct seems less useful here than taking the row minimum against a vector of zeros, which works because phi is never positive.
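The two facts that the zero-vector trick relies on can be stated compactly. The limit supplies the value that Mata cannot compute at the endpoints, and the sign condition means taking the minimum with zero never alters phi inside (0, 1):

```latex
\lim_{u \to 0^{+}} u \ln u = 0
\;\Longrightarrow\;
\phi(0) = \phi(1) = 0

0 < u < 1 \;\Longrightarrow\; u\ln u < 0,\; (1-u)\ln(1-u) < 0
\;\Longrightarrow\; \min\{0, \phi(u)\} = \phi(u)
```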
Anyway, using the example in Mosteller and Tukey (1977, p. 106) of grades A to E, we type in a vector of frequencies and get scores:
: freq = (127\497\3243\231\74)

: logistic_scores(freq)
1
+----------------+
1 | -4.476586375 |
2 | -2.39817005 |
3 | .206295676 |
4 | 3.115523631 |
5 | 5.023164169 |
+----------------+
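As a hand check on the first row (rounded arithmetic, done independently of the code): the frequencies total 4172, so for grade A we have p = 0 and P = 127/4172, and phi(0) = 0 by the limiting argument for u ln u:

```latex
P = \tfrac{127}{4172} \approx 0.03044, \qquad p = 0

\phi(P) \approx 0.03044 \times (-3.4920) + 0.96956 \times (-0.03091)
        \approx -0.10630 - 0.02997 = -0.13627

\text{score} = \frac{\phi(P) - \phi(0)}{P - 0}
             \approx \frac{-0.13627}{0.03044} \approx -4.477
```

This agrees with the first value returned above.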

My questions:
1. My impression is that there is a tenuous connection here with what ordered logit does, but I don't think the latter is quite equivalent, even indirectly, because it works with cutpoints between grades, not with the grades themselves. Someone well versed in that and similar models may care to comment.
By the way, I am pretty clear (perhaps wrongly) that I
am not asking about correspondence analysis here, which
I think requires a two-way table to do its magic. I am only interested for the moment in recipes for single variables.
2. I have a hard time finding examples of this device of Mosteller and Tukey ever being used, apart from a couple of instances in educational statistics. Perhaps they exist and I am looking in the wrong places. If anyone, especially on the biostatistical side, recognises this as a standard tool, or can say what people do instead, please signal.
Nick
[email protected]

