I think you are right; but your colleagues are too.
There is a well-worn argument that
such a scale is just ordinal and so, as
you say, the median is then the standard
summary measure. Samuel S. Stevens was just one
of many people to say this again and again,
even beyond the date of his death. However,
the consequence is then that other relevant
information is thrown away, namely every
other detail of the
frequency distribution. Also, if the real
need is some sort of ranking, the median
is far too coarse.
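
A small sketch in Python may make the coarseness concrete. The
three students and their scores are invented for illustration
(10 evaluators, one competency); they are not data from the
original question.

```python
from statistics import mean, median

# Hypothetical scores: 10 evaluators rating one competency, 1 = worst, 5 = best.
scores = {
    "student_a": [4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
    "student_b": [4, 4, 4, 4, 4, 4, 5, 5, 5, 5],
    "student_c": [1, 2, 2, 3, 4, 4, 4, 4, 4, 4],
}

# Median and mean for each student, side by side.
summary = {name: (median(vals), mean(vals)) for name, vals in scores.items()}

for name, (med, avg) in summary.items():
    print(f"{name}: median = {med}, mean = {avg:.2f}")
```

All three invented students share a median of 4, while their means
are 4.00, 4.40 and 3.20, so the median cannot rank them at all.
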
As the numbers are just 1 ... 5, problems
with outliers are unlikely to be severe.
For medical education the sensitivity of the
mean to the occasional low score may even be a
virtue. I wouldn't want my medical practitioner
to be someone who didn't perform well
across the board.
In fact, from something like 25 years of
major responsibility for summarising student
grades I would say that working with
means is on the whole a better way
of summarising such data than medians.
I would add that no totally mechanistic
summary works well without also looking
at the tails and more generally the
shape of the distribution.
There is also a case for trimming extreme
grades in any situation that might include
personal prejudice.
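
One possible form of trimming, sketched in Python: drop the k
lowest and k highest grades before averaging. The function name,
the choice k = 1 and the grades are all assumptions made for
illustration; other trimming rules are equally defensible.

```python
from statistics import mean

def trimmed_mean(values, k=1):
    """Mean after dropping the k lowest and k highest values (hypothetical helper)."""
    if len(values) <= 2 * k:
        raise ValueError("not enough values to trim")
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])

# Invented grades from 10 evaluators; the single 1 might reflect prejudice.
grades = [1, 4, 4, 4, 4, 4, 5, 5, 5, 5]

plain = mean(grades)            # uses all 10 grades
trimmed = trimmed_mean(grades)  # drops the 1 and one of the 5s

print(f"mean = {plain:.3f}, trimmed mean = {trimmed:.3f}")
```

Here the plain mean is 4.1 and the trimmed mean is 4.375, so
one extreme grade moves the summary by about a quarter of a point.
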
Otherwise put, the best reduction is
not a single measure but the entire
(empirical (cumulative)) distribution
function. You do not give the number
of students but, if it is not too large,
graphs of all the cumulative curves
would reveal gross anomalies.
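
As a sketch of what that reduction looks like for a 1 ... 5 scale,
the following Python computes the empirical cumulative distribution
function, that is, the proportion of scores at or below each level.
The function name and the scores are invented for illustration;
plotting the resulting values for every student gives the cumulative
curves.

```python
from collections import Counter

def ecdf(values, levels=(1, 2, 3, 4, 5)):
    """Return {level: proportion of values <= level} (hypothetical helper)."""
    counts = Counter(values)
    n = len(values)
    running = 0
    result = {}
    for level in levels:
        running += counts[level]  # Counter returns 0 for absent levels
        result[level] = running / n
    return result

# Invented scores for two students with the same median of 4.
steady = ecdf([4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
uneven = ecdf([1, 2, 2, 3, 4, 4, 4, 4, 4, 4])

for level in (1, 2, 3, 4, 5):
    print(level, steady[level], uneven[level])
```

The second curve rises well before 4 (0.1, 0.3, 0.4 at levels
1, 2, 3), which is exactly the detail a median discards.
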
Incidentally, I do not know why simple
5-point scales are so often called Likert
scales. I do know that one Rensis Likert
(fl. 1940) often worked with ordinal scales but
I have read that what he did was much more
specific. Be that as it may, he
was not the first to use 5-point ordered
scales and the idea hardly needs to
be tagged with anyone's name. So,
the name as usually applied is historically
inaccurate and unnecessary in any case.
Nick
n.j.cox@durham.ac.uk
Christopher W. Ryan wrote:
> To evaluate students on clinical rotations, our medical school (like
> many, I suppose) uses a form on which the evaluator assigns an integer
> score for each of 10 competencies. The scale runs from 1 to 5; 1 is
> worst, 5 is best. Generally, each student is evaluated by 10-11
> different people, yielding 10-11 observations on each competency for
> each student.
>
> At least one of the clerkship directors is in the habit of calculating
> a student's mean score on each competency. To two decimal places. The
> competency means are then averaged again, using the mean, to produce a
> single summative number for each student. To 3 decimal places. This
> number then counts a certain percentage toward the final clerkship
> grade (along with written exams and other exercises.)
>
> I may be wrong, but I have some reservations about calculating means
> of scores on a 5-point Likert scale. To me it seems like an ordinal
> scale. I keep making the claim that the proper measure of central
> tendency in this case would be the median. None of my colleagues
> agree; I get the feeling they don't see any problem with putting
> arbitrary consecutive integer values on the different levels of
> performance. My point is that, while it is neat and tidy and perhaps
> intuitive, we have no evidence that the labeled levels on the
> performance scale are equally "spaced."
>
> My concern is with the end result: in the final accounting, students
> are sorted by final score and end up being separated by fractions of a
> point. I would consider these students to be of essentially equal
> abilities, for all practical purposes. Yet sometimes the cut-points
> for our categories of final grade (Honors, High Pass, Pass,
> Conditional, Fail) end up falling between two very closely spaced
> students.
>
> What are your collective thoughts on this?