Statalist The Stata Listserver

Re: st: RE: statistical question: best summary measure for a 5-point Likert scale

From   Fred Wolfe
Subject   Re: st: RE: statistical question: best summary measure for a 5-point Likert scale
Date   Wed, 14 Jun 2006 16:46:23 -0500

I haven't followed all of this discussion, but you might want to take a look at and their programs, Winsteps and Facets. Using the data that you have, you might get insight into the number of groups you can divide students into and still separate them. In addition, you might be able to understand whether the rating scales are evaluating the same or different concepts. This might give you insight into whether the scale can be summed.

Just because you measure with your 100-point ruler to 2 decimal places doesn't mean you can categorize students into 100 or 10,000 groups. You can't. Although there are rules of thumb, the Rasch Winsteps program can tell you how many groups you can find reliably among your students.

Before you conclude that you can take means or medians, you might look at the Rasch analysis literature.

This doesn't even take into account differences in ratings by different observers. GLLAMM is another approach that will give you insight.


At 04:23 PM 6/14/2006, you wrote:

> Also, if the real
> need is some sort of ranking, the median
> is far too coarse.

That's the crux of the matter.  And it's where the statistics leaves off
and the pedagogy and philosophy kick in.  Is there a real need for
ranking?  Using the mean leads to a very fine-grained sorting of
students into a lot of bins, each containing only one or two students.
Two students end up with a 3.76, and the next-lower student ends up with
a 3.73.  I doubt there is any substantial difference in doctoring
abilities between those three students.  But they would have different
class rankings, and they might very well be given different final letter
grades.

I'm trying to make the case that the mean is far too fine.  If our
fundamental tool for observing and evaluating allows faculty to classify
students into 5 categories: lousy, fair, decent, very good, and great,
then shouldn't the final categorization, summarizing all the
observations, also categorize them into roughly the same number of bins?
That would yield few categories, with lots of students in each.  That
makes sense to me.  Instead, what we get are a lot of categories, each
containing only 1-2 students.
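
To make the contrast concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the four students, their ratings, and the function names are made up, and the averaging steps only follow the procedure as described in this thread (competency means to two decimals, then a mean of those to three). It counts how many distinct bins each summary produces.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical ratings: student -> list of competencies, each a list of
# 1-5 scores from different evaluators. The numbers are made up.
ratings = {
    "A": [[4, 4, 5], [4, 4, 4]],
    "B": [[4, 4, 4], [4, 5, 5]],
    "C": [[4, 3, 4], [4, 4, 4]],
    "D": [[5, 5, 4], [5, 5, 5]],
}


def mean_of_means(competencies):
    """Mean per competency (2 decimals), then the mean of those (3 decimals)."""
    competency_means = [round(mean(scores), 2) for scores in competencies]
    return round(mean(competency_means), 3)


def overall_median(competencies):
    """Median of every individual rating the student received."""
    return median(score for scores in competencies for score in scores)


def bins(summary):
    """Group student names by the value of a summary function."""
    groups = defaultdict(list)
    for student, competencies in ratings.items():
        groups[summary(competencies)].append(student)
    return dict(groups)


mean_bins = bins(mean_of_means)
median_bins = bins(overall_median)
print(len(mean_bins), "bins from means:", mean_bins)
print(len(median_bins), "bins from medians:", median_bins)
```

With these made-up numbers the mean puts every student in a bin of their own, while the median puts three of the four students together. Whether that finer sorting reflects real differences is exactly the open question; the sketch only shows the mechanics.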

If you give a hundred people a ruler with the inch as the smallest
marked unit, and ask them all to measure a certain line, is it sensible
to then calculate the mean of all their measures and conclude that it is
8.62 inches?
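
A small Python sketch of the ruler question, under stated assumptions: the 100 readings are hypothetical (62 people read 9 inches and 38 read 8), chosen only so that the arithmetic lands on the figure above. It shows that the mean is a value nobody actually read off the ruler, while the median is one of the marked units.

```python
from statistics import mean, median

# Hypothetical readings in whole inches: 62 people read 9, 38 read 8.
readings = [9] * 62 + [8] * 38

average = round(mean(readings), 2)  # mean of the whole-inch readings
middle = median(readings)           # middle value of the sorted readings

print("mean:", average)
print("median:", middle)
print("did anyone read the mean off the ruler?", average in readings)
```

The code cannot say whether the two extra decimal places mean anything; that depends on how the reading errors arise, which is the substantive issue.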

All very difficult issues.  Someone once likened grading medical
students to "trying to distinguish shades of ultraviolet."  Appreciate
your insights.

Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
and Wilson Family Practice Residency, Johnson City, NY
GnuPG and PGP public keys available at

"If you want to build a ship, don't drum up the men to gather wood,
divide the work and give orders. Instead, teach them to yearn for the
vast and endless sea."  [Antoine de St. Exupery]

Nick Cox wrote:
> I think you are right; but your colleagues are too.
> There is a well-worn argument that
> such a scale is just ordinal and so, as
> you say, the median is then the standard
> summary measure. Samuel S. Stevens was just one
> of many people to say this again and again,
> even beyond the date of his death. However,
> the consequence is then that other
> information is thrown away which is
> relevant, namely every other detail of the
> frequency distribution. Also, if the real
> need is some sort of ranking, the median
> is far too coarse.
> As the numbers are just 1 ... 5, problems
> with outliers are unlikely to be severe.
> For medical education it may even be a
> virtue. I wouldn't want my medical practitioner
> to be someone who didn't perform well
> across the board.
> In fact, from something like 25 years of
> major responsibility for summarising student
> grades I would say that working with
> means is on the whole a better way
> of summarising such data than medians.
> But also that no totally mechanistic
> summary works well without also looking
> at the tails and more generally the
> shape of the distribution.
> There is also a case for trimming extreme
> grades in any situation that might include
> personal prejudice.
> Otherwise put, the best reduction is
> not a single measure but the entire
> (empirical (cumulative)) distribution
> function. You do not give the number
> of students but graphs of all the
> cumulative curves would reveal gross anomalies
> if it was not too large.
> Incidentally, I do not know why simple
> 5-point scales are so often called Likert
> scales. I do know that one Rensis Likert
> (fl. 1940) often worked with ordinal scales but
> I have read that what he did was much more
> specific. Be that as it may, he
> was not the first to use 5-point ordered
> scales and the idea hardly needs to
> be tagged with anyone's name. So,
> the name as usually applied is historically
> inaccurate and unnecessary in any case.
> Nick
> Christopher W. Ryan
>> To evaluate students on clinical rotations, our medical school (like
>> many, I suppose) uses a form on which the evaluator assigns an integer
>> score for each of 10 competencies.  The scale runs from 1 to 5; 1 is
>> worst, 5 is best.  Generally, each student is evaluated by 10-11
>> different people, yielding 10-11 observations on each competency for
>> each student.
>> At least one of the clerkship directors is in the habit of calculating
>> a student's mean score on each competency.  To two decimal places.  The
>> competency means are then averaged again, using the mean, to produce a
>> single summative number for each student.  To 3 decimal places.  This
>> number then counts a certain percentage toward the final clerkship
>> grade (along with written exams and other exercises.)
>> I may be wrong, but I have some reservations about calculating means
>> of scores on a 5-point Likert scale.  To me it seems like an ordinal
>> scale.  I keep making the claim that the proper measure of central
>> tendency in this case would be the median.  None of my colleagues
>> agree; I get the feeling they don't see any problem with putting
>> arbitrary consecutive integer values on the different levels of
>> performance.  My point is that, while it is neat and tidy and perhaps
>> intuitive, we have no evidence that the labeled levels on the
>> performance scale are equally "spaced."
>> My concern is with the end result:  in the final accounting, students
>> are sorted by final score and end up being separated by fractions of a
>> point.  I would consider these students to be of essentially equal
>> abilities, for all practical purposes.  Yet sometimes the cut-points
>> for our categories of final grade (Honors, High Pass, Pass,
>> Conditional, Fail) end up falling between two very closely spaced
>> students.
>> What are your collective thoughts on this?

Fred Wolfe
National Data Bank for Rheumatic Diseases
Wichita, Kansas
Tel (316) 263-2125     Fax (316) 263-0761
