# Re: st: RE: Mean test in a Likert Scale

| | |
|---|---|
| From | Ulrich Kohler |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: RE: Mean test in a Likert Scale |
| Date | Fri, 31 Aug 2012 20:55:56 +0200 |

```
I'm sure that I don't know all the reasons why we should not take means
over values from an ordinal scale, but _one_ reason is that conclusions
taken from the mean are not robust against allowed transformations of
the values.

Generally, all transformations that preserve the numerical order of the
scale are allowed for an ordinal scale. Thus we can transform the
values

1,  2,  3,  4,  5

of an ordinal scale to

0, 11, 13, 19, 42

because this would not change at all the order of the scale.

Now, consider two students, A and B, with grades

A: 1, 1, 1, 1, 5
B: 2, 2, 2, 2, 1

where 1 means "excellent" and 5 means "very bad". If we calculate the
students' averages we obtain 1.8 for both of them. Our substantive
conclusion from the comparison of means would therefore be that both
students are equal.

Now, let us do the "allowed" transformation proposed above:

A:  0,  0,  0,  0, 42
B: 11, 11, 11, 11,  0

In this case we obtain an average of 8.4 for student A and 8.8 for
student B. Hence, this time our substantive conclusion would be that "A
is better than B".

Clearly we should not use a statistic that tells us different truths for
arbitrary "allowed" transformations of scales. For the ordinal scale the
median is better suited because conclusions from the median do not
change for allowed transformations of values of an ordinal scale (in the
example, student A would be better than B in either case).

So much for the theory. However, in practice we might concede that we
have measured something on an ordinal scale, but we sort of _agree_
that we will never do any transformations of the values. In this case
the theoretical discussion above does not really matter. Theoretically,
we might get different results with some transformation but as we never
transform in practice, it will just not happen. Basically we would get
some sort of a "conventional" absolute scale, then. It would make quite
some sense to use means for such scales.

The question then is whether the 5-point-scale used in the original
question can be seen as kind of a conventional absolute scale. I tend to
say that this is the case, but I would like to leave that up to the
questioner himself.

Uli

Am Freitag, den 31.08.2012, 18:35 +0100 schrieb Nick Cox:
> I bow to others' expertise and experience on the minutiae here, some
> of which seem almost theological in character. For "likert" read
> "Likert", passim.
>
> My impression from the thread, however, is that some seem to think
> that extreme views vs moderate views are the issue, and some seem to
> think that agree vs disagree is the issue. I can't detect a consistent
> position among posters about how intermediate points are to be handled
> either.
>
> This disagreement to me adds flavour to the wording "arbitrary".
>
> Naturally, I am interested to learn that dichotomising a Likert scale
> is something that researchers think is sometimes justifiable, but if I
> ever met it I would expect some discussion of quite how it was done
> and why that made sense.
>
> Nick
>
> On Fri, Aug 31, 2012 at 5:32 PM, David Radwin <dradwin@mprinc.com> wrote:
> > Rob,
> >
> > It may be the case that not labeling the middle points of a scale, as in
> > your first example, justifies the assumption of equal spacing (deltas).
> > But the literature suggests that verbally labeling all points on a scale,
> > as in your second example, leads to more reliable measurement. See, for
> > example:
> >
> > Alwin DF, Krosnick JA. 1991. The reliability of survey attitude
> > measurement: The influence of question and respondent attributes. Sociol.
> > Methods Res. 20:139-81.
> > http://deepblue.lib.umich.edu/bitstream/2027.42/68969/2/10.1177_0049124191020001005.pdf
> >
> Rob Ploutz-Snyder
>
> >> My 2 cents...when designing these sorts of instruments...
> >>
> >> I was trained that a true likert scale doesn't label each of the
> >> points in the 5-point (or other) scale, but instead has only TWO
> >> labels at each extreme.  For example:
> >>
> >> I like Statalist..............      Completely Disagree   1  2  3  4  5    Completely Agree
> >>
> >> This is in CONTRAST to a scale that would label each and every point
> >> (sometimes called "likert-type" or "modified-likert") for example:
> >>
> >> 1=completely disagree
> >> 2=disagree
> >> 3=neutral
> >> 4=agree
> >> 5=completely agree
> >>
> >> With true likert scales, while still not continuous in scale, the
> >> distance between each category in a true likert scale is not
> >> subjective.  The delta between "1" and "2" is the same as the delta
> >> between "2" and "3" etc.  and it is assumed that survey respondents
> >> can appreciate this.  The same cannot be assumed about the difference
> >> between "completely disagree" and "disagree" being equal to the delta
> >> between "disagree" and "neutral."
> >>
> >> So in that way, a  true-likert scale removes some of the subjectivity
> >> on the deltas and seems to achieve a more proper ordinal scale as
> >> opposed to purely categorical.
> >>
> >> Still doesn't justify using parametric statistical techniques...
> >> However, most well-vetted Sociology or Psychological instruments are
> >> designed to use multiple questions that, together, are used to measure
> >> a particular construct.  Social scientists don't usually intend to
> >> compare responses on single questions, but instead ask many questions
> >> that cluster together, often verified by exploratory or confirmatory
> >> factor analysis, where "factor scores" are then created to capture the
> >> overall construct of interest.  These factor scores can be derived by
> >> different methods, the simplest being a mean of the items that cluster
> >> together, but usually by more sophisticated regression-based methods
> >> that weigh each item according to how well it correlates with the
> >> overall factor structure.  These factor scores are continuously
> >> scaled, unlike the individual items that were used to derive them, and
> >> it is these factor scores that are often analyzed by various
> >> parametric statistical techniques.
> >>
> >> Whether or not the factor scores  are normally distributed in the
> >> population (the real question) is dependent on the particulars of each
> >> research study, but I don't categorically claim that the assumption is
> >> invalid.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

```
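
Uli's grade example can be checked numerically. The following is only a minimal sketch in Python (not Stata, and not part of the original post): the names `RECODE`, `GRADES`, `ranking` and `recoded` are made up for illustration. It applies the order-preserving recoding 1, 2, 3, 4, 5 to 0, 11, 13, 19, 42 to every grade of both students and reports which student the mean and the median favour, assuming that a lower value is the better one (1 means "excellent").

```python
from statistics import mean, median

# Order-preserving recoding taken from the post: 1->0, 2->11, 3->13, 4->19, 5->42.
RECODE = {1: 0, 2: 11, 3: 13, 4: 19, 5: 42}

# Grades from the post (1 = "excellent", 5 = "very bad").
GRADES = {"A": [1, 1, 1, 1, 5], "B": [2, 2, 2, 2, 1]}


def ranking(stat, grades):
    """Return the student with the lower (better) value of `stat`, or 'tie'."""
    a, b = stat(grades["A"]), stat(grades["B"])
    if a == b:
        return "tie"
    return "A" if a < b else "B"


# Apply the recoding to every single grade of both students.
recoded = {name: [RECODE[g] for g in values] for name, values in GRADES.items()}

for label, data in (("original", GRADES), ("recoded", recoded)):
    print(label,
          "| mean favours:", ranking(mean, data),
          "| median favours:", ranking(median, data))
```

Run as written, the comparison based on the mean changes between the two codings, while the comparison based on the median does not. That is the point of the argument: only the second statistic is stable under transformations that an ordinal scale permits.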