



Re: st: -cluster kmeans- or -cluster kmedian-


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: -cluster kmeans- or -cluster kmedian-
Date   Sat, 24 Mar 2012 17:22:45 +0000

How far measurement scale implies what statistical methods are or are
not "valid" (or even useful) is a messy mix of impure logic,
pragmatism and pure prejudice.

On your first question, explaining why any method is popular is often
difficult, except through a near circular argument that it may be what
is most frequently included in teaching and used in research papers.

I suppose most people have met the argument that ordinal scale
measurement (NB "Likert", not likert) doesn't justify taking means,
but how many of them decline as a matter of principle to participate
in taking or using grade-point averages? I've often heard the argument
that e.g. Spearman correlation, not Pearson correlation, is
appropriate for ordinal data, but it rests on calculations of the
means and variances of ranks.
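
To make the point about Spearman correlation concrete, here is a minimal sketch in Python (standard library only) that computes it as a Pearson correlation applied to ranks. The function names and the two items of responses are hypothetical, invented purely for illustration; ties are given the mean of the ranks they span, which is the usual convention.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation, built from means, variances and a covariance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical responses of six people to two 5-point items.
item_a = [1, 2, 2, 3, 4, 5]
item_b = [2, 1, 3, 3, 5, 4]
print(spearman(item_a, item_b))
```

The only step that respects the ordinal scale is the ranking; everything after it is ordinary arithmetic on means and variances.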

The idea that a binary variable can't be averaged because it is
categorical is also fallacious. Consider a class of 56 women and 44
men and score a gender variable as 1 for female and 0 for male. Then
the mean of 0.56 has an obvious interpretation as the proportion of
females.  I wonder how many statistically-minded people really
consider that the only valid summaries are the median and mode of 1
because the variable is categorical.
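
The class example can be checked in a few lines. This is only an illustrative sketch in Python using the standard `statistics` module; the variable name `gender` and the 1/0 coding follow the example above.

```python
from statistics import mean, median, mode

# 56 women scored 1 and 44 men scored 0, as in the example above.
gender = [1] * 56 + [0] * 44

print(mean(gender))    # the proportion of females
print(median(gender))  # the middle of the sorted values
print(mode(gender))    # the most frequent value
```

The mean carries more information than the median or mode here: it distinguishes a 56/44 split from a 99/1 split, while the other two summaries do not.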

To answer your second question more directly, people often say similar
things, but including binary variables might nevertheless still be
helpful. Statistics, like
any branch of applied mathematics, often entails ignoring purist
assumptions when they aren't important. No variable that we treat as
approximately normal could really take any finite value whatsoever, as
a normal distribution strictly implies. It's not that, but other
behaviour, that limits the applicability of the normal.

Turn and turn about, if the data are just a few binary variables, then
they are already classified. What you need most is a way of displaying
the frequencies of the 2^k joint categories.
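
As a rough sketch of what such a display involves, the following Python snippet counts how often each of the 2^k joint categories occurs, including those that never occur. The function name `joint_frequencies` and the six rows of data are hypothetical, made up for illustration with k = 3.

```python
from collections import Counter
from itertools import product


def joint_frequencies(rows, k):
    """Count each of the 2^k joint categories of k binary variables."""
    counts = Counter(tuple(row) for row in rows)
    # Enumerate every possible 0/1 pattern so that empty cells show up as 0.
    return {pattern: counts.get(pattern, 0) for pattern in product((0, 1), repeat=k)}


# Hypothetical data: six observations on three binary variables.
rows = [(1, 0, 1), (1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 0, 0), (1, 0, 1)]

table = joint_frequencies(rows, 3)
for pattern, freq in table.items():
    print(pattern, freq)
```

Listing the empty cells explicitly is a deliberate choice: with only a few binary variables, which combinations never occur can be as informative as which are common.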

Nick

On Sat, Mar 24, 2012 at 2:04 PM, David Cefskimal
<david.cefskimal@googlemail.com> wrote:

> Stata offers the cluster kmeans and cluster kmedians commands.
>
> I am wondering why some scholars use cluster kmeans rather than cluster
> kmedians to form “attitude clusters” from ordinal-scaled (Likert-type)
> variables?
>
> Second, to me it does not make much sense to include binary-coded
> variables in such a cluster analysis, at least if the cluster
> analysis is based on a Euclidean distance measure. Or does it?
