Re: st: -cluster kmeans- or -cluster kmedian-

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: -cluster kmeans- or -cluster kmedian-
Date	Sat, 24 Mar 2012 17:22:45 +0000

How far measurement scale implies what statistical methods are or are
not "valid" (or even useful) is a messy mix of impure logic,
pragmatism and pure prejudice.

On your first question, explaining why any method is popular is often
difficult, except through a near circular argument that it may be what
is most frequently included in teaching and used in research papers.

I suppose most people have met the argument that ordinal scale
measurement (NB "Likert", not likert) doesn't justify taking means,
but how many of them decline as a matter of principle to participate
in taking or using grade-point averages? I've often heard the argument
that e.g. Spearman correlation, not Pearson correlation, is
appropriate for ordinal data, but it rests on calculations of the
means and variances of ranks.
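
The point about Spearman can be checked directly: Spearman's
coefficient is the Pearson correlation computed on ranks, so it uses
means and variances just as much as Pearson does. A minimal sketch in
Python, assuming made-up ordinal scores; the data and the helper names
midranks and pearson are illustrative only and are not Stata commands.

```python
from statistics import mean

def midranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Ordinary Pearson correlation from means and sums of squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical responses on two 5-point items (with ties)
item1 = [1, 2, 2, 3, 4, 4, 5, 5]
item2 = [2, 1, 3, 3, 3, 5, 4, 5]

# Spearman correlation = Pearson correlation of the ranks
spearman = pearson(midranks(item1), midranks(item2))
print(round(spearman, 3))
```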

The idea that a binary variable can't be averaged because it is
categorical is also fallacious. Consider a class of 56 women and 44
men and score a gender variable as 1 for female and 0 for male. Then
the mean of 0.56 has an obvious interpretation as the proportion of
females.  I wonder how many statistically-minded people really
consider that the only valid summaries are the median and mode of 1
because the variable is categorical.
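
The class example is easy to reproduce. A minimal sketch in Python,
using only the numbers given above (56 women scored 1, 44 men scored
0); the variable name gender is just a label for illustration.

```python
from statistics import mean, median, mode

# 1 = female, 0 = male, for the class of 56 women and 44 men
gender = [1] * 56 + [0] * 44

print(mean(gender))    # 0.56, the proportion of women
print(median(gender))  # 1.0
print(mode(gender))    # 1
```

The median and mode only report which category is more common; the
mean reports how much more common.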

To answer your second question more directly, people often say similar
things, but including binary variables might nevertheless still be
helpful. Statistics, like any branch of applied mathematics, often
entails ignoring purist assumptions when they aren't important. No
variable that we treat as approximately normal could really take
every finite value with positive probability density, as a normal
variable strictly does.  It's not that, but other behaviour, that
limits the applicability of the normal.
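
As a worked instance of that last point: a normal model for, say,
adult height in centimetres assigns positive probability to negative
heights, yet that probability is so small that nobody treats it as a
reason to abandon the model. A minimal sketch in Python; the mean of
170 and standard deviation of 10 are assumptions chosen purely for
illustration, and normal_cdf is a hypothetical helper name.

```python
from math import erfc, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma); erfc keeps accuracy in the lower tail."""
    return 0.5 * erfc((mu - x) / (sigma * sqrt(2)))

# Hypothetical model: height in cm with mean 170 and SD 10
p_negative = normal_cdf(0, mu=170, sigma=10)
print(p_negative)  # positive, but vanishingly small
```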

Turn and turn about, if the data are just a few binary variables, then
they are already classified. What you need most is a way of displaying
the frequencies of the 2^k joint categories.
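
A display of that kind can be sketched in a few lines. The Python
below is only an illustration under stated assumptions: the eight rows
of data are invented, and k = 3 binary variables are used, giving
2^3 = 8 possible joint categories.

```python
from collections import Counter
from itertools import product

# Hypothetical data: each row is one respondent's answers to k = 3 yes/no items
rows = [
    (1, 0, 1), (1, 1, 1), (0, 0, 0), (1, 0, 1),
    (0, 1, 0), (1, 1, 1), (0, 0, 0), (1, 0, 1),
]
k = len(rows[0])
counts = Counter(rows)

# List all 2^k patterns, including those that never occur
for pattern in product((0, 1), repeat=k):
    print(pattern, counts[pattern])
```

Listing the empty patterns as well as the occupied ones shows at a
glance which combinations the data never produce.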

Nick

On Sat, Mar 24, 2012 at 2:04 PM, David Cefskimal
<[email protected]> wrote:

> Stata offers the cluster kmeans and cluster kmedians commands.
>
> I am wondering why some scholars used cluster kmeans and not cluster
> kmedians to form “attitude clusters” of ordinal scaled (likert type)
> variables?
>
> Second, to me it does not make much sense to have binary coded
> variables included in such a cluster analysis, at least if the cluster
> analysis is based on a Euclidean distance measure, or not?
