

- **From:** Nick Cox <njcoxstata@gmail.com>
- **To:** statalist@hsphsun2.harvard.edu
- **Subject:** Re: st: -cluster kmeans- or -cluster kmedian-
- **Date:** Sat, 24 Mar 2012 17:22:45 +0000

How far measurement scale implies what statistical methods are or are not "valid" (or even useful) is a messy mix of impure logic, pragmatism and pure prejudice.

On your first question, explaining why any method is popular is often difficult, except through the near-circular argument that it is what is most often taught and most often used in research papers. I suppose most people have met the argument that ordinal scale measurement (NB "Likert", not likert) doesn't justify taking means, but how many of them decline as a matter of principle to participate in taking or using grade-point averages? I've often heard the argument that, e.g., Spearman correlation, not Pearson correlation, is appropriate for ordinal data, but Spearman correlation is just Pearson correlation calculated on ranks, so it rests on the means and variances of ranks.

The idea that a binary variable can't be averaged because it is categorical is also fallacious. Consider a class of 56 women and 44 men and score a gender variable as 1 for female and 0 for male. Then the mean of 0.56 has an obvious interpretation as the proportion of females. I wonder how many statistically-minded people really consider that the only valid summaries are the median and the mode (both 1 here) because the variable is categorical.

On your second question, people often raise similar objections to including binary variables, but including them might nevertheless be helpful. Statistics, like any branch of applied mathematics, often entails ignoring purist assumptions when they aren't important. In principle a normal variable can take any finite value, however extreme, yet no variable that we treat as approximately normal really could. It's not that, but other behaviour, that limits the applicability of the normal.

Turn and turn about, if the data are just a few binary variables, then they are already classified. What you need most is a way of displaying the frequencies of the 2^k joint categories, where k is the number of binary variables.
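Two of the arithmetic points above can be checked directly. The sketch below is in Python rather than Stata, uses only the standard library, and runs on small made-up data; the helper names `pearson`, `ranks` and `spearman` are hypothetical and are not Stata commands. It shows that the mean of a 0/1 variable is a proportion, and that Spearman correlation is simply Pearson correlation (means, variances and covariance) applied to ranks, with ties given average ranks.

```python
from statistics import mean, median, mode

# Gender scored 1 = female, 0 = male, for a class of 56 women and 44 men.
gender = [1] * 56 + [0] * 44
print(mean(gender), median(gender), mode(gender))  # 0.56 1.0 1


def pearson(x, y):
    """Pearson correlation, built from means, variances and covariance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of ties while the next sorted value is equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up 5-point Likert-type responses from six people (ties included).
item_a = [1, 2, 2, 4, 5, 3]
item_b = [2, 1, 3, 4, 5, 3]
print(round(spearman(item_a, item_b), 4))  # 0.8676
```

Because only the ranks enter, any monotone recoding of the scale points (say, cubing them) leaves the Spearman result unchanged, while the calculation itself is still built on means and variances.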
Nick

On Sat, Mar 24, 2012 at 2:04 PM, David Cefskimal <david.cefskimal@googlemail.com> wrote:

> Stata offers cluster kmeans and cluster kmedian command.
>
> I am wondering why some scholars used cluster kmeans and not cluster
> kmedians to form “attitude clusters” of ordinal scalled (likert type)
> variables?
>
> Second, to me it does not make much sense to have binary coded
> variables included in such a cluster analysis at least if the cluster
> analysis is based on Euclidian distance measure, or not?

For searches and help try:

- http://www.stata.com/help.cgi?search
- http://www.stata.com/support/statalist/faq
- http://www.ats.ucla.edu/stat/stata/
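The quoted question about binary variables and Euclidean distance can be made concrete with a small Python sketch. It uses only the standard library and made-up data, and the helper names (`squared_euclidean`, `mismatches`, `mean_center`, `median_center`, `joint_frequencies`) are hypothetical; it does not reproduce Stata's `cluster kmeans` or `cluster kmedians` code. It assumes group centres are variable-by-variable means (k-means) or medians (k-medians), and shows three things: for 0/1 data the squared Euclidean distance is just the number of variables on which two cases disagree; a mean centre turns binary variables into proportions, whereas a median centre stays at 0 or 1 when the group size is odd; and with only a few binary variables the 2^k joint categories can simply be counted.

```python
from collections import Counter
from itertools import product
from statistics import mean, median


def squared_euclidean(a, b):
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mismatches(a, b):
    """Number of positions where two profiles disagree (Hamming distance)."""
    return sum(x != y for x, y in zip(a, b))


def mean_center(group):
    """Variable-by-variable means, as for a k-means group centre."""
    return [mean(column) for column in zip(*group)]


def median_center(group):
    """Variable-by-variable medians, as for a k-medians group centre."""
    return [median(column) for column in zip(*group)]


def joint_frequencies(rows):
    """Counts for all 2**k patterns of k binary variables, empty cells included."""
    k = len(rows[0])
    counts = Counter(tuple(row) for row in rows)
    return {pattern: counts[pattern] for pattern in product((0, 1), repeat=k)}


# For 0/1 data, squared Euclidean distance equals the mismatch count.
for a in product((0, 1), repeat=3):
    for b in product((0, 1), repeat=3):
        assert squared_euclidean(a, b) == mismatches(a, b)

# Made-up responses: three cases on four binary items.
group = [(1, 0, 1, 1), (1, 1, 1, 0), (1, 0, 0, 1)]
print(mean_center(group))    # means: 1, 1/3, 2/3, 2/3
print(median_center(group))  # [1, 0, 1, 1]

# Made-up data: six cases on three binary items, tabulated over all 8 cells.
rows = [(1, 0, 1), (1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 0, 0), (1, 0, 1)]
for pattern, count in joint_frequencies(rows).items():
    print(pattern, count)
```

The median centre is itself a possible response profile, while the mean centre is a profile of proportions; which is more useful is a judgement call rather than a matter of validity.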
