
From: khigbee@stata.com
To: statalist@hsphsun2.harvard.edu
Subject: Re: Re: st: are there any statistics rules that I can apply to separate numbers into groups?
Date: Thu, 12 Mar 2009 08:50:51 -0500

Ada Ma <heu034@googlemail.com> asks:

> ... I was also playing around with the -cluster kmeans-
> command and find that -group1d- generates the same groupings -cluster
> kmeans- with the option -measure(L2squared)- applied.

Nick Cox <n.j.cox@durham.ac.uk> responds:

>> The fact that you get the same results with -group1d- and a k-means
>> approach is good fortune, as k-means methods don't guarantee that an
>> optimum will be found.

Ada's question continues:

> I then compare the results of -cluster kmeans- with or without the
> -measure(L2squared)- option specified. The result groupings are
> different. I don't really understand why this should be the case for
> univariate clustering, because when I typed:
>
>     help measure_option (note the underscore between the words measure
> and option, without the underscore a different help file will show up)
>
> It is explained that the default option calculates the grouping by minimising:
>     requests the Euclidean distance / Minkowski distance metric
> with argument 2
>
>     sqrt(sum((x_ia - x_ja)^2))
>
> But when the option -measure(L2squared)- is specified
> grouping is assigned by minimising the square of the Euclidean
> distance / Minkowski distance metric with argument 2
>
>     sum((x_ia - x_ja)^2)
>
> Here are some output generated using the same 49 observations:
>
> ... <output omitted> ...

Look at the -start()- option detailed in -help cluster_kmeans-. Notice
that many of the suboptions of -start()- take a random-number seed as
an argument, including the default, -start(krandom())-.

As Nick pointed out, it was just luck that your first run of
-cluster kmeans- produced the same clustering as -group1d-. Set the
random-number seed to different values before several runs and you
might get several different answers. Kmeans clustering is not
guaranteed to find an optimal solution.

Quoting Nick Cox's answer:

>> The main point of -group1d- is that it produces classes that are
>> contiguous intervals in one dimension. In contrast -cluster- has
>> no notion of contiguity.

Kmeans clustering can be applied to any number of dimensions. The
case of having only 1 dimension is not given any special treatment.

Side note: Ada indicates that -help measure_option- and
-help measure option- display different help files. I cannot
reproduce that behavior; both display the same help file for me.
Ada, can you reproduce that behavior? If so, email me and tell me
more about your setup (send me the output of typing -about- and
-update query- in your Stata).

Ken Higbee    khigbee@stata.com
StataCorp     1-800-STATAPC

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
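
The dependence on starting values described in the message can be illustrated with a small sketch. This is a generic Lloyd-style k-means in one dimension written in plain Python. It is not Stata's -cluster kmeans- implementation, and the data, the function names (kmeans_1d, within_ss), and the hand-picked starting centers are all made up for illustration. The hand-picked starts stand in for what different random seeds might draw. The sketch also shows that, from the same start, assigning points by distance or by squared distance gives the same groups, since the square root does not change which center is nearest.

```python
def kmeans_1d(data, centers, squared=True, max_iter=100):
    """Lloyd-style k-means in one dimension from the given starting centers.

    Illustrative only: a generic textbook procedure, not Stata's code.
    """
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest center, using either the
        # squared distance or the plain (absolute) distance.
        labels = [
            min(
                range(len(centers)),
                key=lambda j: (x - centers[j]) ** 2 if squared else abs(x - centers[j]),
            )
            for x in data
        ]
        # Move each center to the mean of its members; keep an empty
        # group's center where it was.
        new_centers = []
        for j in range(len(centers)):
            members = [x for x, g in zip(data, labels) if g == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # converged: another pass would change nothing
        centers = new_centers
    return labels, centers


def within_ss(data, labels, centers):
    """Total within-group sum of squared deviations from the group centers."""
    return sum((x - centers[g]) ** 2 for x, g in zip(data, labels))


# Hypothetical data with three obvious groups.
data = [0, 1, 2, 3, 100, 101, 200, 201]

# Two different starts (standing in for two different random seeds).
good_labels, good_centers = kmeans_1d(data, [0, 100, 200])
bad_labels, bad_centers = kmeans_1d(data, [0, 3, 100])

# Same start, plain distance instead of squared distance.
plain_labels, _ = kmeans_1d(data, [0, 3, 100], squared=False)

print(good_labels, within_ss(data, good_labels, good_centers))
print(bad_labels, within_ss(data, bad_labels, bad_centers))
print(plain_labels == bad_labels)
```

Both runs converge, but the second start splits the tight group near zero and merges the two distant groups, so it stops at a much worse solution. Under these assumptions the choice of start, not the choice between the two distance measures, is what changes the grouping.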


© Copyright 1996–2016 StataCorp LP