
From: "Nick Cox" <n.j.cox@durham.ac.uk>

To: <statalist@hsphsun2.harvard.edu>

Subject: RE: st: are there any statistics rules that I can apply to separate numbers into groups?

Date: Thu, 12 Mar 2009 12:41:34 -0000

The fact that you get the same results with -group1d- and a k-means approach is good fortune, as k-means methods don't guarantee that an optimum will be found.

The main point of -group1d- is that it produces classes that are contiguous intervals in one dimension. In contrast, -cluster- has no notion of contiguity.

Your main question is about -cluster- and is best left to Ken Higbee, I suspect.

Nick
n.j.cox@durham.ac.uk

Ada Ma

Thanks to Nick for introducing me to this wonderful command -group1d-. It's exactly what I was looking for.

I have some further questions, which I hope someone can help me to understand. I was also playing around with the -cluster kmeans- command and found that -group1d- generates the same groupings as -cluster kmeans- with the option -measure(L2squared)- applied.

I then compared the results of -cluster kmeans- with and without the -measure(L2squared)- option specified. The resulting groupings are different. I don't really understand why this should be the case for univariate clustering, because when I typed:

    help measure_option

(note the underscore between the words measure and option; without the underscore a different help file will show up)

it is explained that the default option calculates the grouping by minimising:

    requests the Euclidean distance / Minkowski distance metric with argument 2

        sqrt(sum((x_ia - x_ja)^2))

But when the option -measure(L2squared)- is specified, grouping is assigned by minimising the square of the Euclidean distance / Minkowski distance metric with argument 2:

        sum((x_ia - x_ja)^2)

Here is some output generated using the same 49 observations:

    . cluster kmeans var1, k(4) generate(euclid)
    cluster name: _clus_5

    . cluster kmeans var1, k(4) generate(euclidsq) measure(L2squared)
    cluster name: _clus_1

    . tab euclid euclidsq

               |                  euclidsq
        euclid |         1          2          3          4 |     Total
    -----------+--------------------------------------------+----------
             1 |        10          0          0          0 |        10
             2 |         0          0         12          0 |        12
             3 |         0          4          0          6 |        10
             4 |         9          0          0          8 |        17
    -----------+--------------------------------------------+----------
         Total |        19          4         12         14 |        49

    . bys euclid: egen m_euclid=mean(var1)

    . bys euclidsq: egen m_euclidsq=mean(var1)

    . egen tot1euclid=total((var1-m_euclid)^2)

    . egen tot1euclidsq=total((var1-m_euclidsq)^2)

    . sum tot*

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
      tot1euclid |        49    712.2434           0   712.2434   712.2434
    tot1euclidsq |        49    524.9169           0   524.9169   524.9169

    . di sqrt(712.2434)
    26.687889

    . di sqrt(524.9169)
    22.911065

The groupings generated with the option -measure(L2squared)- applied are superior to those generated without it: the total within-group sum of squares is 524.9 rather than 712.2. This shouldn't be the case for univariate clustering, or should it? Have I missed something important?
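One way to probe the question, sketched in Python rather than Stata: in one dimension the centre that is nearest under L2 is also nearest under L2squared, because squaring preserves the ordering of non-negative distances. If that holds, differing groupings may owe more to the starting centres than to the measure, which would fit the remark that k-means does not guarantee an optimum. The sketch below is a plain Lloyd-style iteration and is not Stata's -cluster kmeans- implementation; the data, the starting centres and every function name are made up for illustration.

```python
# Hypothetical data: three well-separated clumps of values (not the 49 observations above).
values = [1, 2, 3, 11, 12, 13, 21, 22, 23]


def l2(a, b):
    """Euclidean distance between two numbers."""
    return abs(a - b)


def l2squared(a, b):
    """Squared Euclidean distance between two numbers."""
    return (a - b) ** 2


def nearest(x, centres, dist):
    """Index of the centre closest to x under dist (first index wins ties)."""
    return min(range(len(centres)), key=lambda j: dist(x, centres[j]))


def kmeans_1d(data, starts, dist, max_iter=100):
    """Lloyd-style k-means on a list of numbers, from given starting centres."""
    centres = list(starts)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centre.
        labels = [nearest(x, centres, dist) for x in data]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j in range(len(centres)):
            members = [x for x, g in zip(data, labels) if g == j]
            # An empty group keeps its old centre.
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:
            break  # centres stopped moving, so labels are final
        centres = new_centres
    return labels, centres


def within_ss(data, labels):
    """Total within-group sum of squared deviations from the group means."""
    total = 0.0
    for g in set(labels):
        members = [x for x, lab in zip(data, labels) if lab == g]
        mean = sum(members) / len(members)
        total += sum((x - mean) ** 2 for x in members)
    return total


# Two hypothetical sets of starting centres: one per clump, and the three smallest values.
for starts in ([2, 12, 22], [1, 2, 3]):
    by_l2, _ = kmeans_1d(values, starts, l2)
    by_l2sq, _ = kmeans_1d(values, starts, l2squared)
    print("starts", starts,
          "| same grouping under both measures:", by_l2 == by_l2sq,
          "| within-group SS:", within_ss(values, by_l2))
```

With these made-up numbers the two measures give the same grouping from any given start, while the two starts settle on groupings with very different within-group sums of squares. An analogous check in Stata, assuming the -start()- option of -cluster kmeans- behaves as its help file describes, would be to give both calls the same starting centres and then compare the groupings again.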

