Re: st: Initial grouping variable in Kmeans/Kmedians


From   khigbee@stata.com
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Initial grouping variable in Kmeans/Kmedians
Date   Mon, 27 Jun 2005 09:52:26 -0500

Rob Hall <rob@environmetrics.com.au> asks concerning -cluster kmeans-
or -cluster kmedians-:

> In SPSS, I can use a matrix of cluster centres as the starting point  
> for analysis. For example, I have eight variables that will be used in the  
> analysis and I have "target" means for each cluster on each variable.
> How can I get Stata to attend to the eight by 'n' matrix rather than  
> just a single grouping variable?

Look at the -start()- option, and in particular -start(lastk, exclude)-.
The approach would be to append the target means to the data.  -exclude-
indicates that these appended data points are not to be clustered, but
instead are only to act as starting center points for the algorithm.

Let's say that the Stata matrix holding the starting points is X (one
row per group, one column per variable, so 10 x 8 here), that you have
8 variables (named a1, a2, a3, ..., a8), that you are clustering into
10 groups, and that your dataset has 1000 observations.
Here is one approach:

  Set the number of observations to 10 more than the current number
  of observations (10 being the number of groups you desire).

      set obs 1010

  The newly created observations hold missing values until you fill
  them with something else.  We want to place the values that are
  in the X matrix into those last observations.

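      * Copy row j of X (the target means for group j) into
      * observation 1000 + j, one variable at a time.
      forvalues i = 1/8 {
          forvalues j = 1/10 {
              local k = 1000 + `j'
              replace a`i' = X[`j',`i'] in `k'
          }
      }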

  If you are new to Stata, note that the references to i, j, and k in
  the code above open with a left single quote (the backtick, `) and
  close with a right single quote (the apostrophe, '); these are
  different characters.

  Now I can call -cluster kmeans-:

      cluster kmeans ... , k(10) ... start(lastk, exclude)

  After the cluster analysis, you might wish to remove the
  observations you added at the bottom:

      drop in 1001/1010
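
  Putting the steps above together, here is a minimal end-to-end sketch
  meant to be run as a single do-file. The simulated data, the values
  placed in X, and the names km and grp are illustrative assumptions,
  not part of the original question; substitute your own data and
  target means. It also reads the counts from _N and rowsof(X) rather
  than hard-coding 1000 and 10.

```stata
* Simulated data (assumption): 1000 observations on a1-a8, falling
* into 10 well-separated groups located near 0, 5, 10, ..., 45.
clear
set seed 12345
set obs 1000
forvalues i = 1/8 {
    generate a`i' = 5*mod(_n, 10) + rnormal()
}

* Hypothetical 10 x 8 matrix of target means: row j = center of group j.
matrix X = J(10, 8, 0)
forvalues j = 1/10 {
    forvalues i = 1/8 {
        matrix X[`j', `i'] = 5*(`j' - 1)
    }
}

* Append one new observation per row of X and fill it from the matrix.
local n = _N
local k = rowsof(X)
local last = `n' + `k'
set obs `last'
forvalues i = 1/8 {
    forvalues j = 1/`k' {
        local row = `n' + `j'
        quietly replace a`i' = X[`j', `i'] in `row'
    }
}

* Use the last k observations only as starting centers.
cluster kmeans a1-a8, k(`k') start(lastk, exclude) name(km) generate(grp)

* Remove the appended center observations again.
local first = `n' + 1
drop in `first'/`last'
```

  The local macros only survive if the block is executed in one go, so
  running it line by line from the Do-file Editor will not work.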


There are other approaches besides using the -forvalues- loops for
getting the starting center information from a matrix to the bottom
of your dataset.  For instance, you might use something like

    preserve
    drop _all
    svmat ...
    save ...
    restore
    append using ...
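
One way the outline above might be filled in is sketched below. The
simulated data, the contents of X, and the use of a temporary file
(the local macro name centers is arbitrary) are assumptions made for
illustration. The key point is that -svmat- with names(a) creates
variables a1, a2, ..., a8 from the columns of X, so they line up with
the existing variables when appended.

```stata
* Simulated data and a hypothetical 10 x 8 matrix of centers (assumptions).
clear
set seed 12345
set obs 1000
forvalues i = 1/8 {
    generate a`i' = rnormal()
}
matrix X = J(10, 8, 0)
forvalues j = 1/10 {
    forvalues i = 1/8 {
        matrix X[`j', `i'] = 5*(`j' - 1)
    }
}

* Turn X into a 10-observation dataset and append it to the original.
tempfile centers
preserve
drop _all
svmat X, names(a)        // columns of X become a1, a2, ..., a8
save `centers'
restore
append using `centers'   // centers are now observations 1001-1010
```

Because -tempfile- stores the file name in a local macro, this should
be run as one do-file; the data are then ready for
-start(lastk, exclude)-.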


Ken Higbee    khigbee@stata.com
StataCorp     1-800-STATAPC



