# Re: st: Initial grouping variable in Kmeans/Kmedians

From: [email protected] | To: [email protected] | Subject: Re: st: Initial grouping variable in Kmeans/Kmedians | Date: Mon, 27 Jun 2005 09:52:26 -0500

```
Rob Hall <[email protected]> asks concerning -cluster kmeans-
or -cluster kmedians-:

> In SPSS, I can use a matrix of cluster centres as the starting point
> for analysis. For example, I have eight variables that will be used in the
> analysis and I have "target" means for each cluster on each variable.
> How can I get Stata to attend to the eight by 'n' matrix rather than
> just a single grouping variable?

Look at the -start()- option, and in particular -start(lastk, exclude)-.
With -lastk-, the last k observations in the dataset are used as the
starting group centers.  The approach would be to append the target
means to the data as k new observations.  -exclude- indicates that these
appended observations are not to be clustered, but instead are only to
act as starting center points for the algorithm.

Let's say that the Stata matrix holding the starting points is X, with
one row per group and one column per variable; that you have 8
variables (named a1, a2, a3, ..., a8); that you are clustering into 10
groups (so X is 10 x 8); and that your dataset has 1000 observations.
Here is one approach:

First, set the number of observations to 10 (the number of groups you
desire) more than the current number of observations:

set obs 1010

The newly created observations hold missing values until you fill
them with something else.  We want to place the values that are
in the X matrix into those last observations.

* i indexes the variables (columns of X); j indexes the groups (rows of X)
forvalues i = 1/8 {
    forvalues j = 1/10 {
        local k = 1000 + `j'
        replace a`i' = X[`j',`i'] in `k'
    }
}

If you are new to Stata, be careful to notice that in the code above the
i, j, and k are enclosed in a left single quote (`) and a right single
quote ('), which are different characters.
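
* A variant of the steps above that avoids hard-coding 1000, 10, and 8.
* This is only a sketch: it assumes X has one row per group and one
* column per variable, in the same order as a1, a2, ..., and it is meant
* to be run instead of (not in addition to) -set obs 1010- and the loops.
local n0 = _N
local ngroups = rowsof(X)
local nvars = colsof(X)
set obs `=`n0' + `ngroups''
forvalues i = 1/`nvars' {
    forvalues j = 1/`ngroups' {
        replace a`i' = X[`j',`i'] in `=`n0' + `j''
    }
}
* Quick check: none of the appended center values should be missing
forvalues i = 1/`nvars' {
    assert !missing(a`i') in `=`n0' + 1'/l
}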

Now I can call -cluster kmeans-:

cluster kmeans ... , k(10) ... start(lastk, exclude)
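
* A filled-in version of the call above, assuming the eight variables
* a1-a8 from the example. The cluster name kstart is hypothetical; by
* default the grouping variable that -cluster kmeans- creates takes the
* cluster's name, so it can be tabulated directly afterwards.
cluster kmeans a1 a2 a3 a4 a5 a6 a7 a8, k(10) start(lastk, exclude) name(kstart)
tabulate kstart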

After the cluster analysis you might wish to remove the 10 appended
center observations from the bottom of the dataset:

drop in 1001/1010
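
* An equivalent alternative to the -drop- above (use one or the other,
* not both): negative numbers in an -in- range count back from the last
* observation, so this drops the last rowsof(X) observations without
* hard-coding 1001/1010. It assumes X is still in memory.
drop in -`=rowsof(X)'/l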

Besides the -forvalues- loops, there are other ways to get the starting
center information from a matrix to the bottom of your dataset.  For
instance, you might use something like

preserve
drop _all
svmat ...
save ...
restore
append using ...
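
* A fuller sketch of this route (run it instead of -set obs- and the
* loops), assuming X is 10 x 8 with its columns in the order a1-a8.
* -svmat- with names(a) names the new variables a1, a2, ..., a8, so they
* line up with the existing ones on -append-. The tempfile name centers
* is hypothetical.
tempfile centers
preserve
drop _all
svmat X, names(a)
save `centers'
restore
append using `centers'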

Ken Higbee    [email protected]
StataCorp     1-800-STATAPC

```