# Re: st: Profile Plots of Cluster Solution - How to?

- From: khigbee@stata.com
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: Profile Plots of Cluster Solution - How to?
- Date: Sat, 23 Nov 2002 09:59:54 -0600

```
Tim Victor <tvictor@dolphin.upenn.edu> asks:

> This should be easy but I've been working at it for three days now and
> can't find the solution. What I am trying to do is simply plot the
> profiles for a 5-cluster solution. All I want to do is plot each cluster
> mean (and error bar) for each attribute in the same graph. Oddly, doing
> this in SAS is only a few lines after transposing the data:
>
> symbol i=std1mjt;
> proc gplot data=plotme;
>     plot value * attribute = cluster / haxis=axis1 vaxis=axis2 frame;
> run;
>
> Any suggestions? Thanks.

As Nick Cox might say -- what is SAS?

To my knowledge, there is currently no single command (or pair
of commands) in Stata that will produce what you want.  However,
it can be done.  Let me outline the steps.  These steps could be
combined into an ado-file program if you were doing this kind of
thing often.

Step 1 -- obtain the needed data (the means and standard
deviations or standard errors) in a layout that can be used in
Step 2.

Step 2 -- use -serrbar- (or, for finer control, -graph- and
-gph-) to produce the graph.

I will illustrate with the auto data, plotting means with error
bars of +/- 1.96 * standard deviation.  If you want standard
errors instead, alter the code below accordingly.  Step 0 is to
obtain a five-group cluster solution.

Step 0:

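use auto, clear
replace disp = disp/20
* -cluster generate- needs an existing hierarchical cluster
* analysis; average linkage is assumed here for illustration
cluster averagelinkage head trunk turn disp
cluster gen my5 = group(5)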

The variable my5 indicates the five groups.  We can view the
summary statistics we want to obtain (the means and standard
deviations of the four variables within each of the five groups)
with:

bysort my5 : summarize head trunk turn disp

Step 1:

There are probably better ways, but here is one way I thought
of to produce the dataset needed for graphing.

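preserve
* one small dataset of per-cluster means and sds for each variable
foreach var in head trunk turn disp {
statsby "summarize `var'" mean = (r(mean)) sd = (r(sd)) /*
*/ , by(my5) clear
gen str2 name = substr("`var'",1,2)
save mytmp`var' , replace
restore, preserve
}
* memory now holds the original data again, so start from the
* head results and append the other three
use mytmphead, clear
foreach var in trunk turn disp {
append using mytmp`var'
}
* namecl numbers the 20 variable/cluster combinations
sort name my5
egen namecl = group(name my5) , label
list
save mynew , replace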

-statsby- gives us what we want for a single variable.  We
need the results for each of the variables in the cluster
analysis, so we loop over the variables and create little
datasets that we later -append- together.

Step 2:

I will present four alternatives:

Alternative 1

serrbar mean sd namecl , scale(1.96) xlab(1/20) ylab

Alternative 2

sort my5 name
serrbar mean sd namecl , scale(1.96) xlab(1/20) ylab c(LII)

Alternative 3

encode name, gen(name2)
sort my5 name2
serrbar mean sd name2, scale(1.96) xlab ylab c(LII)

Alternative 4

gen name3 = name2 + my5/10
serrbar mean sd name3, scale(1.96) xlab ylab c(LII)

Step 3:

restore

After producing the graph, we -restore- the original data.

I prefer Alternative 4, but more labeling would be nice.  For
finer control over the labeling, you might need to use -graph-
and -gph-.

Ken Higbee    khigbee@stata.com
StataCorp     1-800-STATAPC

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
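For readers on a more recent Stata (8 or later, which introduced `sysuse`, `///` continuation and `twoway`), the same profile plot can be sketched without the temporary files. This is a minimal sketch under stated assumptions, not the method from the post: it assumes an average-linkage cluster analysis on the four attributes, uses the full auto variable names, swaps the -statsby- loop for -collapse- and -reshape-, and swaps -serrbar- for `twoway rcap` plus `connected`. The variable `se` is computed for anyone who wants standard-error bars. A cluster with a single member gets a missing sd and therefore no error bar.

```stata
* Minimal sketch (assumes Stata 8 or later): profile plot of a
* five-group cluster solution using -collapse-, -reshape-, -twoway-
sysuse auto, clear
replace displacement = displacement/20

* Assumption: average-linkage clustering on the four attributes
cluster averagelinkage headroom trunk turn displacement, name(avg)
cluster generate my5 = group(5), name(avg)

* One row per cluster holding the mean, sd and count of each attribute
collapse (mean) m_headroom=headroom m_trunk=trunk m_turn=turn      ///
                m_displacement=displacement                         ///
         (sd)   s_headroom=headroom s_trunk=trunk s_turn=turn       ///
                s_displacement=displacement                         ///
         (count) n_headroom=headroom n_trunk=trunk n_turn=turn      ///
                n_displacement=displacement, by(my5)

* One row per cluster/attribute combination (5 x 4 = 20 rows)
reshape long m_ s_ n_, i(my5) j(name) string
rename m_ mean
rename s_ sd
rename n_ n

* Error-bar limits of +/- 1.96 sd as in the post; substitute se for
* sd in lo and hi to plot standard errors instead
generate se = sd/sqrt(n)
generate lo = mean - 1.96*sd
generate hi = mean + 1.96*sd

* x position: attribute code (alphabetical, from -encode-) plus a
* small per-cluster offset centred on the integer (cf. Alternative 4)
encode name, generate(attr)
generate x = attr + (my5 - 3)/10

* Sorted by cluster then attribute, x drops back at each new cluster,
* so connect(L) (ascending only) draws one profile line per cluster
sort my5 attr
twoway (rcap hi lo x) (connected mean x, connect(L)),                ///
    xlabel(1 "displacement" 2 "headroom" 3 "trunk" 4 "turn")          ///
    xtitle("attribute") ytitle("cluster mean +/- 1.96 sd") legend(off)
```

The hard-coded x-axis labels rely on -encode- assigning codes in alphabetical order of `name`. If you change the attribute list, update them to match.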