The Stata listserver

Re: st: Profile Plots of Cluster Solution - How to?

Date   Sat, 23 Nov 2002 09:59:54 -0600

Tim Victor asks:

> This should be easy but I've been working at it for three days now and 
> can't find the solution. What I am trying to do is simply plot the 
> profiles for 5 cluster solution. All I want to do is plot each cluster 
> mean (and error bar) for each attribute in the same graph. Oddly, doing 
> this in SAS is only a few lines after transposing the data:
> symbol i=std1mjt;
> proc gplot data=plotme;
>     plot value * attribute = cluster / haxis=axis1 vaxis=axis2 frame;
> run;
> Any suggestions? Thanks.

As Nick Cox might say -- what is SAS?

There is not currently (to my knowledge) a single command, or even
a pair of commands, in Stata that will produce what you want.
However, it can be done.  Let me outline the steps.  These steps
could be combined into an ado program if you were doing this kind
of thing a lot.

Step 1 -- obtain the needed data (the means and std deviations or
          std errors) in a layout that can be used in Step 2.

Step 2 -- use -serrbar- (or, for finer control, -graph- and
          -gph-) to produce the graph.

I will illustrate with the auto data, plotting means with error
bars of +/- 1.96 * std. deviation.  If you want std. errors
instead, alter the code below.  Step 0 is to obtain a five-group
cluster solution.
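Written out, with notation introduced here rather than taken from the post (attribute j, cluster k, cluster mean x-bar, std. deviation s, and n_k cars in cluster k), the bars drawn with scale(1.96) and the std.-error version differ only in the half-width. -summarize- stores the group size in r(N), which is what the std.-error version needs:

```latex
\text{std. deviation bars: } \bar{x}_{jk} \pm 1.96\, s_{jk}
\qquad
\text{std. error bars: } \bar{x}_{jk} \pm 1.96\, \frac{s_{jk}}{\sqrt{n_k}}
```

The std.-error bars are an approximate 95% confidence interval for the cluster mean; the std.-deviation bars instead describe the spread of individual cars within the cluster.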

Step 0:

        use auto, clear
        keep head trunk turn disp
        replace disp = disp/20
        cluster completelink head trunk turn disp, name(mycl)
        cluster gen my5 = group(5)

    The variable my5 indicates the five groups.  We can view the
    data we will want to obtain (the means and std. dev. of the
    four variables by the five groups) with:

        bysort my5 : summarize head trunk turn disp

Step 1:

    There are probably better ways, but here is one way I thought
    of to produce the dataset to be used in Step 2:
        preserve
        foreach var in head trunk turn disp {
            * per-cluster mean and std. dev. of this variable
            statsby "summarize `var'" mean = (r(mean)) sd = (r(sd)) /*
                */ , by(my5) clear
            gen str2 name = substr("`var'",1,2)
            save mytmp`var' , replace
            * bring back the clustered data and keep it preserved
            restore, preserve
        }
        use mytmphead , clear
        foreach var in trunk turn disp {
                append using mytmp`var'
        }
        sort name my5
        egen namecl = group(name my5) , label
        save mynew , replace

    -statsby- gives us what we want for a single variable.  We
    need the results for each of the variables in the cluster
    analysis, so we loop over the variables and create little
    datasets that we later -append- together.
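One of the "better ways" might be -collapse- followed by -reshape long-, which builds the same one-row-per-cluster-and-variable layout in memory without temporary files. This is a sketch under stated assumptions, not the method from the post: it assumes Stata 8 or later (for -sysuse-), uses full variable names (so name holds headroom, trunk, turn, displacement rather than two-letter codes), writes the cluster commands with the current spellings -cluster completelinkage- and groups(), and the mean_/sd_ stub names are hypothetical choices for illustration.

```stata
* Sketch (assumes Stata 8+): alternative to Step 1 via -collapse-/-reshape-
sysuse auto, clear
keep headroom trunk turn displacement
replace displacement = displacement/20
cluster completelinkage headroom trunk turn displacement, name(mycl)
cluster generate my5 = groups(5)

* one row per cluster, with a mean_* and an sd_* variable per attribute
collapse (mean) mean_headroom=headroom mean_trunk=trunk /*
    */ mean_turn=turn mean_displacement=displacement /*
    */ (sd) sd_headroom=headroom sd_trunk=trunk /*
    */ sd_turn=turn sd_displacement=displacement, by(my5)

* one row per cluster-attribute pair: 5 clusters x 4 attributes = 20 rows
reshape long mean_ sd_, i(my5) j(name) string
rename mean_ mean
rename sd_ sd
sort name my5
egen namecl = group(name my5), label
```

The result replaces the data in memory; wrap it in -preserve-/-restore- if the clustered data are still needed. A cluster containing a single car will have a missing sd.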

Step 2:

    I will present four alternatives:

    Alternative 1

        serrbar mean sd namecl , scale(1.96) xlab(1/20) ylab

    Alternative 2

        sort my5 name
        serrbar mean sd namecl , scale(1.96) xlab(1/20) ylab c(LII)

    Alternative 3

        encode name, gen(name2)
        sort my5 name2
        serrbar mean sd name2, scale(1.96) xlab ylab c(LII)

    Alternative 4

        gen name3 = name2 + my5/10
        serrbar mean sd name3, scale(1.96) xlab ylab c(LII) 
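The x positions in Alternative 4 can be spelled out. -encode- numbers the strings in alphabetical order, so name2 is 1 = di, 2 = he, 3 = tr, 4 = tu; writing j for name2 and k for my5 (notation introduced here):

```latex
x_{jk} = j + \frac{k}{10}, \qquad j \in \{1,2,3,4\},\quad k \in \{1,\dots,5\}
\qquad\Rightarrow\qquad
j + 0.1 = x_{j1} < x_{j2} < \dots < x_{j5} = j + 0.5 < x_{j+1,1} = j + 1.1
```

Each attribute's five bars therefore sit side by side without overlapping the next attribute, and because the data are sorted by my5 and name2, x increases within each cluster. The L in c(LII) connects points only while x increases, so it draws one profile line per cluster.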

Step 3:

    After producing the graph, we -restore- back to the original
    data (the clustered auto data from Step 0):

        restore

I prefer Alternative 4, though it would benefit from more labeling.
For finer control over the graph, you may need to use -graph- and
-gph- directly.
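-graph- and -gph- are the Stata 7 graphics tools; Stata 8 and later use the -twoway- graphics system. As a sketch under that assumption (Stata 8 or later, and mynew.dta from Step 1 in the working directory), the Alternative 4 layout can be drawn with -rcap- for the error bars and -connected- for the cluster profiles, with attribute names placed under each group of bars. The variable names lo, hi, and x are hypothetical.

```stata
* Sketch (assumes Stata 8+ and mynew.dta saved in Step 1)
use mynew, clear
gen lo = mean - 1.96*sd
gen hi = mean + 1.96*sd
encode name, gen(name2)
* same offsets as Alternative 4: attribute j, cluster k -> j + k/10
gen x = name2 + my5/10
sort my5 x
* connect(L) joins points only while x increases, giving one line per cluster;
* encode coded the names alphabetically: di=1, he=2, tr=3, tu=4
twoway (rcap hi lo x) (connected mean x, connect(L)), /*
    */ xlabel(1.3 "di" 2.3 "he" 3.3 "tr" 4.3 "tu") xtitle("") /*
    */ ytitle("mean +/- 1.96 sd") legend(off)
```

Hard-coding the label positions keeps the sketch short; they must be updated if the variables or the number of clusters change.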

Ken Higbee
StataCorp     1-800-STATAPC


© Copyright 1996–2017 StataCorp LLC