Re: st: cluster analysis validation


From   brendan.halpin@ul.ie (Brendan Halpin)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: cluster analysis validation
Date   Tue, 17 Apr 2012 21:15:34 +0100

On Tue, Apr 17 2012, Dasinger, Lisa wrote:

> I've downloaded your programs, and I will try your suggested solution.
> What is the input for the ari command?  I don't see a help file for it.
> Would it be the dataset variable (old vs. new) and cluster group
> variable (with values 1..6)?  Also, I see that the permtab command seems
> to require a square matrix, but I would have a 6 x 2 matrix.

Both -ari- and -permtab- compare two classifications of the same size
(number of categories). If your two cluster solutions are in the variables
o6 and n6, then
. ari o6 n6

will calculate the Adjusted Rand Index for the comparison, and

. permtab o6 n6 

will relabel the categories of n6 so that it agrees as closely as possible
with o6, and then cross-tabulate the two. If you run
-permtab o6 n6, newvar(p6)-, it will also store the relabelled copy of n6
in the new variable p6.
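The relabelling idea behind -permtab- can be illustrated with a small Python sketch. This is not the ado-file's code: it assumes that "agreeing as much as possible" means maximising the number of observations on the diagonal of the cross-tabulation, that both classifications use the same category codes, and the function names best_relabel and relabel are hypothetical.

```python
from collections import Counter
from itertools import permutations


def best_relabel(o, n):
    """Relabel the categories of n to maximise agreement with o.

    Illustrative brute-force search: it tries every permutation of the
    category labels, so its cost grows factorially with the number of
    categories. Assumes o and n use the same set of category codes.
    """
    cats = sorted(set(o) | set(n))
    cells = Counter(zip(o, n))  # cross-tab cell counts keyed by (o value, n value)
    best_map, best_agree = None, -1
    for perm in permutations(cats):
        mapping = dict(zip(cats, perm))  # old n label -> new label
        # number of observations on the diagonal after relabelling
        agree = sum(c for (a, b), c in cells.items() if mapping[b] == a)
        if agree > best_agree:
            best_map, best_agree = mapping, agree
    return best_map, best_agree


def relabel(n, mapping):
    """Return a relabelled copy of n (cf. the newvar() copy described above)."""
    return [mapping[v] for v in n]


o = [1, 1, 2, 2, 3, 3]
n = [2, 2, 3, 3, 1, 1]
mapping, agree = best_relabel(o, n)
print(relabel(n, mapping), agree)  # [1, 1, 2, 2, 3, 3] 6
```

Here n6-style labels 2, 3, 1 are simply renamed 1, 2, 3, after which all six observations agree.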

Both of those commands compare the old and the joint classifications, if
you wish to do that. My initial suggestion (which is where the 6x2 table
comes in) was to compare the distribution of the two waves across the
joint classification. In the simplest case, this means examining the
cluster percentages within each wave, but you could extend it to, say, a
multinomial logistic regression with the cluster solution as the
dependent variable and wave as one of the explanatory variables.
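The within-wave comparison amounts to row percentages of a wave-by-cluster table; in Stata something like -tabulate wave cluster, row- would show it, where wave and cluster are hypothetical variable names. A minimal Python sketch of the same calculation, using made-up toy data and a hypothetical function name:

```python
from collections import Counter


def within_wave_percentages(wave, cluster):
    """Percentage of each wave's observations that fall in each cluster."""
    cells = Counter(zip(wave, cluster))
    wave_totals = Counter(wave)
    clusters = sorted(set(cluster))
    return {
        w: {k: 100.0 * cells[(w, k)] / wave_totals[w] for k in clusters}
        for w in sorted(wave_totals)
    }


# hypothetical toy data: 4 observations per wave, 3 joint clusters
wave = ["old"] * 4 + ["new"] * 4
cluster = [1, 1, 2, 3, 1, 2, 2, 2]
for w, pcts in within_wave_percentages(wave, cluster).items():
    print(w, pcts)
```

If the two waves have similar percentage profiles across the joint clusters, the joint solution is not simply splitting the data by wave.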

If you want more information on the Adjusted Rand Index, there are some
notes in the form of comments in the ari.ado file; it was added to the
package as something of an afterthought, so I never wrote a help file.
ARI works in terms of pairs of observations: if every pair that has the
same value in one classification also has the same value in the other,
and every pair that has different values in one also has different
values in the other, the agreement is perfect and ARI is 1.0. Otherwise
the index is less than one; it is adjusted for chance, so its expected
value under random agreement is 0.
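The pair-counting definition can be made concrete with a minimal Python sketch of the ARI in the usual Hubert and Arabie form, computed from the cross-tabulation of the two classifications. It counts pairs placed together in both, compares that with the count expected by chance given the margins, and rescales so that perfect agreement gives 1. It is written from the standard formula rather than from ari.ado, so treat it as an assumption that the two agree; the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand Index of two classifications of the same observations."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two classifications of the same n >= 2 observations")
    cells = Counter(zip(a, b))
    # pairs placed together in both classifications
    together_both = sum(comb(c, 2) for c in cells.values())
    # pairs placed together in a, and in b, respectively
    together_a = sum(comb(c, 2) for c in Counter(a).values())
    together_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # only when both are the same trivial partition (one group, or all singletons)
        return 1.0
    return (together_both - expected) / (maximum - expected)


print(adjusted_rand_index([1, 1, 2, 2, 3, 3], [2, 2, 3, 3, 1, 1]))  # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 0.5714...
```

The first call shows that pure relabelling does not matter: the pairs are grouped identically, so the index is 1.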


Regards,

Brendan

PS: Note that since permtab permutes one of the classifications, its
runtime grows factorially with the number of categories, so it becomes
impractical from about 10 categories up. -permtabga- estimates an
approximate solution for larger classifications.
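The factorial growth is easy to quantify, assuming an exhaustive search over relabellings: with k categories there are k! permutations to check. A short Python check (the function name is hypothetical):

```python
from math import factorial


def relabellings(k):
    """Number of label permutations an exhaustive search examines for k categories."""
    return factorial(k)


for k in (6, 8, 10, 12):
    print(f"{k:2d} categories: {relabellings(k):,} relabellings")
```

Going from 6 categories to 10 multiplies the number of relabellings by 5,040 (from 720 to 3,628,800), and each one also needs a pass over the cross-tabulation.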
-- 
Brendan Halpin,   Department of Sociology,   University of Limerick,   Ireland
Tel: w +353-61-213147  f +353-61-202569  h +353-61-338562;  Room F1-009 x 3147
mailto:brendan.halpin@ul.ie    ULSociology on Facebook: http://on.fb.me/fjIK9t
http://teaching.sociology.ul.ie/bhalpin/wordpress         twitter:@ULSociology