st: Cluster, new dissimilarity measures, and sequence analysis

From   Matissa Hollister
Subject   st: Cluster, new dissimilarity measures, and sequence analysis
Date   Wed, 21 Jul 2004 12:11:19 -0700 (PDT)

I'm hoping someone can help me solve this problem,
although I'm beginning to think that it's hopeless.
I've created my own measure of
dissimilarity that I want to use for clustering, but
I'm finding that there is no way to get Stata to accept
this new dissimilarity measure.  Any ideas
for ways around this problem would be greatly
appreciated.

Basically, I am using a procedure called Optimal
Matching, an algorithm designed to create a measure of
dissimilarity between two sequences of data.  I am
using it to identify people who have similar career
patterns.  I've created a do-file that accomplishes
the most difficult and unusual part of Optimal
Matching, which is creating the measure of
dissimilarity between each pair of sequences.  I now
want to run a clustering procedure to identify groups
based upon this dissimilarity measure.
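
For readers unfamiliar with Optimal Matching, the pairwise step described above can be sketched in Python roughly as follows. This is not the do-file mentioned in this post, which is not shown; it is a minimal illustration that assumes a single insertion/deletion cost and a single substitution cost (the names om_distance, pairwise_matrix, indel and sub are all hypothetical, and the cost values are placeholders).

```python
def om_distance(seq_a, seq_b, indel=1.0, sub=2.0):
    """Minimum total cost of editing seq_a into seq_b.

    Assumed cost scheme: every insertion or deletion costs `indel`,
    every substitution costs `sub`, and a match costs nothing.
    """
    n, m = len(seq_a), len(seq_b)
    # prev is the previous row of the dynamic-programming table.
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * indel] + [0.0] * m
        for j in range(1, m + 1):
            cost = 0.0 if seq_a[i - 1] == seq_b[j - 1] else sub
            curr[j] = min(prev[j] + indel,      # delete an element of seq_a
                          curr[j - 1] + indel,  # insert an element of seq_b
                          prev[j - 1] + cost)   # match or substitute
        prev = curr
    return prev[m]


def pairwise_matrix(sequences, indel=1.0, sub=2.0):
    """Symmetric matrix of dissimilarities between every pair of sequences."""
    k = len(sequences)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = om_distance(sequences[i], sequences[j], indel, sub)
    return d
```

Each sequence here could be a string or list of career states, one element per time period. The result is a full pairwise matrix rather than a set of variables, which is why it does not fit a cluster command that expects to compute distances itself.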

I found a post in the listserv archives (dated
November 18, 2002) where someone wanted to do
something similar (she wanted to create a geographic
distance measure).  From the response I gather that the
dissimilarity algorithms are called and run
within the built-in Stata command _cluster, which
is written in C, and that is certainly beyond my
programming abilities.  I've contemplated several
possibilities and would love help or advice on any of
them:

1) find a different software program that will allow me
to easily input a new dissimilarity measure into a
cluster command (preferably not expensive)

2) a way to alter Stata's cluster command to allow for
this new dissimilarity measure

3) a way to get around this problem, e.g.:

   A. use the ParseDist command within cluster.ado to
somehow cause the built-in command to call up a
different distance command

   B. ways to enter the data so that a built-in Stata
dissimilarity measure will result in the same pairwise
distances (difficult because the pairwise
dissimilarities make up a multi-dimensional space; the
whole point is that they are difficult to summarize in
a few variables)

4) write my own clustering procedure
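
To give a sense of what option 4 involves, here is a minimal Python sketch of agglomerative clustering that works directly from a precomputed dissimilarity matrix. It assumes average linkage and a fixed target number of clusters, neither of which is specified in this post, and the function name average_linkage is hypothetical. It is written for clarity rather than speed, so it would be slow on large samples.

```python
def average_linkage(dist, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    dist is a square, symmetric matrix (list of lists) of pairwise
    dissimilarities; the result is a list of clusters, each a sorted
    list of row indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def between(a, b):
        # Average linkage: mean dissimilarity over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = between(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Fold cluster y into cluster x, then drop y (y > x, so x is unaffected).
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]
```

For example, with four sequences where the first two and the last two are close to each other, average_linkage(dist, 2) groups rows 0 and 1 together and rows 2 and 3 together. The only input is the matrix itself, so any dissimilarity measure can be used.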

Please!  Any help would be gratefully accepted.  I
know that several other researchers have already used
Optimal Matching with clustering, so my guess is that
option #1 might be the most viable one, but I'm not
sure where to look.


