st: clustering on string similarity

From	Dan Weitzenfeld <[email protected]>
To	[email protected]
Subject	st: clustering on string similarity
Date	Fri, 1 May 2009 12:17:50 -0700

Hi Folks,
In working with eye-tracking data, a person's sequence of
areas-of-interest viewed (a "scan pattern") is often represented as a
string.  E.g., my scan pattern in the first 5 seconds of looking at a
webpage might be PPHM, with P = picture, H = headline, and M = side
menu.
For this reason, I am interested in clustering on string similarity,
to identify commonly-taken scan patterns in a dataset.
It looks like my best bet is to create a dissimilarity matrix (using
Levenshtein distance as the dissimilarity measure) and then use
-clustermat-.
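
To make the dissimilarity-matrix idea concrete, here is a minimal sketch in Python (not Stata) of the pairwise Levenshtein computation. The function names `levenshtein` and `dissimilarity_matrix` and the sample scan patterns are illustrative assumptions, not part of any existing package, and the resulting matrix would still have to be brought into Stata as a matrix before -clustermat- could use it.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the first i-1 chars of a and first j chars of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from the first i chars of a to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def dissimilarity_matrix(patterns: Sequence[str]) -> List[List[int]]:
    """Symmetric matrix of pairwise Levenshtein distances, zero diagonal."""
    n = len(patterns)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            distance = levenshtein(patterns[i], patterns[j])
            matrix[i][j] = distance
            matrix[j][i] = distance
    return matrix


# Hypothetical scan patterns: P = picture, H = headline, M = side menu.
scan_patterns = ["PPHM", "PHM", "MHPP", "PPHH"]
D = dissimilarity_matrix(scan_patterns)

for pattern, row in zip(scan_patterns, D):
    print(f"{pattern:>5}", row)
```

The sketch uses unit costs for every edit, so scan patterns of very different lengths will tend to look dissimilar; whether to normalize by string length is a separate modelling choice.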

My questions are:
   - Are there any packages out there that would make this easier?
   - Am I right that I will have to write a program to make the matrix?
     I'm fine with writing it; I just want to confirm that I'm not
     missing an easier way to do this.

Thanks in advance,
Dan

