Statalist


st: Update of package for Sequence Analysis


From   Ulrich Kohler <kohler@wzb.eu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Update of package for Sequence Analysis
Date   Mon, 03 Sep 2007 11:31:12 +0200

Thanks to Kit Baum, a new version of the SQ-ados is available on SSC.
The programs can be installed by 

. ssc install sq

Stata users who have already installed a previous version of the
SQ-ados are asked to use

. adoupdate sq, update

The SQ-ados are a collection of programs for analysing sequence data. The
techniques and programs are described in some detail in
Brzinsky-Fay/Kohler/Luniak "Sequence analysis with Stata". The new
release includes several bug fixes as well as new features. The new
features since our last update are:

(1) sqom: The program for the Needleman-Wunsch algorithm has three new
options, -meanprobdistance-, -minprobdistance-, and -maxprobdistance-,
which derive the substitution-cost matrix from the dataset.
Substitution costs are calculated from the transition probabilities (p)
between every two neighboring elements of the sequences. Check out -help
sqom- for details.
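
To illustrate the idea behind item (1), here is a minimal Python sketch, not
the code used by -sqom-. It estimates the transition probabilities between
neighboring elements and turns them into a symmetric substitution-cost
matrix. The function names, the dict layout, and in particular the formula
cost = 1 - combine(p(a->b), p(b->a)) are assumptions made for this
illustration; the formulas actually used are described in -help sqom-.

```python
from collections import Counter
from itertools import combinations


def transition_probs(sequences):
    """Estimate p(a -> b) from all pairs of neighboring elements."""
    pair_counts = Counter()
    from_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            from_counts[a] += 1
    return {pair: n / from_counts[pair[0]] for pair, n in pair_counts.items()}


def mean(x, y):
    return (x + y) / 2


def substitution_costs(sequences, combine=mean):
    """Symmetric substitution costs as a dict keyed by (a, b).

    ASSUMPTION: cost = 1 - combine(p(a->b), p(b->a)). This formula is a
    placeholder for illustration and is not taken from the sqom source.
    """
    p = transition_probs(sequences)
    states = sorted({e for seq in sequences for e in seq})
    costs = {(s, s): 0.0 for s in states}
    for a, b in combinations(states, 2):
        c = 1.0 - combine(p.get((a, b), 0.0), p.get((b, a), 0.0))
        costs[(a, b)] = costs[(b, a)] = c
    return costs
```

Passing combine=min or combine=max instead of the default mean gives the two
other variants under the same assumption.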

(2) sqom: -sqom- now confirms that a user-supplied substitution cost
matrix is symmetric and issues an error message if it is not.
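
For readers who want to see the mechanics, the following Python sketch shows
a Needleman-Wunsch style dynamic program in its cost-minimizing form,
preceded by a symmetry check on the substitution costs. It is an
illustration only: the names check_symmetric and om_distance, the dict-based
cost matrix, and the default indel cost of 1 are assumptions, not the
implementation or defaults of -sqom-.

```python
def check_symmetric(subcost):
    """Raise an error unless cost(a, b) == cost(b, a) for every pair."""
    for (a, b), c in subcost.items():
        if subcost.get((b, a)) != c:
            raise ValueError(
                f"substitution costs not symmetric for {a!r} and {b!r}"
            )


def om_distance(x, y, subcost, indel=1.0):
    """Minimum total cost of turning sequence x into sequence y.

    subcost maps (a, b) to the cost of substituting a by b;
    identical elements always cost 0.
    """
    check_symmetric(subcost)
    # prev[j] = cost of aligning the first i-1 elements of x with y[:j]
    prev = [j * indel for j in range(len(y) + 1)]
    for i, a in enumerate(x, start=1):
        curr = [i * indel]
        for j, b in enumerate(y, start=1):
            sub = 0.0 if a == b else subcost[(a, b)]
            curr.append(min(prev[j] + indel,       # delete a
                            curr[j - 1] + indel,   # insert b
                            prev[j - 1] + sub))    # substitute or match
        prev = curr
    return prev[-1]
```

With an asymmetric cost dict the function stops with a ValueError before any
distance is computed, which mirrors the behaviour described in item (2).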

(3) egen-sqfirstpos(): The new egen function -sqfirstpos()- returns the
position at which a specific pattern within a sequence was first found.
Check out -help sqegen- for details.

(4) egen-sqallpos(): The new egen function -sqallpos()- returns the
number of occurrences of a specific pattern within a sequence.
Check out -help sqegen- for details.

Aside: -sqfirstpos()- and -sqallpos()- use a Mata implementation of the
Boyer-Moore algorithm. The source code of our implementation can be
found in lsqbm.mata.
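
As a rough illustration of this kind of pattern search, here is a short
Python sketch. Note that it uses the Horspool simplification of Boyer-Moore
(bad-character shifts only), not the full algorithm, and it is not a
translation of lsqbm.mata. The function name find_all, the 1-based
positions, and the counting of overlapping matches are assumptions made for
the example.

```python
def find_all(sequence, pattern):
    """Return the 1-based start positions of pattern within sequence."""
    n, m = len(sequence), len(pattern)
    if m == 0 or m > n:
        return []
    # Shift table: for each element of the pattern except the last one,
    # the distance from its rightmost occurrence to the pattern's end.
    shift = {e: m - 1 - k for k, e in enumerate(pattern[:-1])}
    hits = []
    i = 0
    while i <= n - m:
        if list(sequence[i:i + m]) == list(pattern):
            hits.append(i + 1)
        # Skip ahead based on the element aligned with the pattern's end;
        # elements that do not occur in the table allow a full shift of m.
        i += shift.get(sequence[i + m - 1], m)
    return hits


positions = find_all([1, 1, 2, 3, 2, 3], [2, 3])
first_position = positions[0] if positions else None   # first match
n_occurrences = len(positions)                          # number of matches
```

The first element of the result corresponds to what item (3) describes, and
the length of the result to what item (4) describes.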

(5) sqindexplot: The program for producing sequence index plots now
allows a variable list in the option -order()-, which makes fine-tuning
of the sort order of the graph much easier.


In addition, here is a list of features that have been added in the
several updates between the Stata Journal entry and this new release:

- All programs now have the option "subsequence(a b)". The option
restricts an analysis to the subsequence from position a to position b.

- egen-sqfreq: New egen-function to generate the frequency of the
respective sequence-type. See -help sqegen-

- egen-sqrank: New egen-function to generate a variable holding the rank
order of frequency of the respective sequence-type

- sqclusterdat: New option "keep(varlist)" added. See -help
sqclusterdat-

- sqindexplot: Some new default settings 

- sqom: New option -idealtype- allows the user to specify an ideal-typical
sequence, with which all sequences are compared.
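
To make the frequency, rank, and subsequence items above more concrete, here
is a small Python sketch of what such quantities look like. It is only an
illustration under assumptions: the function name, the 1-based inclusive
positions a and b, the convention that rank 1 is the most frequent
sequence-type, and the tie-breaking by first appearance are all chosen for
the example and are not taken from the SQ-ados.

```python
from collections import Counter


def seq_freq_and_rank(sequences, a=None, b=None):
    """Return one (frequency, rank) pair per sequence.

    If a and b are given, only positions a..b (1-based, inclusive)
    of each sequence are used to define the sequence-type.
    """
    lo = 0 if a is None else a - 1
    types = [tuple(s[lo:b]) for s in sequences]
    freq = Counter(types)
    # Most frequent type gets rank 1; ties keep order of first appearance.
    ordered = [t for t, _ in freq.most_common()]
    rank = {t: r for r, t in enumerate(ordered, start=1)}
    return [(freq[t], rank[t]) for t in types]


result = seq_freq_and_rank(["AAB", "AAB", "ABB"])
```

Here result holds a frequency of 2 and rank 1 for the first two sequences,
and a frequency of 1 and rank 2 for the third.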


We would like to thank Mark Kaulisch, Irena Kogan, Chung Ip, Anna Manzoni,
Trent Spaulding and Abhirup Chakrabart for bug reports and comments.


Many regards

Ulrich Kohler -kohler@wzb.eu-
Magdalena Luniak -luniak@wzb.eu-





