
# Re: st: sequence analysis

From: Ulrich Kohler; To: statalist@hsphsun2.harvard.edu; Subject: Re: st: sequence analysis; Date: Thu, 05 Jan 2012 14:06:50 +0100

```
On Thursday, 05.01.2012, at 20:09 +1000, Melanie Spallek wrote:
> Hi, I'm trying to do a sequence analysis of housing trajectories (tenure status). I use -sqset tenure id wave, trim- to define my sequences, followed by -sqtab, so-, which gives me a frequency table of the sequences. The so option treats all sequences that have the same order of elements as identical; i.e., the sequence A-B-B-A is treated the same as A-B-A-A, which is exactly what I want. My first (and most important) question is whether and how I can save those sequences in a variable. Since my dataset is in long format with ten waves for each id, I'm aware that the 'sequence variable' will occur ten times per id, which is OK.

Brendan Halpin has answered the question.

Apart from that, have you checked the sq-egen function -sqfreq()-? It
generates a variable holding the frequencies reported by -sqtab-. This
could be helpful if the end goal of this exercise is to
get those frequencies as a variable.

If you want to create a "SO"-sequence dataset this could be done as
follows: Starting from a sequence dataset in long format such as

. reshape long st, i(id) j(order)
. list, sepby(id)

you could type

. by id (order), sort: gen first = st!=st[_n-1]
. keep if first

so that you arrive at:

. list, sepby(id)

You can then create a new order variable and sq-reset your data:

. by id (order): gen soorder = _n
. sqset st id soorder

> My second question is the following. I have used the keeplongest option to define the sequences, and every time I run exactly the same code followed by sqtab, I get slightly different results. I think this might be due to Stata randomly selecting among consecutive sequences of equal length, e.g., in A-A-A-.-.-.-B-B-B. If I run it once, keeplongest might select the A-A-A sequence, and the next time it might select the B-B-B sequence, as both have the same length, and hence I get slightly different frequencies for the A-A-A and B-B-B sequences. Has anybody got any other explanation?

No. That explanation is quite correct. I have sent a fix to Kit Baum
that corrects this. With the fix, the last of all blocks of equal
length is always retained. If I find time, I will let
the user specify whether it is the first, the last, or a random
selection.

Uli

```
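The "SO" collapsing steps described in the message can be tried on a small toy dataset. The following is only a minimal sketch under stated assumptions: the data are made up, the variable `soseq` and the string-concatenation step are illustrative additions that are not part of the original reply, and only built-in Stata commands are used (no SQ-Ados commands). It reuses the two example sequences from the question, A-B-B-A and A-B-A-A.

```stata
* Toy data (hypothetical): two ids, four waves each, already in long format
clear
input id order str1 st
1 1 "A"
1 2 "B"
1 3 "B"
1 4 "A"
2 1 "A"
2 2 "B"
2 3 "A"
2 4 "A"
end

* Flag the first observation of each run of identical states
by id (order), sort: gen byte first = st != st[_n-1]

* Keep one observation per run and renumber the positions
keep if first
by id (order): gen soorder = _n

* Illustrative addition: build the SO sequence as one string per id
by id (soorder), sort: gen str20 soseq = st if _n == 1
by id (soorder): replace soseq = soseq[_n-1] + "-" + st if _n > 1
by id (soorder): replace soseq = soseq[_N]

list id soorder st soseq, sepby(id)
```

In this toy example both ids collapse to the same-order sequence A-B-A. Note that `keep if first` drops observations, so this sketch produces a separate SO dataset and does not add a variable to the original ten-wave long file.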