
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: identifying highest number of consecutive variables where answer is consistent across observation


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: identifying highest number of consecutive variables where answer is consistent across observation
Date   Thu, 20 Feb 2014 18:51:16 +0000

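Not quite. Because the answers are strings, you'd need to -encode- them
first. Here is a revised sketch, with another simplification: every spell
of length 16 or greater necessarily has a 15th value, so counting
observations with _seq == 15 is enough and dividing by -_end- is no
longer needed.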

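gen id = _n                          // one identifier per respondent
reshape long q, i(id) j(question)    // q1-q94 become a single variable -q-
tsset id question                    // each respondent is a panel
ssc inst tsspell                     // install -tsspell- from SSC (once only)
encode q, gen(nq)                    // numeric codes for the string answers
tsspell nq                           // spells are runs of identical answers
* number of runs of 15 or more identical answers, per respondent
egen fifteen_or_more = total(_seq == 15), by(id)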

Nick
njcoxstata@gmail.com


On 20 February 2014 18:34, Nick Cox <njcoxstata@gmail.com> wrote:
> Joe Canner has developed a good strategy for looking at this. Here is another.
>
> Suppose we -reshape long-, something like
>
> gen id = _n
> reshape long var, i(id) j(question)
> tsset id question
>
> Then each respondent's block of observations can be treated as panel
> data. Install and run -tsspell- (from SSC):
>
> ssc inst tsspell
> tsspell var
>
> With this syntax for -tsspell- a "spell" is automatically a sequence
> of identical values. The existence of spells 15 or longer will be
> summarized by
>
> egen fifteen_or_more = total((_seq >= 15) / _end), by(id)
>
> where division by the indicator variable -_end- (1 at the end of a
> spell, 0 otherwise) ensures that we look only at the ends of spells:
> dividing by 0 yields missing, which -egen, total()- treats as 0. If
> needed, we can then -reshape- back.
>
> On the other hand, other questions of a similar kind are quite likely
> to be easier to answer with this long data structure, so you may prefer
> to keep it.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 20 February 2014 17:04, Alison El Ayadi <aelayadi@gmail.com> wrote:
>
>> I am doing some data cleaning on survey data and am looking to
>> identify observations where there are 15 or more of the same answers
>> in a row (across the variables in current order).  All of the
>> variables are string.  Does anyone have an easy automated way to do
>> this?  I'm thinking that it could be done by generating a variable
>> that provided the maximum number of same responses in a row, but have
>> no idea how to code this.  Variables are q1 - q94, and all string.
>>
>> Any suggestions on efficiently writing this code would be greatly appreciated.
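A different route, not taken in the thread, avoids -reshape- and -tsspell- altogether: loop over adjacent pairs of the string variables in their current order, carrying a running count of identical answers and keeping its maximum. That maximum is the "maximum number of same responses in a row" variable described in the question. This is only a sketch: it assumes the variables are named q1, q2, ... in order, and the toy data, the variable names -run-, -maxrun- and -flag-, and the macros -K- and -thresh- are all hypothetical.

```stata
* Hypothetical toy data: 3 respondents, 6 string answers each
clear
input str1 q1 str1 q2 str1 q3 str1 q4 str1 q5 str1 q6
"a" "a" "a" "b" "b" "a"
"c" "c" "c" "c" "c" "c"
"a" "b" "a" "b" "a" "b"
end

local K = 6         // number of questions (94 for q1-q94)
local thresh = 5    // run length to flag (15 in the question)

* run    = length of the current run of identical answers
* maxrun = longest run seen so far for each respondent
gen run = 1
gen maxrun = 1
forvalues j = 2/`K' {
    local i = `j' - 1
    * extend the run if answer j repeats answer j-1, otherwise restart at 1
    replace run = cond(q`j' == q`i', run + 1, 1)
    replace maxrun = max(maxrun, run)
}
gen byte flag = maxrun >= `thresh'
drop run
list
```

For the real data, drop the toy -input- block and set K to 94 and thresh to 15. As written, consecutive blank answers count as identical; adding & q`j' != "" to the condition inside -cond()- would make a blank break a run instead.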


© Copyright 1996–2018 StataCorp LLC