Re: st: identify unique string values within lists of elements over chosen time windows


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: identify unique string values within lists of elements over chosen time windows
Date   Fri, 22 Mar 2013 11:55:23 +0000

is the Stack Overflow thread alluded to.

http://www.stata.com/support/faqs/data-management/problems-with-reshape/

is the FAQ alluded to.

On Fri, Mar 22, 2013 at 11:46 AM, Denisa Mindruta <mdenisa@yahoo.com> wrote:
> Dear Nick, this has been a very helpful conversation! For anyone else
> potentially interested in this posting:
>
>
> Another solution, proposed by Dimitriy on Stack Overflow, was to create an
> indicator for the first occurrence of each string value and then use
> collapse (sum) new=n, by(obs year). However, Dimitriy's solution requires the
> additional step of merging the new variable back into the original dataset.
> I also asked Nick whether reshaping is the most "efficient" way of approaching
> the issue, and here is what he said:
>
> "(MORE) Further comments focused largely on efficiency, meaning here speed
> rather than space. (Storage space could be biting the poster.)
>
>
> Without a restructure (here, using reshape), the problem is a triple loop: over
> identifiers, over observations for each identifier, and over variables. Possibly
> the two outer loops can be collapsed to one. But an explicit loop over
> observations is usually slow in Stata.
>
>
> With the restructuring solutions proposed by Dimitriy and myself, by: operations
> go straight to compiled code and are relatively fast; reshape is interpreted
> code and entails file manipulations, so it can be slow. On the other hand,
> reshape can be fast to write down with some experience, and it really is worth
> acquiring the fluency with reshape that comes with experience. In addition to
> the help for reshape and the manual entry, see the FAQ on reshape I wrote on
> www.stata.com.
>
> Another consideration is what else you want to do with this kind of dataset. If
> there are going to be other problems of similar character, they will usually be
> easier with a long structure as produced by reshape, so keeping that structure
> will be a good idea."
>
>
>
>
> ----- Original Message ----
> From: Nick Cox <njcoxstata@gmail.com>
> To: statalist@hsphsun2.harvard.edu
> Sent: Fri, March 22, 2013 4:27:35 AM
> Subject: Re: st: identify unique string values within lists of elements over
> chosen time windows
>
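> clear
> input obs   yr   str4 var1   str4 var2   str4 var3
> 1   90   str1   str2   str3
> 1   91   str1   str4   str5
> 2   90   str3   str4
> 2   91   str4   str5
> 2   93   str3   str5
> 2   94   str7
> end
> * one row per string value: var1-var3 become var, indexed by which
> reshape long var, i(obs yr) j(which)
> * flag the first year in which each nonmissing string appears for each obs
> bysort obs var (yr) : gen new = _n == 1 & !missing(var)
> * running count of new strings within each obs and year ...
> bysort obs yr : replace new = sum(new)
> * ... then copy the total to every row of that obs and year
> by obs yr : replace new = new[_N]
> * back to one row per obs and year
> reshape wide var, i(obs yr) j(which)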
>
> Nick
>
> On Thu, Mar 21, 2013 at 11:22 PM, Denisa Mindruta <mdenisa@yahoo.com> wrote:
>> Hi everyone. I have a large, unbalanced panel dataset in which each observation
>> can take multiple string values (each string is stored in a separate variable).
>> At each point in time, I need to count how many of the string values taken by
>> an observation are "new", meaning that they do not show up among the values
>> taken by the same observation in previous years. How should I approach this
>> problem?
>> Thanks! Below is a description of the data. I need to calculate newval.
>>
>> obs   yr   var1   var2   var3   newval
>> 1     90   str1   str2   str3   3
>> 1     91   str1   str4   str5   2
>> 2     90   str3   str4          2
>> 2     91   str4   str5          1
>> 2     93   str3   str5          0
>> 2     94   str7                 1
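The collapse route mentioned above can be sketched as follows. This is a hedged reconstruction from the one-line description in this thread, not Dimitriy's actual code: it assumes the variable names of the example data (yr rather than year), calls the first-occurrence indicator n and the final count newval, and keeps a copy of the wide data in a tempfile so that the counts can be merged back.

```stata
* Sketch only: the collapse-and-merge alternative, reconstructed from the
* description in this thread (the names n and newval are assumptions)
clear
input obs yr str4 var1 str4 var2 str4 var3
1 90 str1 str2 str3
1 91 str1 str4 str5
2 90 str3 str4 ""
2 91 str4 str5 ""
2 93 str3 str5 ""
2 94 str7 "" ""
end

* keep a copy of the wide data to merge the counts back into
tempfile wide
save `wide'

* one row per string value: var1-var3 become var, indexed by which
reshape long var, i(obs yr) j(which)

* n = 1 for the first year in which each nonmissing string appears for an obs
bysort obs var (yr) : gen n = _n == 1 & !missing(var)

* total number of new strings per obs and year (one row per obs and year)
collapse (sum) newval = n, by(obs yr)

* the extra step: merge the counts back into the original wide data
merge 1:1 obs yr using `wide', nogenerate
list obs yr var1 var2 var3 newval, noobs
```

In this sketch collapse takes the place of the second reshape in Nick's version, at the price of the save and merge steps that Denisa describes as the extra work.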