



Re: st: joining strings


From   Kevin McConeghy <kevinmcconeghy@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: joining strings
Date   Wed, 28 Nov 2012 16:50:04 -0600

Thanks, both ways work fine :)

Kevin

On Wed, Nov 28, 2012 at 4:34 PM, William Gould, StataCorp LP
<wgould@stata.com> wrote:
> Kevin McConeghy <kevinmcconeghy@gmail.com> asked,
>
>> I have a dataset with ~2.2 mill obs like so:
>>
>>       id        stringvar    + other variables
>>        1          x
>>        1          y
>>        1          z
>>        2          a
>>        3          d
>>        4          g
>>        4          h
>>
>> [...]
>> I was trying to combine the stringvar to collapse and make id a unique
>> key, like so:
>>
>>       id        stringvar
>>        1          xyz
>>        2          a
>>        3          d
>>        4          gh
>>
>> [...] [-reshape- ran out of memory] [...]
>>
>>
>> Is there some way to skip the reshape step [...]?
>
> Here is my solution.  First, let me set up the toy problem,
>
>         . clear all
>
>         . input id str1 stringvar
>
>                     id  stringvar
>           1. 1 x
>           2. 1 y
>           3. 1 z
>           4. 2 a
>           5. 3 d
>           6. 4 g
>           7. 4 h
>           8. end
>
> My solution is,
>
>         . sort id
>         . gen str result = ""
>         . by id: replace result = result[_n-1] + stringvar
>         . by id: keep if _n==_N
>
> Below I run that, with a few -list-s added:
>
>         . sort id
>
>         . gen str result = ""
>         (7 missing values generated)
>
>         . by id: replace result = result[_n-1] + stringvar
>         (7 real changes made)
>
>         . list
>
>              +------------------------+
>              | id   string~r   result |
>              |------------------------|
>           1. |  1          x        x |
>           2. |  1          y       xy |
>           3. |  1          z      xyz |
>           4. |  2          a        a |
>           5. |  3          d        d |
>              |------------------------|
>           6. |  4          g        g |
>           7. |  4          h       gh |
>              +------------------------+
>
>         . by id: keep if _n==_N
>         (3 observations deleted)
>
>         . list
>
>              +------------------------+
>              | id   string~r   result |
>              |------------------------|
>           1. |  1          z      xyz |
>           2. |  2          a        a |
>           3. |  3          d        d |
>           4. |  4          h       gh |
>              +------------------------+
>
>
> In my solution,
>
>         . sort id
>         . gen str result = ""
>         . by id: replace result = result[_n-1] + stringvar
>         . by id: keep if _n==_N
>
> watch out for the first line, -sort id-.  It should really read,
>
>         . sort id some_other_variable
>
> We need to specify the order within equal values of id to make
> the order of the letters deterministic.  Perhaps Kevin wants
> the letters in alphabetical order, in which case -sort id- should
> change to -sort id stringvar-.
>
> -- Bill
> wgould@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
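
Bill's closing point about sort order can be sketched as a do-file. This is only an illustration on the toy data, assuming the goal is to keep the letters in the order they appear in the dataset; the helper variable seq is hypothetical and does not come from the original post.

```stata
* Sketch only: seq is a hypothetical helper variable, not part of the original post
clear all
input id str1 stringvar
1 x
1 y
1 z
2 a
3 d
4 g
4 h
end

* Record the current row order so the sort within id is fully determined
gen long seq = _n
sort id seq

* Running concatenation within id, then keep the last (complete) row per id
gen str result = ""
by id: replace result = result[_n-1] + stringvar
by id: keep if _n == _N

drop seq stringvar
list
```

If alphabetical order is what is wanted, -sort id stringvar- does the same job without a helper variable.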



-- 
Kevin McConeghy, PharmD
Infectious Diseases Fellow
University of Illinois College of Pharmacy

