From: "William Gould, StataCorp LP" <wgould@stata.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: joining strings
Date: Wed, 28 Nov 2012 16:34:12 -0600

Kevin McConeghy <kevinmcconeghy@gmail.com> asked,

> I have a dataset with ~2.2 mill obs like so:
>
> id stringvar + other variables
> 1 x
> 1 y
> 1 z
> 2 a
> 3 d
> 4 g
> 4 h
>
> [...]
> I was trying to combine the stringvar to collapse and make id a unique
> key, like so:
>
> id stringvar
> 1 xyz
> 2 a
> 3 d
> 4 gh
>
> [...] [-reshape- ran out of memory] [...]
>
> Is there some way to skip the reshape step [...]?

Here is my solution. First, let me set up the toy problem,

    . clear all

    . input id str1 stringvar

                id   stringvar
      1. 1 x
      2. 1 y
      3. 1 z
      4. 2 a
      5. 3 d
      6. 4 g
      7. 4 h
      8. end

My solution is,

    . sort id
    . gen str result = ""
    . by id: replace result = result[_n-1] + stringvar
    . by id: keep if _n==_N

Below I run that, with a few -list-s added:

    . sort id

    . gen str result = ""
    (7 missing values generated)

    . by id: replace result = result[_n-1] + stringvar
    (7 real changes made)

    . list

         +------------------------+
         | id   string~r   result |
         |------------------------|
      1. |  1          x        x |
      2. |  1          y       xy |
      3. |  1          z      xyz |
      4. |  2          a        a |
      5. |  3          d        d |
         |------------------------|
      6. |  4          g        g |
      7. |  4          h       gh |
         +------------------------+

    . by id: keep if _n==_N
    (3 observations deleted)

    . list

         +------------------------+
         | id   string~r   result |
         |------------------------|
      1. |  1          z      xyz |
      2. |  2          a        a |
      3. |  3          d        d |
      4. |  4          h       gh |
         +------------------------+

In my solution,

    . sort id
    . gen str result = ""
    . by id: replace result = result[_n-1] + stringvar
    . by id: keep if _n==_N

watch out for the first line, -sort id-. It should really read,

    . sort id some_other_variable

We need to specify the order within equal values of id to make the
order of the letters deterministic. Perhaps Kevin wants the letters in
alphabetical order, in which case -sort id- should change to
-sort id stringvar-.

-- Bill
wgould@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
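The advice about sort order can also be folded into the -by- prefix itself. The following is a minimal sketch, not part of Bill's reply, assuming Stata's -bysort varlist1 (varlist2):- syntax, which sorts on both variable lists but forms groups only on varlist1. Here that joins the letters of each id in alphabetical order without a separate -sort- line. The toy data are the same as in the post; nothing else is assumed.

```stata
* Sketch (not from the original post): rebuild the toy data, then join
* the letters within each id in alphabetical order.
clear all
input id str1 stringvar
1 x
1 y
1 z
2 a
3 d
4 g
4 h
end

* Start with an empty string variable; -replace- widens it as needed.
generate str result = ""

* Sort by id and stringvar, but group by id only, so result[_n-1] is the
* running string for the same id ("" on the first observation of each id).
bysort id (stringvar): replace result = result[_n-1] + stringvar

* The last observation of each id now holds the full concatenation.
by id: keep if _n == _N
list
```

For the real data, only the -input- block is specific to the toy example; the remaining lines apply as written, and any other variables are kept from the last observation of each id.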
