# st: RE: RE: sort strings within rows

From: "Nick Cox" <[email protected]>
To: <[email protected]>
Subject: st: RE: RE: sort strings within rows
Date: Wed, 6 Sep 2006 11:56:01 +0100

```
I don't see that -reshape- is problematic here.

Let's add some other variables to show that they don't
complicate the problem unduly:

. l

     +----------------------------------------------------------------+
     |           tvp1            tvp2             tvp3   prime    pet |
     |----------------------------------------------------------------|
  1. |  massagli,mark         wood,j.   dessent,harold       2   frog |
  2. |  beletz,elaine    carter,annie   curtis,barbara       3   toad |
  3. |   bradshaw,joe    brown,arnold   dunaway,lowell       5   newt |
  4. | schneider,mark   mullins,bobby    sump,lawrence       7   wolf |
     +----------------------------------------------------------------+

Create a unique identifier if one doesn't exist, and then -reshape-:

. gen id = _n

. reshape long tvp , i(id)
(note: j = 1 2 3)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        4   ->      12
Number of variables                   6   ->       5
j variable (3 values)                     ->   _j
xij variables:
                         tvp1 tvp2 tvp3   ->   tvp
-----------------------------------------------------------------------------

. l

     +-----------------------------------------+
     | id   _j              tvp   prime    pet |
     |-----------------------------------------|
  1. |  1    1    massagli,mark       2   frog |
  2. |  1    2          wood,j.       2   frog |
  3. |  1    3   dessent,harold       2   frog |
  4. |  2    1    beletz,elaine       3   toad |
  5. |  2    2     carter,annie       3   toad |
     |-----------------------------------------|
  6. |  2    3   curtis,barbara       3   toad |
  7. |  3    1     bradshaw,joe       5   newt |
  8. |  3    2     brown,arnold       5   newt |
  9. |  3    3   dunaway,lowell       5   newt |
 10. |  4    1   schneider,mark       7   wolf |
     |-----------------------------------------|
 11. |  4    2    mullins,bobby       7   wolf |
 12. |  4    3    sump,lawrence       7   wolf |
     +-----------------------------------------+

Get a new column number:

. bysort id (tvp) : replace _j = _n

. l

     +-----------------------------------------+
     | id   _j              tvp   prime    pet |
     |-----------------------------------------|
  1. |  1    1   dessent,harold       2   frog |
  2. |  1    2    massagli,mark       2   frog |
  3. |  1    3          wood,j.       2   frog |
  4. |  2    1    beletz,elaine       3   toad |
  5. |  2    2     carter,annie       3   toad |
     |-----------------------------------------|
  6. |  2    3   curtis,barbara       3   toad |
  7. |  3    1     bradshaw,joe       5   newt |
  8. |  3    2     brown,arnold       5   newt |
  9. |  3    3   dunaway,lowell       5   newt |
 10. |  4    1    mullins,bobby       7   wolf |
     |-----------------------------------------|
 11. |  4    2   schneider,mark       7   wolf |
 12. |  4    3    sump,lawrence       7   wolf |
     +-----------------------------------------+

and -reshape- back

. reshape wide tvp , i(id) j(_j)
(note: j = 1 2 3)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       12   ->       4
Number of variables                   5   ->       6
j variable (3 values)                _j   ->   (dropped)
xij variables:
                                    tvp   ->   tvp1 tvp2 tvp3
-----------------------------------------------------------------------------

and we are where you want:

. l

     +----------------------------------------------------------------------+
     | id             tvp1             tvp2             tvp3   prime    pet |
     |----------------------------------------------------------------------|
  1. |  1   dessent,harold    massagli,mark          wood,j.       2   frog |
  2. |  2    beletz,elaine     carter,annie   curtis,barbara       3   toad |
  3. |  3     bradshaw,joe     brown,arnold   dunaway,lowell       5   newt |
  4. |  4    mullins,bobby   schneider,mark    sump,lawrence       7   wolf |
     +----------------------------------------------------------------------+

In short,

gen id = _n
reshape long tvp , i(id)
bysort id (tvp) : replace _j = _n
reshape wide tvp , i(id) j(_j)
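
* A self-contained sketch of the four commands above, for anyone who wants
* to reproduce the example from scratch. Assumptions: the data are retyped
* here with -input- (the str20 and str4 widths are guesses), and the -assert-
* line is an illustrative check only, not part of the original post.
clear
input str20 tvp1 str20 tvp2 str20 tvp3 byte prime str4 pet
"massagli,mark"  "wood,j."       "dessent,harold" 2 "frog"
"beletz,elaine"  "carter,annie"  "curtis,barbara" 3 "toad"
"bradshaw,joe"   "brown,arnold"  "dunaway,lowell" 5 "newt"
"schneider,mark" "mullins,bobby" "sump,lawrence"  7 "wolf"
end
gen id = _n
reshape long tvp , i(id)
* Within each id, sort on the names and renumber the columns in that order.
bysort id (tvp) : replace _j = _n
reshape wide tvp , i(id) j(_j)
* Names should now be in alphabetical order within every row.
assert tvp1 <= tvp2 & tvp2 <= tvp3
list
* Caveat: empty strings sort before non-empty ones, so a row with fewer
* than three names would end up with its blanks in tvp1.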

http://www.stata.com/support/faqs/data/reshape3.html

Nick
[email protected]

Scott Merryman

> Shouldn't "dessent,harold" come before "massagli,mark" ?
>
> Here is one way that uses Mata (and Ben Jann's -moremata-)
>
> clear
>
> input str20 x1 str20 x2 str20 x3
> "massagli,mark" "wood,j." "dessent,harold"
> "beletz,elaine" "carter,annie" "curtis,barbara"
> "schneider,mark" "mullins,bobby" "sump,lawrence"
> end
>
> tempfile foo
> mata
> C= J(3,1,"")
> A = st_sdata(.,.)'
> for (i = 1; i <=cols(A); i++) {
>     	A = sort(A,i)
>     	C = C,A[.,i]
> }
> C=C[.,(2::cols(A)+1)]'
> mm_outsheet("`foo'",C, mode="r")
> end
> set trace off
> insheet using `foo', clear tab
> l
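
* An alternative sketch, not from the thread: sort each row in Mata and
* write the result straight back, with no temporary file and no -moremata-.
* Assumptions: the three variables are named tvp1-tvp3 and are all str20,
* so the sorted strings fit without any change of storage type. Best run
* from a do-file.
clear
input str20 tvp1 str20 tvp2 str20 tvp3
"massagli,mark"  "wood,j."       "dessent,harold"
"beletz,elaine"  "carter,annie"  "curtis,barbara"
"schneider,mark" "mullins,bobby" "sump,lawrence"
end
mata:
vars = ("tvp1", "tvp2", "tvp3")
X = st_sdata(., vars)
for (i = 1; i <= rows(X); i++) {
    // sort() orders rows, so transpose the row, sort it, and transpose back
    X[i, .] = sort(X[i, .]', 1)'
}
st_sstore(., vars, X)
end
list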

Caleb Southworth

> > I would like to sort data within a set of variables (within a row). The
> > set of variables describe officer names, surname first. For example, some
> > of the data are
> >
> >       |           tvp1            tvp2             tvp3 |
> >       |-------------------------------------------------|
> >  466. |  massagli,mark         wood,j.   dessent,harold |
> >  476. |  beletz,elaine    carter,annie   curtis,barbara |
> >  484. |   bradshaw,joe    brown,arnold   dunaway,lowell |
> >  497. | schneider,mark   mullins,bobby    sump,lawrence |
> >
> >
> > The desired outcome is
> >
> >       |          tvp1             tvp2             tvp3 |
> >       |-------------------------------------------------|
> >  466. | massagli,mark   dessent,harold          wood,j. |
> >  476. | beletz,elaine     carter,annie   curtis,barbara |
> >  484. |  bradshaw,joe     brown,arnold   dunaway,lowell |
> >  497. | mullins,bobby   schneider,mark    sump,lawrence |
> >
> > such that the id number remains the same and the data are shuffled within
> > the tvp# variables. This will permit an assessment of change in officers
> > between years without looping over all the variables of a similar type
> > (i.e. vice presidents or presidents).
> >
> > I considered using reshape, but at least in my hands that